Performing multi-convolution operations in a parallel processing system

US10223333B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10223333-B2
Application number  US-201514838291-A
Country             US
Kind code           B2
Filing date         Aug 27, 2015
Priority date       Aug 29, 2014
Publication date    Mar 5, 2019
Grant date          Mar 5, 2019

Abstract

Official abstract text for this publication.

In one embodiment of the present invention a convolution engine configures a parallel processing pipeline to perform multi-convolution operations. More specifically, the convolution engine configures the parallel processing pipeline to independently generate and process individual image tiles. In operation, for each image tile, the pipeline calculates source locations included in an input image batch. Notably, the source locations reflect the contribution of the image tile to an output tile of an output matrix—the result of the multi-convolution operation. Subsequently, the pipeline copies data from the source locations to the image tile. Similarly, the pipeline copies data from a filter stack to a filter tile. The pipeline then performs matrix multiplication operations between the image tile and the filter tile to generate data included in the corresponding output tile. To optimize both on-chip memory usage and execution time, the pipeline creates each image tile in on-chip memory as-needed.
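The flow described in the abstract can be sketched in plain Python. This is an illustrative sketch, not the patented implementation: it assumes a flat NCHW image batch, a flat KCRS filter stack, stride 1, and no padding, none of which is stated on this page. The names `source_index`, `tiled_conv`, and `direct_conv` are hypothetical. Each image tile is built on demand from computed source locations, multiplied by the matching filter tile, and then dropped.

```python
def source_index(row, col, C, H, W, R, S):
    """Map a virtual image matrix location to a flat NCHW offset.

    Assumed layout: row = (c, r, s) flattened, col = (n, p, q) flattened.
    """
    P, Q = H - R + 1, W - S + 1          # output height and width
    c, rs = divmod(row, R * S)           # row -> channel and filter offset
    r, s = divmod(rs, S)
    n, pq = divmod(col, P * Q)           # col -> image and output pixel
    p, q = divmod(pq, Q)
    return ((n * C + c) * H + (p + r)) * W + (q + s)


def tiled_conv(images, filters, N, C, H, W, K, R, S, tile=4):
    """Multiply the filter matrix by the virtual image matrix tile by tile."""
    P, Q = H - R + 1, W - S + 1
    rows, cols = C * R * S, N * P * Q
    out = [[0] * cols for _ in range(K)]  # output matrix, K x (N*P*Q)
    for j0 in range(0, cols, tile):
        j1 = min(j0 + tile, cols)
        for i0 in range(0, rows, tile):
            i1 = min(i0 + tile, rows)
            # Image tile: gathered from the batch as needed, never stored whole.
            img_tile = [[images[source_index(i, j, C, H, W, R, S)]
                         for j in range(j0, j1)] for i in range(i0, i1)]
            # Filter tile: the matching columns of the K x (C*R*S) filter matrix.
            flt_tile = [[filters[k * rows + i] for i in range(i0, i1)]
                        for k in range(K)]
            # Accumulate the partial product into the output tile.
            for k in range(K):
                for jj in range(j1 - j0):
                    out[k][j0 + jj] += sum(flt_tile[k][ii] * img_tile[ii][jj]
                                           for ii in range(i1 - i0))
    return out


def direct_conv(images, filters, N, C, H, W, K, R, S):
    """Reference result: direct cross-correlation, as commonly used in CNNs."""
    P, Q = H - R + 1, W - S + 1
    out = [[0] * (N * P * Q) for _ in range(K)]
    for k in range(K):
        for n in range(N):
            for p in range(P):
                for q in range(Q):
                    acc = 0
                    for c in range(C):
                        for r in range(R):
                            for s in range(S):
                                acc += (images[((n * C + c) * H + p + r) * W + q + s]
                                        * filters[((k * C + c) * R + r) * S + s])
                    out[k][(n * P + p) * Q + q] = acc
    return out
```

On integer test data the two functions should return identical matrices for any tile size. On a GPU the tile loops would typically be distributed across thread groups and the tiles held in on-chip memory; here they are ordinary Python lists.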

First claim

Opening claim text (preview).

The invention claimed is:
1. A computer-implemented method for performing a multi-convolution operation, the method comprising: calculating a first source location included in an image batch that is stored in a first memory based on a first destination location included in a first image tile that is stored in a second memory, wherein the first image tile comprises a subset of the image batch, wherein calculating the first source location comprises associating the first destination location with a first virtual location included in a virtual image matrix and performing one or more indexing operations that map the first virtual location to the first source location; copying data from the first source location to the first destination location; copying data from a filter source location included in a filter stack that is stored in the first memory to a filter destination location included in a first filter tile that is stored in the second memory; and performing one or more matrix multiplication operations between the first image tile and the first filter tile to generate a first output tile associated with an output matrix that is stored in the second memory.
2. The computer-implemented method of claim 1, wherein the first memory comprises off-chip memory and the second memory comprises on-chip memory.
3. The computer-implemented method of claim 1, wherein associating the first destination location comprises performing one or more arithmetic calculations based on at least one of a size of the second memory and the number of threads included in a thread group.
4. The computer-implemented method of claim 1, further comprising, assigning the first image tile to a first thread group, and configuring a first thread included in the first thread group to calculate the first source location based on the first destination location.
5. The computer-implemented method of claim 4, further comprising, assigning a second image tile to a second thread group, and configuring a second thread included in the second thread group to calculate a second source location included in the image batch based on a second destination location included in the second image tile.
6. The computer-implemented method of claim 1, further comprising performing one or more output formatting operations based on the output matrix to generate an output batch, and storing the output batch in the first memory.
7. The computer-implemented method of claim 6, wherein the image batch comprises a first layer included in a convolutional neural network, and the output batch comprises a second layer included in the convolutional neural network.
8. The computer-implemented method of claim 1, wherein the image batch is partitioned into a plurality of image tiles, each image tile comprising a subset of the image batch.
9. The computer-implemented method of claim 1, wherein: the image batch is partitioned into a plurality of image tiles; and for the plurality of image tiles, the second memory stores only a current image tile that is currently being processed for matrix multiplication operations.
10. The computer-implemented method of claim 9, wherein: for the plurality of image tiles, the second memory does not store any previous image tiles that are processed for matrix multiplication operations previous to the current image tile; and for the plurality of image tiles, the second memory does not store any subsequent image tiles that are processed for matrix multiplication operations after the current image tile.
11. The computer-implemented method of claim 1, further comprising: calculating a second source location included in the image batch based on a second destination location included in a second image tile that is stored in a second memory, wherein the second image tile comprises a subset of the image batch, wherein the first image tile is discarded from the second memory prior to calculating the second source location.
12. A non-transitory, computer-readable storage medium including instructions that, when executed by a processor, cause the processor to perform a multi-convolution operation, by performing the steps of: calculating a first source location included in an image batch that is stored in a first memory based on a first destination location included in a first image tile that is stored in a second memory, wherein the first image tile comprises a subset of the image batch, wherein calculating the first source location comprises associating the first destination location with a first virtual location included in a virtual image matrix and performing one or more indexing operations that map the first virtual location to the first source location; copying data from the first source location to the first destination location; copying data from a filter source location included in a filter stack that is stored in the first memory to a filter destination location included in a first filter tile that is stored in the second memory; and performing one or more matrix multiplication operations between the first image tile and the first filter tile to generate a first output tile associated with an output matrix that is stored in the second memory.
13. The non-transitory computer-readable storage medium of claim 12, wherein the first memory comprises off-chip memory and the second memory comprises on-chip memory.
14. The non-transitory computer-readable storage medium of claim 12, further comprising, assigning the first image tile to a first thread group, and configuring a first thread included in the first thread group to calculate the first source location based on the first destination location.
15. The non-transitory computer-readable storage medium of claim 14, further comprising, assigning a second image tile to a second thread group, and configuring a second thread included in the second thread group to calculate a second source location included in the image batch based on a second destination location included in the second image tile.
16. The non-transitory computer-readable storage medium of claim 12, wherein a plurality of dimensions of the image batch comprise a batch size, a total number of color planes, an image width, and an image height.
17. The non-transitory computer-readable storage medium of claim 12, wherein a plurality of dimensions of the filter stack comprise a total number of filter sets, a total number of feature planes, a filter width, and a filter height.
18. A system configured to perform a multi-convolution operation, the system comprising: a first memory; a second memory; and a convolution engine coupled to both the first memory and the second memory, and configured to: calculate a first source location included in an image batch that is stored in the first memory based on a first destination location included in a first image tile that is stored in the second memory, wherein the first image tile comprises a subset of the image batch, wherein calculating the first source location comprises associating the first destination location with a first virtual location included in a virtual image matrix and performing one or more indexing operations that map the first virtual location to the first source location, copy data from the first source location to the first destination location, copy data from a filter source location included in a filter stack that is stored in the first memory to a filter destination location included in a first filter tile that is stored in the second memory, and perform one or more matrix multiplication operations between the f
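Claims 9 to 11 keep only the current image tile in the second memory. A rough sense of why that matters can be had from a small size calculation, sketched below under assumptions not stated on this page: stride 1, no padding, and a virtual image matrix with one row per (color plane, filter row, filter column) and one column per (image, output pixel). The function name `footprint` and the tile dimensions are hypothetical.

```python
def footprint(N, C, H, W, R, S, tile_rows, tile_cols):
    """Element counts for the image batch, the full virtual image matrix,
    and a single image tile (assuming stride 1 and no padding)."""
    P, Q = H - R + 1, W - S + 1            # output height and width
    rows, cols = C * R * S, N * P * Q      # assumed virtual matrix shape
    ceil_div = lambda a, b: -(-a // b)     # integer ceiling division
    return {
        "image_batch": N * C * H * W,
        "full_virtual_matrix": rows * cols,
        "one_image_tile": tile_rows * tile_cols,
        "tiles_needed": ceil_div(rows, tile_rows) * ceil_div(cols, tile_cols),
    }


if __name__ == "__main__":
    # Example dimensions chosen for illustration only.
    for name, count in footprint(128, 3, 224, 224, 11, 11, 32, 128).items():
        print(f"{name}: {count:,} elements")
```

With these example dimensions the fully expanded matrix holds roughly 110 times as many elements as the image batch it is derived from, while a single tile holds a few thousand, which is the gap that building tiles as needed avoids materializing.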

Assignees

Nvidia Corp

Inventors

Classifications

  • Combinations of networks · CPC title

  • G06F17/153 (Primary)

    Multidimensional correlation or convolution · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10223333B2 cover?
In one embodiment of the present invention a convolution engine configures a parallel processing pipeline to perform multi-convolution operations. More specifically, the convolution engine configures the parallel processing pipeline to independently generate and process individual image tiles. In operation, for each image tile, the pipeline calculates source locations included in an input image…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F17/153. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 5, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).