Indirectly accessing sample data to perform multi-convolution operations in a parallel processing system

US2016162402A1 · US · A1

Patent metadata
Publication number: US-2016162402-A1
Application number: US-201514951588-A
Country: US
Kind code: A1
Filing date: Nov 25, 2015
Priority date: Dec 4, 2014
Publication date: Jun 9, 2016
Grant date: (not listed)

Abstract

Official abstract text for this publication.

In one embodiment of the present invention, a convolution engine configures a parallel processing pipeline to perform multi-convolution operations. More specifically, the convolution engine configures the parallel processing pipeline to independently generate and process individual image tiles. In operation, for each image tile, the pipeline calculates source locations included in an input image batch based on one or more start addresses and one or more offsets. Subsequently, the pipeline copies data from the source locations to the image tile. The pipeline then performs matrix multiplication operations between the image tile and a filter tile to generate a contribution of the image tile to an output matrix. To optimize the amount of memory used, the pipeline creates each image tile in shared memory as needed. Further, to optimize the throughput of the matrix multiplication operations, the values of the offsets are precomputed by a convolution preprocessor.
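The pipeline described in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the image batch is assumed to be stored contiguously in NCHW order, stride is 1 with no padding, and the operation is the cross-correlation commonly used in convolutional networks. The function names, the tile sizes, and the in-loop `image_tile` array (standing in for a tile built in shared memory) are all hypothetical.

```python
import numpy as np

def precompute_offsets(C, H, W, R, S):
    # One offset per (channel, filter row, filter column) triple.
    # Assumption: computed once up front, playing the role of the preprocessor.
    c, r, s = np.meshgrid(np.arange(C), np.arange(R), np.arange(S), indexing="ij")
    return (c * H * W + r * W + s).ravel()

def start_addresses(N, C, H, W, R, S):
    # One start address per (image, output row, output column) triple.
    P, Q = H - R + 1, W - S + 1
    n, p, q = np.meshgrid(np.arange(N), np.arange(P), np.arange(Q), indexing="ij")
    return (n * C * H * W + p * W + q).ravel()

def multi_convolution(images, filters, tile_rows=4, tile_cols=8):
    N, C, H, W = images.shape
    K, _, R, S = filters.shape
    P, Q = H - R + 1, W - S + 1

    flat = images.ravel()                      # image batch as linear memory
    filter_matrix = filters.reshape(K, C * R * S)
    offsets = precompute_offsets(C, H, W, R, S)
    starts = start_addresses(N, C, H, W, R, S)

    out = np.zeros((K, N * P * Q), dtype=images.dtype)
    for col0 in range(0, starts.size, tile_cols):
        for row0 in range(0, offsets.size, tile_rows):
            # Source address = start address + offset, for every tile element.
            src = (offsets[row0:row0 + tile_rows, None]
                   + starts[None, col0:col0 + tile_cols])
            image_tile = flat[src]             # copy data into the tile on demand
            filter_tile = filter_matrix[:, row0:row0 + tile_rows]
            # Contribution of this image tile to the output matrix.
            out[:, col0:col0 + tile_cols] += filter_tile @ image_tile
    # Output formatting: back to (N, K, P, Q).
    return out.reshape(K, N, P, Q).transpose(1, 0, 2, 3)
```

Only one small tile exists at a time, so the full expanded image matrix is never materialized. The sketch keeps the address arithmetic inside the loop to a single addition per element, which is the reason for precomputing the offsets.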

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for performing a multi-convolution operation, the method comprising: selecting a first start address based on a first destination address included in a first image tile that is stored in a first memory; identifying a first offset based on the first destination address; computing a first source address included in an image batch that is stored in a second memory based on the first start address and the first offset; copying data from the first source address to the first destination address; and after copying the data, performing one or more matrix multiplication operations between the first image tile and a first filter tile.

2. The computer-implemented method of claim 1, wherein the first memory comprises a shared memory, and the second memory comprises a parallel processing memory.

3. The computer-implemented method of claim 1, wherein the first filter tile is stored in the first memory, and further comprising: computing a filter source address based on the first destination address; and copying data stored in a filter stack at the filter source address to a filter destination address included in the first filter tile.

4. The computer-implemented method of claim 1, wherein selecting the first start address comprises: associating the first destination address with a column of a virtual image matrix; and performing one or more operations that map the column to an address included in the image batch.

5. The computer-implemented method of claim 1, wherein identifying the first offset comprises: associating the first destination location with a row of a virtual image matrix; and retrieving a value included in an offset sequence based on the row.

6. The computer-implemented method of claim 5, further comprising generating the offset sequence based on a deterministic relationship between the image batch and the virtual image matrix.

7. The computer-implemented method of claim 1, further comprising assigning the first image tile to a first thread group, and configuring at least one thread in the thread group to compute the first source address.

8. The computer-implemented method of claim 7, further comprising assigning a second image tile to a second thread group, and configuring at least one thread in the second thread group to compute a second source address included in the image batch based on a second start address and the first offset.

9. The computer-implemented method of claim 8, wherein the first source address and the second source address are computed substantially in parallel.

10. A non-transitory, computer-readable storage medium including instructions that, when executed by a processor, cause the processor to perform a multi-convolution operation, by performing the steps of: selecting a first start address based on a first destination address included in a first image tile that is stored in a first memory; identifying a first offset based on the first destination address; computing a first source address included in an image batch that is stored in a second memory based on the first start address and the first offset; copying data from the first source address to the first destination address; and after copying the data, performing one or more matrix multiplication operations between the first image tile and a first filter tile.

11. The non-transitory computer-readable storage medium of claim 10, wherein the first memory comprises a shared memory, and the second memory comprises a parallel processing memory.

12. The non-transitory computer-readable storage medium of claim 10, wherein the first filter tile is stored in the first memory, and further comprising: computing a filter source address based on the first destination address; and copying data stored in a filter stack at the filter source address to a filter destination address included in the first filter tile.

13. The non-transitory computer-readable storage medium of claim 10, wherein selecting the first start address comprises: associating the first destination address with a column of a virtual image matrix; and performing one or more operations that map the column to an address included in the image batch.

14. The non-transitory computer-readable storage medium of claim 10, wherein identifying the first offset comprises: associating the first destination location with a row of a virtual image matrix; and retrieving a value included in an offset sequence based on the row.

15. The non-transitory computer-readable storage medium of claim 14, further comprising generating the offset sequence based on a deterministic relationship between the image batch and the virtual image matrix.

16. The non-transitory computer-readable storage medium of claim 10, further comprising configuring at least one thread in a second thread group to compute a second source address included in the image batch based on the first start address and a second offset.

17. The non-transitory computer-readable storage medium of claim 10, further comprising performing one or more output formatting operations based on the output matrix to generate an output batch.

18. The non-transitory computer-readable storage medium of claim 17, wherein a first layer included in a convolutional neural network includes at least the image batch, and a second layer included in the convolution neural network includes at least the output batch.

19. A system configured to perform a multi-convolution operation, the system comprising: a first memory; a second memory; and a convolution engine coupled to both the first memory and the second memory, and configured to: identify a first offset included in an offset sequence based on a first destination address included in a first image tile that is stored in the first memory; compute a first source address included in an image batch that is stored in the second memory based on the first offset; copy data from the first source address to the first destination address; and after copying the data, perform one or more matrix multiplication operations between the first image tile and a first filter tile.

20. The system of claim 19, wherein the first memory comprises a shared memory, and the second memory comprises a parallel processing memory.
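The address decomposition recited in claims 4 to 6 can be illustrated with plain integer arithmetic. The sketch below is an interpretation, not text from the patent: it assumes an NCHW image batch, stride 1, no padding, virtual-matrix columns ordered by (image, output row, output column), and virtual-matrix rows ordered by (channel, filter row, filter column). All function names are hypothetical.

```python
def start_address(column, C, H, W, R, S):
    # Map a column of the virtual image matrix to an address in the image batch.
    P, Q = H - R + 1, W - S + 1
    n, rem = divmod(column, P * Q)   # which image
    p, q = divmod(rem, Q)            # which output position
    return n * C * H * W + p * W + q

def offset_sequence(C, H, W, R, S):
    # One offset per row of the virtual image matrix. The values depend only
    # on the layout, so they can be generated once and reused by every tile.
    return [c * H * W + r * W + s
            for c in range(C) for r in range(R) for s in range(S)]

def source_address(row, column, offsets, C, H, W, R, S):
    # Source address = start address (from the column) + offset (from the row).
    return start_address(column, C, H, W, R, S) + offsets[row]

# Two different columns reuse the same offset for a given row (compare claim 8);
# two different rows reuse the same start address for a given column (compare claim 16).
dims = dict(C=2, H=4, W=5, R=3, S=2)
offsets = offset_sequence(**dims)
same_offset = (source_address(3, 0, offsets, **dims),
               source_address(3, 5, offsets, **dims))
same_start = (source_address(0, 5, offsets, **dims),
              source_address(7, 5, offsets, **dims))
```

Under these assumptions the offset depends only on the row and the start address only on the column, which is what allows different thread groups to share one precomputed offset sequence.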

Assignees

Nvidia Corp

Inventors

Classifications

  • Combinations of networks · CPC title

  • G06V40/172 · Primary

    Classification, e.g. identification · CPC title

  • relating to colour · CPC title

  • by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis · CPC title

  • G06V10/95 · Primary

    structured as a network, e.g. client-server architectures · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016162402A1 cover?
In one embodiment of the present invention, a convolution engine configures a parallel processing pipeline to perform multi-convolution operations. More specifically, the convolution engine configures the parallel processing pipeline to independently generate and process individual image tiles. In operation, for each image tile, the pipeline calculates source locations included in an input imag…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06V40/172. Mapped technology areas include Physics.
When was this patent published?
Published Jun 9, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).