US9632979B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9632979-B2 |
| Application number | US-201514727826-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 1, 2015 |
| Priority date | Jun 1, 2015 |
| Publication date | Apr 25, 2017 |
| Grant date | Apr 25, 2017 |
Official abstract text for this publication.
An apparatus and method are described for performing a prefix sum. For example, one embodiment of an apparatus comprises: a graphics processor unit comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality of data elements in specified data element positions; and the one or more execution units to perform a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum.
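The two rounds of simultaneous additions described in the abstract can be illustrated with a small software model. The sketch below is an assumption-laden illustration, not the patented instruction sequence: it uses the well-known up-sweep/down-sweep scan, models each SIMD addition as a batch of lane updates on a Python list, and keeps every intermediate result in the position of one of its operands. The names `simd_add` and `prefix_sum_in_place` are hypothetical, and the register length is assumed to be a power of two.

```python
def simd_add(reg, dst, src):
    """Model one SIMD add: reg[d] += reg[s] for every (d, s) lane pair.

    All sources are read before any destination is written, which mimics
    the lanes of a single instruction executing simultaneously.
    """
    sums = [reg[d] + reg[s] for d, s in zip(dst, src)]
    for d, value in zip(dst, sums):
        reg[d] = value


def prefix_sum_in_place(reg):
    """Inclusive prefix sum computed in place (length assumed power of two)."""
    n = len(reg)

    # First group of additions: combine data elements into partial sums.
    stride = 1
    while stride < n:
        dst = list(range(2 * stride - 1, n, 2 * stride))
        src = [d - stride for d in dst]
        simd_add(reg, dst, src)
        stride *= 2

    # Second group of additions: add partial sums to other partial sums
    # so that every position ends up holding its running total.
    stride = n // 4
    while stride >= 1:
        dst = list(range(3 * stride - 1, n, 2 * stride))
        src = [d - stride for d in dst]
        simd_add(reg, dst, src)
        stride //= 2

    return reg


if __name__ == "__main__":
    # Sixteen 32-bit-sized values, matching the 16-element register in the claims.
    print(prefix_sum_in_place(list(range(1, 17))))
```

For 16 elements this model issues seven batched additions (four in the first group, three in the second) instead of fifteen sequential ones. The regular strides between destination and source lanes are the kind of pattern that strided region addressing can express, although the mapping to SIMD8 and SIMD4 operations is not shown here.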
Opening claim text (preview).
What is claimed is:
1. A method comprising: providing a plurality of data elements to a graphics processor unit (GPU) for input to a prefix sum operation; storing the plurality of data elements in specified data element positions of a first register of the GPU; performing a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum; wherein the intermediate results are to be stored in data element positions within the first register; wherein at least each intermediate result is to be stored in a data element position previously occupied by one of the data elements used to generate the intermediate result; wherein the simultaneous addition operations comprise SIMD8 copy with regioning and SIMD4 copy with regioning operations; and wherein the SIMD4 copy with regioning and SIMD8 copy with regioning operations comprise one or more instructions which specify one or more stride values to identify data element positions to be used for the operations.
2. The method as in claim 1 wherein the one or more stride values comprise a vertical stride value and a horizontal stride value.
3. The method as in claim 2 further comprising: initializing the first register to include zeroes in all data element positions within the first register prior to storing the plurality of data elements in specified data element positions of the first register.
4. The method as in claim 3 wherein initializing the first register comprises performing a first move operation to move zeroes to all of the data element positions, the first move operation to ignore an execution mask value.
5. The method as in claim 4 wherein storing the plurality of data elements in specified data element positions of the first register comprises performing a second move operation, the second move operation moving data element values from a second register or from memory into the first register, the second move operation to honor any execution mask value, thereby superimposing the data element values over the zeroes already stored in the first register.
6. The method as in claim 1 wherein the data elements are each 32 bits and wherein first register comprises a 512 bit packed data register storing 16 data elements.
7. The method as in claim 6 wherein the prefix sum operation comprises a sum of a set of sequential numbers, wherein each number in the set comprises one of the data elements.
8. An apparatus comprising: a graphics processor unit comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality of data elements in specified data element positions; the one or more execution units to perform a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum; wherein the intermediate results are to be stored in data element positions within the first register; wherein at least each intermediate result is to be stored in a data element position previously occupied by one of the data elements used to generate the intermediate result; wherein the simultaneous addition operations comprise SIMD8 copy with regioning and SIMD4 copy with regioning operations; wherein the SIMD4 copy with regioning and SIMD8 copy with regioning operations comprise one or more instructions which specify one or more stride values to identify data element positions to be used for the operations; wherein the SIMD4 copy with regioning and SIMD8 copy with regioning operations comprise one or more instructions which specify one or more stride values to identify data element positions to be used for the operations; and wherein the SIMD4 copy with regioning and SIMD8 copy with regioning operations comprise one or more instructions which specify one or more stride values to identify data element positions to be used for the operations.
9. The apparatus as in claim 8 wherein the one or more stride values comprise a vertical stride value and a horizontal stride value.
10. The apparatus as in claim 9 wherein to perform the prefix sum operation, the execution units initialize the first register to include zeroes in all data element positions within the first register prior to storing the plurality of data elements in specified data element positions of the first register.
11. The apparatus as in claim 10 wherein initializing the first register comprises performing a first move operation to move zeroes to all of the data element positions, the first move operation to ignore an execution mask value.
12. The apparatus as in claim 11 wherein storing the plurality of data elements in specified data element positions of the first register comprises performing a second move operation, the second move operation moving data element values from a second register or from memory into the first register, the second move operation to honor any execution mask value, thereby superimposing the data element values over the zeroes already stored in the first register.
13. The apparatus as in claim 8 wherein the data elements are each 32 bits and wherein first register comprises a 512 bit packed data register storing 16 data elements.
14. The apparatus as in claim 14 wherein the prefix sum operation comprises a sum of a set of sequential numbers, wherein each number in the set comprises one of the data elements.
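Two mechanisms recited in the dependent claims can be illustrated with a short software model: the zero-initialize-then-masked-move sequence (claims 3 to 5 and 10 to 12) and the selection of element positions by vertical and horizontal stride (claims 2 and 9). This is a hedged sketch, not hardware behavior taken from the patent text. The function names are hypothetical, and the `width` parameter (elements per row of the region) is an assumption borrowed from common register-region notation; the claims themselves name only the two stride values.

```python
def load_with_mask(values, exec_mask, lanes=16):
    """Model the two-move load sequence on a register of `lanes` elements."""
    # First move: ignores the execution mask, so every lane becomes zero.
    reg = [0] * lanes
    # Second move: honors the execution mask, so only active lanes are
    # overwritten and inactive lanes keep the zero written above.
    for lane in range(lanes):
        if exec_mask[lane]:
            reg[lane] = values[lane]
    return reg


def region_indices(start, vert_stride, width, horz_stride, exec_size):
    """Element positions touched by a strided region (assumed layout).

    Channels are grouped into rows of `width` elements. Moving to the next
    row advances by `vert_stride`; moving within a row advances by
    `horz_stride`.
    """
    return [
        start + (ch // width) * vert_stride + (ch % width) * horz_stride
        for ch in range(exec_size)
    ]


if __name__ == "__main__":
    mask = [True] * 4 + [False] * 12
    print(load_with_mask([5] * 16, mask))
    # Every second element of a 16-element register, starting at element 1.
    print(region_indices(start=1, vert_stride=8, width=4, horz_stride=2, exec_size=8))
```

Because inactive lanes hold zero after the masked move, they contribute nothing to later additions, so the prefix sum over the active elements is unaffected by whatever the register held before.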
involving image processing hardware · CPC title
Arithmetic instructions · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Special purpose registers · CPC title
single instruction multiple data [SIMD] multiprocessors · CPC title