Apparatus and method for efficient prefix sum operation

US9632979B2 · US · B2

Patent metadata
Publication number: US-9632979-B2
Application number: US-201514727826-A
Country: US
Kind code: B2
Filing date: Jun 1, 2015
Priority date: Jun 1, 2015
Publication date: Apr 25, 2017
Grant date: Apr 25, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method are described for performing a prefix sum. For example, one embodiment of an apparatus comprises: a graphics processor unit comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality of data elements in specified data element positions; and the one or more execution units to perform a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum.
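
The abstract describes batches of simultaneous additions whose intermediate results are then added to other intermediate results. A minimal sketch of that general idea follows, written as a generic log-step (Hillis-Steele style) inclusive scan over a 16-lane register. This is an illustration under stated assumptions, not the patented instruction sequence: the page does not show the exact regioning pattern, and the names `simd_step` and `prefix_sum` are hypothetical.

```python
from typing import List


def simd_step(reg: List[int], offset: int) -> List[int]:
    """One batch of simultaneous additions (hypothetical helper).

    Every lane i >= offset receives reg[i] + reg[i - offset]. All sources are
    read from the old register before any lane is written, which models the
    additions happening at the same time.
    """
    return [
        reg[i] + reg[i - offset] if i >= offset else reg[i]
        for i in range(len(reg))
    ]


def prefix_sum(values: List[int], lanes: int = 16) -> List[int]:
    """Inclusive prefix sum of up to `lanes` values using log2(lanes) steps.

    Assumption: a generic log-step scan stands in for the patent's SIMD8/SIMD4
    regioned operations, whose exact layout is not given on this page.
    """
    if len(values) > lanes:
        raise ValueError("more values than register lanes")
    # Model the register: zero everywhere, inputs in the low lanes.
    reg = [0] * lanes
    reg[: len(values)] = values
    offset = 1
    while offset < lanes:
        # Each result overwrites the lane that held one of its two operands.
        reg = simd_step(reg, offset)
        offset *= 2
    return reg[: len(values)]


if __name__ == "__main__":
    print(prefix_sum([3, 1, 4, 1, 5]))  # [3, 4, 8, 9, 14]
```

With 16 lanes the loop runs four times (offsets 1, 2, 4, 8), so later steps combine partial sums produced by earlier steps, and every result is written back into the lane that supplied one of its operands.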

First claim

Claim text as published (claims 1 to 14).

What is claimed is:

1. A method comprising: providing a plurality of data elements to a graphics processor unit (GPU) for input to a prefix sum operation; storing the plurality of data elements in specified data element positions of a first register of the GPU; performing a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum; wherein the intermediate results are to be stored in data element positions within the first register; wherein at least each intermediate result is to be stored in a data element position previously occupied by one of the data elements used to generate the intermediate result; wherein the simultaneous addition operations comprise SIMD8 copy with regioning and SIMD4 copy with regioning operations; and wherein the SIMD4 copy with regioning and SIMD8 copy with regioning operations comprise one or more instructions which specify one or more stride values to identify data element positions to be used for the operations.

2. The method as in claim 1 wherein the one or more stride values comprise a vertical stride value and a horizontal stride value.

3. The method as in claim 2 further comprising: initializing the first register to include zeroes in all data element positions within the first register prior to storing the plurality of data elements in specified data element positions of the first register.

4. The method as in claim 3 wherein initializing the first register comprises performing a first move operation to move zeroes to all of the data element positions, the first move operation to ignore an execution mask value.

5. The method as in claim 4 wherein storing the plurality of data elements in specified data element positions of the first register comprises performing a second move operation, the second move operation moving data element values from a second register or from memory into the first register, the second move operation to honor any execution mask value, thereby superimposing the data element values over the zeroes already stored in the first register.

6. The method as in claim 1 wherein the data elements are each 32 bits and wherein first register comprises a 512 bit packed data register storing 16 data elements.

7. The method as in claim 6 wherein the prefix sum operation comprises a sum of a set of sequential numbers, wherein each number in the set comprises one of the data elements.

8. An apparatus comprising: a graphics processor unit comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality of data elements in specified data element positions; the one or more execution units to perform a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum; wherein the intermediate results are to be stored in data element positions within the first register; wherein at least each intermediate result is to be stored in a data element position previously occupied by one of the data elements used to generate the intermediate result; wherein the simultaneous addition operations comprise SIMD8 copy with regioning and SIMD4 copy with regioning operations; wherein the SIMD4 copy with regioning and SIMD8 copy with regioning operations comprise one or more instructions which specify one or more stride values to identify data element positions to be used for the operations; wherein the SIMD4 copy with regioning and SIMD8 copy with regioning operations comprise one or more instructions which specify one or more stride values to identify data element positions to be used for the operations; and wherein the SIMD4 copy with regioning and SIMD8 copy with regioning operations comprise one or more instructions which specify one or more stride values to identify data element positions to be used for the operations.

9. The apparatus as in claim 8 wherein the one or more stride values comprise a vertical stride value and a horizontal stride value.

10. The apparatus as in claim 9 wherein to perform the prefix sum operation, the execution units initialize the first register to include zeroes in all data element positions within the first register prior to storing the plurality of data elements in specified data element positions of the first register.

11. The apparatus as in claim 10 wherein initializing the first register comprises performing a first move operation to move zeroes to all of the data element positions, the first move operation to ignore an execution mask value.

12. The apparatus as in claim 11 wherein storing the plurality of data elements in specified data element positions of the first register comprises performing a second move operation, the second move operation moving data element values from a second register or from memory into the first register, the second move operation to honor any execution mask value, thereby superimposing the data element values over the zeroes already stored in the first register.

13. The apparatus as in claim 8 wherein the data elements are each 32 bits and wherein first register comprises a 512 bit packed data register storing 16 data elements.

14. The apparatus as in claim 13 wherein the prefix sum operation comprises a sum of a set of sequential numbers, wherein each number in the set comprises one of the data elements.

Assignees

Intel Corp

Inventors

Classifications

  • Involving image processing hardware

  • Arithmetic instructions

  • Processor architectures; Processor configuration, e.g. pipelining

  • Special purpose registers

  • Single instruction multiple data [SIMD] multiprocessors

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9632979B2 cover?
An apparatus and method are described for performing a prefix sum. For example, one embodiment of an apparatus comprises: a graphics processor unit comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F15/8007. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 25, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).