Hardware instruction set to replace a plurality of atomic operations with a single atomic operation

US2016139934A1 · US · A1

Patent metadata
Publication number: US-2016139934-A1
Application number: US-201414543027-A
Country: US
Kind code: A1
Filing date: Nov 17, 2014
Priority date: Nov 17, 2014
Publication date: May 19, 2016
Grant date: none listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods may process a single atomic operation. An instruction set may be generated to replace a plurality of atomic operations with a single atomic operation. The instruction set may include an accumulation instruction to compute a prefix sum for a plurality of initial values associated with a plurality of processing lanes to generate a plurality of accumulated values. The instruction set may also include a broadcast instruction to return a pre-existing value to be added with each of the plurality of accumulated values to generate a plurality of intermediate accumulated values. In one example, a graphics processor may execute the instruction set to process the single atomic operation.
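The abstract's sequence can be simulated on a CPU to check why one atomic can stand in for many. The following is a minimal Python sketch, not Intel's instruction set: it assumes the lanes' individual atomic adds would otherwise be serialized in lane order, models each processing lane as a list element, and uses hypothetical names (`Memory`, `atomic_add`, `per_lane_atomics`, `single_atomic`). The move, atomic, and subtract steps follow claim 3.

```python
from itertools import accumulate


class Memory:
    """Hypothetical shared counter; atomic_add returns the pre-existing value."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self.atomic_ops = 0  # number of atomic operations issued

    def atomic_add(self, delta: int) -> int:
        old = self.value
        self.value += delta
        self.atomic_ops += 1
        return old


def per_lane_atomics(mem: Memory, initial: list[int]) -> list[int]:
    """Baseline: one atomic add per lane, issued in lane order."""
    return [mem.atomic_add(v) for v in initial]


def single_atomic(mem: Memory, initial: list[int]) -> list[int]:
    """Prefix sum, one atomic add, broadcast, then per-lane subtract."""
    if not initial:
        return []
    # Accumulation: inclusive prefix sum across the lanes.
    accumulated = list(accumulate(initial))
    # Move: the last lane's accumulated value is the group total.
    total = accumulated[-1]
    # Atomic: one add for the whole group; returns the pre-existing value.
    pre_existing = mem.atomic_add(total)
    # Broadcast: add the pre-existing value to every lane's accumulated value.
    intermediate = [pre_existing + a for a in accumulated]
    # Subtract: remove each lane's own initial value to get its final value.
    return [m - v for m, v in zip(intermediate, initial)]


lanes = [3, 1, 4, 1, 5, 9, 2, 6]  # eight lanes, e.g. SIMD8
baseline, fused = Memory(100), Memory(100)
assert per_lane_atomics(baseline, lanes) == single_atomic(fused, lanes)
print(fused.value, baseline.atomic_ops, fused.atomic_ops)  # 131 8 1
```

In this sketch the subtract step turns the inclusive prefix sum into an exclusive one, so each lane ends up with the value it would have observed from its own atomic add. The list comprehensions stand in for lane-parallel SIMD operations; only `mem.atomic_add` touches shared memory, and it is called once per group.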

First claim

Opening claim text (preview).

We claim:

1. A system comprising: an instruction module to generate an instruction set to replace a plurality of atomic operations with a single atomic operation, the instruction module including: an accumulation module to generate an accumulation instruction to compute a prefix sum for a plurality of initial values associated with a plurality of processing lanes to generate a plurality of accumulated values; and a broadcast module to generate a broadcast instruction to return a pre-existing value to be added with each of the plurality of accumulated values to generate a plurality of intermediate accumulated values; and a graphics processor to execute the instruction set to process the single atomic operation.

2. The system of claim 1, wherein the instruction set is to include two or more of a same number of instructions for a uniform source value operation and a non-uniform source value operation, only about 5 instructions to about 10 instructions, and no loops.

3. The system of claim 1, wherein the instruction module further includes two or more of: a move module to generate a move instruction to copy an accumulation result value based on the plurality of accumulated values from an accumulation register to a result register; an atomic instruction module to generate an atomic instruction to add the accumulation result value with the pre-existing value to generate an atomic instruction result value that is to replace the pre-existing value in memory; and a subtraction module to generate a subtract instruction to subtract between each of the plurality of initial values and each of the plurality of intermediate accumulated values to generate a plurality of final values associated with the plurality of processing lanes.

4. The system of claim 3, wherein the instruction module further includes a partition module to generate a partition instruction to logically partition the plurality of processing lanes into two or more subsets, wherein the accumulation module is to generate a first accumulation instruction for a plurality of first initial values associated with a first subset of the plurality of processing lanes to generate a plurality of first accumulated values and a second accumulation instruction for a plurality of second initial values associated with a second subset of the plurality of processing lanes to generate a plurality of second accumulated values.

5. The system of claim 4, wherein the instruction module further includes: a combination module to generate a combination instruction to add a first accumulation result value based on the plurality of first accumulated values with a second accumulation result value based on the plurality of second accumulated values to generate a combined accumulation result value; and a subset value update module to generate an update instruction to add the first accumulation result value with each of the plurality of second accumulated values to generate a plurality of updated accumulated values.

6. The system of claim 5, wherein the atomic instruction module is to generate an atomic instruction to add the combined accumulation result value with the pre-existing value to generate the atomic instruction result value that is to replace the pre-existing value in the memory, and wherein the broadcast module is to generate a first broadcast instruction to return the pre-existing value to be added with each of the plurality of first accumulated values to generate a plurality of first intermediate accumulated values and a second broadcast instruction to return the pre-existing value to be added with each of the plurality of updated accumulated values to generate a plurality of second intermediate accumulated values.

7. The system of claim 6, wherein the subtraction module is to generate a first subtract instruction to subtract between each of the plurality of first initial values and each of the plurality of first intermediate accumulated values and a second subtract instruction to subtract between each of the plurality of second initial values and each of the plurality of second intermediate accumulated values to generate the plurality of final values associated with the plurality of processing lanes.

8. The system of claim 1, further including a compiler to apply the instruction module to generate the instruction set in a graphics hardware machine language, wherein the graphics processor is to include a single instruction multiple data (SIMD) architecture, and wherein the partial prefix sum is to be computed up to an SIMD execution engine length including one or more of eight processing lanes, sixteen processing lanes, and thirty-two processing lanes.

9. A computer implemented method comprising: generating an instruction set to replace a plurality of atomic operations with a single atomic operation including: generating an accumulation instruction to compute a prefix sum for a plurality of initial values associated with a plurality of processing lanes to generate a plurality of accumulated values; and generating a broadcast instruction to return a pre-existing value to be added with each of the plurality of accumulated values to generate a plurality of intermediate accumulated values; and executing the instruction set to process the single atomic operation.

10. The computer implemented method of claim 9, wherein the instruction set includes two or more of a same number of instructions for a uniform source value operation and a non-uniform source value operation, only about 5 instructions to about 10 instructions, and no loops.

11. The computer implemented method of claim 9, further including two or more of: generating a move instruction to copy an accumulation result value based on the plurality of accumulated values from an accumulation register to a result register; generating an atomic instruction to add the accumulation result value with the pre-existing value to generate an atomic instruction result value that is to replace the pre-existing value in memory; and generating a subtract instruction to subtract between each of the plurality of initial values and each of the plurality of intermediate accumulated values to generate a plurality of final values associated with the plurality of processing lanes.

12. The computer implemented method of claim 11, further including: generating a partition instruction to logically partition the plurality of processing lanes into two or more subsets; generating a first accumulation instruction for a plurality of first initial values associated with a first subset of the plurality of processing lanes to generate a plurality of first accumulated values; and generating a second accumulation instruction for a plurality of second initial values associated with a second subset of the plurality of processing lanes to generate a plurality of second accumulated values.

13. The computer implemented method of claim 12, further including: generating a combination instruction to add a first accumulation result value based on the plurality of first accumulated values with a second accumulation result value based on the plurality of second accumulated values to generate a combined accumulation result value; and generating an update instruction to add the first accumulation result value with each of the plurality of second accumulated values to generate a plurality of updated accumulated values.

14. The computer implemented method of claim 13, further including: generating an atomic instruction to add the combined accumulation result value with the pre-existing value to generate the atomic instruction resu…
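Claims 4 to 7 extend the basic sequence to two logical subsets of lanes that still share one atomic. The Python sketch below is an illustrative reading of those claims, not the claimed hardware instruction set: the function name `partitioned_single_atomic`, the `split` parameter, the `Memory` helper, and the choice of two contiguous subsets (for example, the two 8-lane halves of a SIMD16 group) are assumptions.

```python
from itertools import accumulate


class Memory:
    """Hypothetical shared counter; atomic_add returns the pre-existing value."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self.atomic_ops = 0  # number of atomic operations issued

    def atomic_add(self, delta: int) -> int:
        old = self.value
        self.value += delta
        self.atomic_ops += 1
        return old


def partitioned_single_atomic(mem: Memory, initial: list[int], split: int) -> list[int]:
    """Two lane subsets, separate prefix sums, one shared atomic add."""
    # Partition: split the lanes into two contiguous subsets (assumed layout).
    first, second = initial[:split], initial[split:]
    # First and second accumulation: independent inclusive prefix sums.
    first_acc = list(accumulate(first))
    second_acc = list(accumulate(second))
    first_total = first_acc[-1] if first_acc else 0
    second_total = second_acc[-1] if second_acc else 0
    # Combination: the total contribution of both subsets.
    combined = first_total + second_total
    # Subset value update: offset the second subset by the first subset's total.
    updated = [first_total + a for a in second_acc]
    # Atomic: one add of the combined total; returns the pre-existing value.
    pre_existing = mem.atomic_add(combined)
    # First and second broadcast: add the pre-existing value to each subset.
    first_mid = [pre_existing + a for a in first_acc]
    second_mid = [pre_existing + a for a in updated]
    # First and second subtract: per-lane final values.
    first_final = [m - v for m, v in zip(first_mid, first)]
    second_final = [m - v for m, v in zip(second_mid, second)]
    return first_final + second_final


lanes = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]  # sixteen lanes
mem = Memory(0)
print(partitioned_single_atomic(mem, lanes, split=8))
print(mem.value, mem.atomic_ops)  # 80 1
```

In this sketch the update step is what lets the second subset's results continue where the first subset's prefix sum ended, so the combined output matches a single prefix sum over all sixteen lanes while memory is still updated only once.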

Assignees

Intel Corp

Inventors

Classifications

  • G06F9/3887 (primary)

    controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title

  • using a mask · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • Runtime instruction translation, e.g. macros · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016139934A1 cover?
An instruction set, executed by a graphics processor, that replaces a plurality of atomic operations with a single atomic operation. It uses an accumulation instruction to compute a prefix sum over values from multiple processing lanes and a broadcast instruction that returns the pre-existing memory value so it can be added to each lane's accumulated value. See the Abstract above for the full summary.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3887. Mapped technology areas include Physics.
When was this patent published?
Published May 19, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).