Galois field multiply reduction and parallel hash
US-2024028341-A1 · Jan 25, 2024 · US
US9952876B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9952876-B2 |
| Application number | US-201414468904-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 26, 2014 |
| Priority date | Aug 26, 2014 |
| Publication date | Apr 24, 2018 |
| Grant date | Apr 24, 2018 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running SPMD (Single Program Multiple Data) code on SIMD (Single Instruction Multiple Data) machine. The machine runs an instruction stream over input data streams. The machine increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation. The machine updates the lane-PC of each active lane according to targets of the branch operation. The machine selects an active lane and activates only lanes whose lane-PCs match the thread-PC. The machine decrements the lane depth counters of the selected active lanes and updates the lane-PC of each active lane upon the instruction stream reaching a first instruction. The machine assigns the lane-PC of a lane with a largest lane depth counter value to the thread-PC and activates all lanes whose lane-PCs match the thread-PC.
Opening claim text (preview).
What is claimed is: 1. A method for selecting an active data stream while running a SPMD (Single Program Multiple Data) program of instructions on a SIMD (Single Instruction Multiple Data) machine, an instruction stream having one thread-PC (Program Counter), a value of the thread-PC indicating an instruction memory address which stores an instruction to be fetched next for the instruction stream, comprising: running the instruction stream over a plurality of input data streams each of which corresponds to each of a plurality of lanes, each of the plurality of lanes being associated with a corresponding lane depth counter indicating a number of nested branches for each corresponding lane, a corresponding lane-PC of a lane of the plurality of lanes indicating a memory address which stores the instruction to be fetched next for the lane when the lane is activated, and a lane activation bit indicating whether the lane is active or not; incrementing values of the lane depth counters of all active lanes of the plurality of lanes upon the thread-PC value reaching a branch operation in the instruction stream; updating the lane-PC value of each of the active lanes according to targets of the branch operation; and selecting at least one active lane from the plurality of lanes and assigning a corresponding lane-PC value of the selected at least one active lane to the thread-PC, and activating only lanes of the plurality of lanes whose lane-PC values are equal to the thread-PC value; decrementing the lane depth counter value of the selected at least one active lane and updating the lane-PC value of each of the plurality of active lanes upon the instruction stream reaching a first instruction; and assigning the lane-PC value of a lane of the plurality of lanes with a largest lane depth counter value to the thread-PC and activating all lanes of the plurality of lanes whose lane-PC values are equal to the thread-PC value, wherein a plurality of processors coupled to one or more memory devices perform the running, the incrementing, the assigning, the activating and the decrementing until the thread-PC reaches an end of the instruction stream and the lane-PC values of all of the plurality of lanes are equal to with the thread-PC value. 2. The method according to claim 1 , wherein the branch operation comprises: a function call operation, an if and else logic operation, and an iteration operation encountered by running the instruction stream. 3. The method according to claim 2 , further comprising: incrementing the values of the lane depth counters of all the active lanes of the plurality of lanes when the instruction stream reaches a second instruction. 4. The method according to claim 1 , further comprising: initializing the values the of lane depth counters of all of the plurality of lanes to zero; initializing the values of the lane-PCs of all of the plurality of lanes to an instruction memory address; and activating one or more lanes of the plurality of lanes whose lane-PC values are set to instruction memory address. 5. The method according to claim 1 , further comprising: after activating one or more lanes of the plurality of lanes, assigning a minimum value of values of lane depth counters of all another values of all another active lanes of plurality of lanes to the values of the lane depth counters of all the another active lanes. 6. The method according to claim 3 , further comprising: generating and inserting, by a compiler, the first instruction and the second instruction. 7. The method according to claim 3 , wherein the incrementing comprises: splitting the branch operation into two second instructions and directing a backward branch to one of the two second instructions. 8. The method according to claim 6 , further comprising: generating the first instruction at an exit point of the branch operation by the compiler. 9. The method according to claim 7 , wherein another of the two second instructions specifies a positive increment value. 10. The method according to claim 7 , wherein the one of the two second instructions specifies a zero increment value. 11. A system for selecting an active data stream while running a SPMD (Single Program Multiple Data) program of instructions on a SIMD (Single Instruction Multiple Data) machine, an instruction stream having one thread-PC (Program Counter), a value of the thread-PC indicating an instruction memory address which stores an instruction to be fetched next for the instruction stream, the system comprising: a plurality of processors; a memory device coupled to the plurality of processors, wherein the plurality of processors are configured to perform: running the instruction stream over a plurality of input data streams each of which corresponds to each of a plurality of lanes, each of the plurality of lanes being associated with a corresponding lane depth counter indicating a number of nested branches for each corresponding lane, a corresponding lane-PC of a lane of the plurality of lanes indicating a memory address which stores the instruction to be fetched next for the lane when the lane is activated, and a lane activation bit indicating whether the lane is active or not; incrementing values of the lane depth counters of all active lanes of the plurality of lanes upon the thread-PC value reaching a branch operation in the instruction stream; updating the lane-PC value of each of the active lanes according to targets of the branch operation; and selecting at least one active lane from of the plurality of lanes and assigning a corresponding lane-PC value of the selected at least one active lane to the thread-PC, and activating only lanes of the plurality of lanes whose lane-PC values are equal to the thread-PC value; decrementing the lane depth counter values of the selected at least one active lane and updating the lane-PC value of each of the plurality of active lanes upon the instruction stream reaching a first instruction; and assigning the lane-PC value of a lane of the plurality of lanes with a largest lane depth counter value to the thread-PC and activating all lanes of the plurality of lanes whose lane-PC values are equal to the thread-PC value, wherein a plurality of processors coupled to one or more memory devices perform the running, the incrementing, the assigning, the activating and the decrementing until the thread-PC reaches an end of the instruction stream and the lane-PC values of all of the plurality of lanes are equal to with the thread-PC value. 12. The system according to claim 11 , wherein the branch operation comprises: a function call operation, an if and else logic operation, and an iteration operation encountered by running the instruction streams. 13. The system according to claim 12 , wherein the plurality of processors are configured to perform: incrementing the values of the lane depth counters of all of the active lanes of the plurality of lanes when the instruction steam reaches a second instruction. 14. The system according to claim 11 , wherein in order to perform the running, the plurality of processors are configured to perform: initializing the values of the lane depth counters of all of the plurality of lanes to zero; initializing the values of the lane-PCs of all of the plurality of lanes to an instruction memory address; and activating one or more lanes of the plurality of lanes whose lane-PC values are set to the instruction memory address. 15. The system according to claim 11 , the plurality of processors are further configured to perform: after activating one or more lanes of the plurality o
Two dimensional arrays, e.g. mesh, torus · CPC title
using a plurality of independent parallel functional units · CPC title
Loop control instructions; iterative instructions, e.g. LOOP, REPEAT · CPC title
single instruction multiple data [SIMD] multiprocessors · CPC title
comprising an array of processing units with common control, e.g. single instruction multiple data processors (G06F15/82 takes precedence {; for correlation function computation G06F17/15}) · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.