Optimize control-flow convergence on SIMD engine using divergence depth

US9952876B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9952876-B2
Application numberUS-201414468904-A
CountryUS
Kind codeB2
Filing dateAug 26, 2014
Priority dateAug 26, 2014
Publication dateApr 24, 2018
Grant dateApr 24, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running SPMD (Single Program Multiple Data) code on SIMD (Single Instruction Multiple Data) machine. The machine runs an instruction stream over input data streams. The machine increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation. The machine updates the lane-PC of each active lane according to targets of the branch operation. The machine selects an active lane and activates only lanes whose lane-PCs match the thread-PC. The machine decrements the lane depth counters of the selected active lanes and updates the lane-PC of each active lane upon the instruction stream reaching a first instruction. The machine assigns the lane-PC of a lane with a largest lane depth counter value to the thread-PC and activates all lanes whose lane-PCs match the thread-PC.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for selecting an active data stream while running a SPMD (Single Program Multiple Data) program of instructions on a SIMD (Single Instruction Multiple Data) machine, an instruction stream having one thread-PC (Program Counter), a value of the thread-PC indicating an instruction memory address which stores an instruction to be fetched next for the instruction stream, comprising: running the instruction stream over a plurality of input data streams each of which corresponds to each of a plurality of lanes, each of the plurality of lanes being associated with a corresponding lane depth counter indicating a number of nested branches for each corresponding lane, a corresponding lane-PC of a lane of the plurality of lanes indicating a memory address which stores the instruction to be fetched next for the lane when the lane is activated, and a lane activation bit indicating whether the lane is active or not; incrementing values of the lane depth counters of all active lanes of the plurality of lanes upon the thread-PC value reaching a branch operation in the instruction stream; updating the lane-PC value of each of the active lanes according to targets of the branch operation; and selecting at least one active lane from the plurality of lanes and assigning a corresponding lane-PC value of the selected at least one active lane to the thread-PC, and activating only lanes of the plurality of lanes whose lane-PC values are equal to the thread-PC value; decrementing the lane depth counter value of the selected at least one active lane and updating the lane-PC value of each of the plurality of active lanes upon the instruction stream reaching a first instruction; and assigning the lane-PC value of a lane of the plurality of lanes with a largest lane depth counter value to the thread-PC and activating all lanes of the plurality of lanes whose lane-PC values are equal to the thread-PC value, wherein a plurality of processors coupled to one or more memory devices perform the running, the incrementing, the assigning, the activating and the decrementing until the thread-PC reaches an end of the instruction stream and the lane-PC values of all of the plurality of lanes are equal to with the thread-PC value. 2. The method according to claim 1 , wherein the branch operation comprises: a function call operation, an if and else logic operation, and an iteration operation encountered by running the instruction stream. 3. The method according to claim 2 , further comprising: incrementing the values of the lane depth counters of all the active lanes of the plurality of lanes when the instruction stream reaches a second instruction. 4. The method according to claim 1 , further comprising: initializing the values the of lane depth counters of all of the plurality of lanes to zero; initializing the values of the lane-PCs of all of the plurality of lanes to an instruction memory address; and activating one or more lanes of the plurality of lanes whose lane-PC values are set to instruction memory address. 5. The method according to claim 1 , further comprising: after activating one or more lanes of the plurality of lanes, assigning a minimum value of values of lane depth counters of all another values of all another active lanes of plurality of lanes to the values of the lane depth counters of all the another active lanes. 6. The method according to claim 3 , further comprising: generating and inserting, by a compiler, the first instruction and the second instruction. 7. The method according to claim 3 , wherein the incrementing comprises: splitting the branch operation into two second instructions and directing a backward branch to one of the two second instructions. 8. The method according to claim 6 , further comprising: generating the first instruction at an exit point of the branch operation by the compiler. 9. The method according to claim 7 , wherein another of the two second instructions specifies a positive increment value. 10. The method according to claim 7 , wherein the one of the two second instructions specifies a zero increment value. 11. A system for selecting an active data stream while running a SPMD (Single Program Multiple Data) program of instructions on a SIMD (Single Instruction Multiple Data) machine, an instruction stream having one thread-PC (Program Counter), a value of the thread-PC indicating an instruction memory address which stores an instruction to be fetched next for the instruction stream, the system comprising: a plurality of processors; a memory device coupled to the plurality of processors, wherein the plurality of processors are configured to perform: running the instruction stream over a plurality of input data streams each of which corresponds to each of a plurality of lanes, each of the plurality of lanes being associated with a corresponding lane depth counter indicating a number of nested branches for each corresponding lane, a corresponding lane-PC of a lane of the plurality of lanes indicating a memory address which stores the instruction to be fetched next for the lane when the lane is activated, and a lane activation bit indicating whether the lane is active or not; incrementing values of the lane depth counters of all active lanes of the plurality of lanes upon the thread-PC value reaching a branch operation in the instruction stream; updating the lane-PC value of each of the active lanes according to targets of the branch operation; and selecting at least one active lane from of the plurality of lanes and assigning a corresponding lane-PC value of the selected at least one active lane to the thread-PC, and activating only lanes of the plurality of lanes whose lane-PC values are equal to the thread-PC value; decrementing the lane depth counter values of the selected at least one active lane and updating the lane-PC value of each of the plurality of active lanes upon the instruction stream reaching a first instruction; and assigning the lane-PC value of a lane of the plurality of lanes with a largest lane depth counter value to the thread-PC and activating all lanes of the plurality of lanes whose lane-PC values are equal to the thread-PC value, wherein a plurality of processors coupled to one or more memory devices perform the running, the incrementing, the assigning, the activating and the decrementing until the thread-PC reaches an end of the instruction stream and the lane-PC values of all of the plurality of lanes are equal to with the thread-PC value. 12. The system according to claim 11 , wherein the branch operation comprises: a function call operation, an if and else logic operation, and an iteration operation encountered by running the instruction streams. 13. The system according to claim 12 , wherein the plurality of processors are configured to perform: incrementing the values of the lane depth counters of all of the active lanes of the plurality of lanes when the instruction steam reaches a second instruction. 14. The system according to claim 11 , wherein in order to perform the running, the plurality of processors are configured to perform: initializing the values of the lane depth counters of all of the plurality of lanes to zero; initializing the values of the lane-PCs of all of the plurality of lanes to an instruction memory address; and activating one or more lanes of the plurality of lanes whose lane-PC values are set to the instruction memory address. 15. The system according to claim 11 , the plurality of processors are further configured to perform: after activating one or more lanes of the plurality o

Assignees

Inventors

Classifications

  • Two dimensional arrays, e.g. mesh, torus · CPC title

  • using a plurality of independent parallel functional units · CPC title

  • Loop control instructions; iterative instructions, e.g. LOOP, REPEAT · CPC title

  • single instruction multiple data [SIMD] multiprocessors · CPC title

  • comprising an array of processing units with common control, e.g. single instruction multiple data processors (G06F15/82 takes precedence {; for correlation function computation G06F17/15}) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9952876B2 cover?
There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running SPMD (Single Program Multiple Data) code on SIMD (Single Instruction Multiple Data) machine. The machine runs an instruction stream over input data streams. The machine increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation. T…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F9/3887. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Apr 24 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).