Batched replays of divergent operations

US9817668B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9817668-B2
Application number: US-201113329066-A
Country: US
Kind code: B2
Filing date: Dec 16, 2011
Priority date: Dec 16, 2011
Publication date: Nov 14, 2017
Grant date: Nov 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment of the present invention sets forth an approach for executing replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back.
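The detection step in the abstract can be illustrated with a minimal software sketch. It is a model only, not the hardware logic the patent describes: the 128-byte line size and the function names `group_by_cache_line` and `needs_replay` are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed cache-line size for illustration; the abstract does not state one.
CACHE_LINE_BYTES = 128


def group_by_cache_line(addresses, line_bytes=CACHE_LINE_BYTES):
    """Map each cache-line index to the thread ids whose address falls in it.

    `addresses[i]` is the byte address requested by thread i.
    """
    groups = defaultdict(list)
    for thread_id, address in enumerate(addresses):
        groups[address // line_bytes].append(thread_id)
    return dict(groups)


def needs_replay(addresses, line_bytes=CACHE_LINE_BYTES):
    """True when the threads touch more than one cache line (divergent access)."""
    return len(group_by_cache_line(addresses, line_bytes)) > 1
```

For example, `group_by_cache_line([0, 4, 128, 132, 256])` places threads 0 and 1 in line 0, threads 2 and 3 in line 1, and thread 4 in line 2, so the access is divergent and, in this model, would be a candidate for batched replays.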

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for replaying a shared resource access operation, the method comprising: selecting a first thread and a second thread from a group of threads configured to simultaneously execute an instruction in a multi-stage pipeline, wherein neither the first thread nor the second thread has yet executed the instruction after one or more execution operations of the instruction for the group of threads, the first thread accesses a first shared resource, the second thread accesses a second shared resource, and the first shared resource and the second shared resource are not retrievable during the same execution operation; selecting a first set of threads to associate with the first thread; selecting a second set of threads to associate with the second thread; inserting a first replay operation associated with the first thread and the first set of threads into the multi-stage pipeline to execute the instruction; and inserting a second replay operation associated with the second thread and the second set of threads into the multi-stage pipeline to execute the instruction, wherein the second replay operation is inserted into the multi-stage pipeline serially relative to the first replay operation.

2. The method of claim 1, wherein the first thread and the first set of threads share no common thread with the second thread and the second set of threads.

3. The method of claim 1, wherein the first set of threads is selected concurrently with the selection of the first thread, and the second set of threads is selected concurrently with the selection of the second thread.

4. The method of claim 1, wherein the second replay operation associated with the second thread and the second set of threads is inserted into the multi-stage pipeline one pipeline stage subsequent to the first replay operation associated with the first thread and the first set of threads.

5. The method of claim 1, wherein the first thread and each thread in the first set of threads are configured to access the first shared resource, and the second thread and each thread in the second set of threads are configured to access the second shared resource.

6. The method of claim 5, wherein the first replay operation associated with the first thread and the first set of threads and the second replay operation associated with the second thread and the second set of threads are inserted into the multi-stage pipeline via a first pipeline stage of the multi-stage pipeline that is subsequent to a second pipeline stage of the multi-stage pipeline.

7. The method of claim 5, wherein a second instruction is inserted into the multi-stage pipeline serially relative to the second replay operation associated with the second thread and the second set of threads.

8. A subsystem for replaying a shared resource access operation, comprising: a load-store unit (LSU) configured to: select a first thread and a second thread from a group of threads configured to simultaneously execute an instruction in a multi-stage pipeline, wherein neither the first thread nor the second thread has yet executed the instruction after one or more execution operations of the instruction for the group of threads, the first thread accesses a first shared resource, the second thread accesses a second shared resource, and the first shared resource and the second shared resource are not retrievable during the same execution operation; select a first set of threads to associate with the first thread; select a second set of threads to associate with the second thread; insert a first replay operation associated with the first thread and the first set of threads into the multi-stage pipeline to execute the instruction; and insert a second replay operation associated with the second thread and the second set of threads into the multi-stage pipeline to execute the instruction, wherein the second replay operation is inserted into the multi-stage pipeline serially relative to the first replay operation.

9. The subsystem of claim 8, wherein the first thread and the first set of threads share no common thread with the second thread and the second set of threads.

10. The subsystem of claim 8, wherein the first set of threads is selected concurrently with the selection of the first thread, and the second set of threads is selected concurrently with the selection of the second thread.

11. The subsystem of claim 8, wherein the second replay operation associated with the second thread and the second set of threads is inserted into the multi-stage pipeline one pipeline stage subsequent to the first replay operation associated with the first thread and the first set of threads.

12. The subsystem of claim 8, wherein the first thread and each thread in the first set of threads are configured to access the first shared resource, and the second thread and each thread in the second set of threads are configured to access the second shared resource.

13. The subsystem of claim 12, wherein the first replay operation associated with the first thread and the first set of threads and the second replay operation associated with the second thread and the second set of threads are inserted into the multi-stage pipeline via a first pipeline stage of the multi-stage pipeline that is subsequent to a second pipeline stage of the multi-stage pipeline.

14. The subsystem of claim 12, wherein a second instruction is inserted into the multi-stage pipeline serially relative to the second replay operation associated with the second thread and the second set of threads.

15. A computing device, comprising: a subsystem that includes a load-store unit (LSU) configured to: select a first thread and a second thread from a group of threads configured to simultaneously execute an instruction in a multi-stage pipeline, wherein neither the first thread nor the second thread has yet executed the instruction after one or more execution operations of the instruction for the group of threads, the first thread accesses a first shared resource, the second thread accesses a second shared resource, and the first shared resource and the second shared resource are not retrievable during the same execution operation; select a first set of threads to associate with the first thread; select a second set of threads to associate with the second thread; insert a first replay operation associated with the first thread and the first set of threads into the multi-stage pipeline to execute the instruction; and insert a second replay operation associated with the second thread and the second set of threads into the multi-stage pipeline to execute the instruction, wherein the second replay operation is inserted into the multi-stage pipeline serially relative to the replay operation.

16. The computing device of claim 15, wherein the first thread and each thread in the first set of threads are configured to access the first shared resource, and the second thread and each thread in the second set of threads are configured to access the second shared resource.

17. The computing device of claim 16, wherein the first replay operation associated with the first thread and the first set of threads and the second replay operation associated with the second thread and the second set of threads are inserted into the multi-stage pipeline via a first pipeline stage of the multi-stage pipeline that is subsequent to a second pipeline stage of the multi-stage pipeline.

18. The computing device of claim 16, wherein a second instruction is inserted into the multi-stage pipeline
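The selection and insertion steps recited in the independent claims can be sketched in software as follows. This is a hedged illustration, not the claimed LSU hardware: the function names, the batch size of two, the lowest-id leader choice, and the `loop_latency` value are all hypothetical, and the sketch assumes each associated set contains exactly the other pending threads that access the same resource as the selected thread (the arrangement in claims 5, 12, and 16).

```python
def plan_batched_replays(thread_resource, batch_size=2):
    """Group unserved threads into batches of replay operations.

    `thread_resource` maps thread id -> shared resource id for threads that
    have not yet executed the instruction. Each replay operation has one
    selected thread ("leader") plus an associated set ("followers").
    """
    pending = dict(thread_resource)
    batches = []
    while pending:
        batch = []
        while pending and len(batch) < batch_size:
            leader = min(pending)  # hypothetical policy: lowest thread id
            resource = pending[leader]
            followers = sorted(
                t for t, r in pending.items() if r == resource and t != leader
            )
            # Served threads leave the pending set, so replay operations in
            # the same batch share no common thread (claims 2 and 9).
            for t in [leader] + followers:
                del pending[t]
            batch.append(
                {"leader": leader, "followers": followers, "resource": resource}
            )
        batches.append(batch)
    return batches


def insertion_slots(batches, loop_latency=4):
    """Assign a pipeline slot to each replay operation.

    Replays in one batch occupy consecutive slots (back-to-back); a new batch
    starts one assumed replay-loop traversal later. Requires that no batch
    holds more than `loop_latency` operations.
    """
    slots = []
    for batch_index, batch in enumerate(batches):
        start = batch_index * loop_latency
        for offset, op in enumerate(batch):
            slots.append((start + offset, op["leader"]))
    return slots
```

With `thread_resource = {2: "B", 3: "B", 5: "C", 6: "D", 7: "C"}`, the first batch holds two replay operations (threads 2 and 3 for resource B, threads 5 and 7 for resource C) inserted in adjacent slots, and the remaining thread 6 is replayed in a second batch.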

Assignees

Inventors

Classifications

  • Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00) · CPC title

  • G06F9/3851 · Primary

    from multiple instruction streams, e.g. multistreaming · CPC title

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9817668B2 cover?
One embodiment of the present invention sets forth an approach for executing replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via replay loop. A logic element within the multistage pipeline detects whether the current pipel…
Who is the assignee on this patent?
No assignee is listed on this page. The credited inventors are Fetterman Michael, Choquette Jack Hilaire, Paranjape Omkar, and 7 more.
What technology area does this patent fall under?
Primary CPC classification G06F9/3851. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 14, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).