Pre-scheduled replays of divergent operations

US10152329B2 · US · B2

Patent metadata
Field | Value
Publication number | US-10152329-B2
Application number | US-201213370173-A
Country | US
Kind code | B2
Filing date | Feb 9, 2012
Priority date | Feb 9, 2012
Publication date | Dec 11, 2018
Grant date | Dec 11, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced. One advantage of the disclosed technique is that divergent operations requiring one or more replay operations execute with reduced latency.
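The flow described in the abstract can be sketched as a toy model. This is an illustration under stated assumptions, not the patented hardware: the 128-byte cache line, the `max_prescheduled` limit, the one-cycle cost of a back-to-back operation, and every function name below are hypothetical. The sketch also assumes the thread addresses are known when the replay count is chosen.

```python
from collections import OrderedDict

CACHE_LINE_BYTES = 128  # assumed line size; the abstract does not give one


def group_by_cache_line(addresses):
    """Group thread ids by the cache line their byte address falls in."""
    groups = OrderedDict()
    for thread_id, address in enumerate(addresses):
        groups.setdefault(address // CACHE_LINE_BYTES, []).append(thread_id)
    return list(groups.values())


def schedule(addresses, max_prescheduled):
    """Split the thread groups into a first pass and replay-loop passes.

    first_pass[0] is serviced by the instruction itself; the remaining
    entries are pre-scheduled replays inserted directly behind it.
    Groups that do not fit go around the replay loop, one pass each.
    """
    groups = group_by_cache_line(addresses)
    first_pass = groups[: 1 + max_prescheduled]
    replay_loop = groups[1 + max_prescheduled:]
    return first_pass, replay_loop


def total_cycles(first_pass, replay_loop, pipeline_depth):
    """Toy latency model (an assumption, not taken from the patent).

    Operations issued back to back cost one extra cycle each, while every
    trip around the replay loop costs a full pipeline traversal.
    """
    if not first_pass:
        return 0
    back_to_back = pipeline_depth + (len(first_pass) - 1)
    return back_to_back + pipeline_depth * len(replay_loop)
```

With four threads at byte addresses 0, 4, 128, and 256 and an assumed 10-stage pipeline, this model gives 12 cycles when two replays are pre-scheduled and 30 cycles when both extra cache lines are serviced through the replay loop. The numbers only illustrate why back-to-back insertion can reduce latency; they say nothing about real hardware timings.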

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for pre-scheduling replay of a common resource access operation, the method comprising: receiving a first instruction that is to be executed by a group of threads in a multi-stage pipeline; prior to inserting the first instruction into the multi-stage pipeline for execution, determining that a pre-scheduled replay operation should be inserted into the multi-stage pipeline; selecting a first set of one or more threads from the group of threads to execute the first instruction in the multi-stage pipeline; inserting the first instruction into the multi-stage pipeline for execution by the first set of one or more threads during a first pass through the multi-stage pipeline; prior to inserting the pre-scheduled replay operation into the multi-stage pipeline, executing the first instruction in the multi-stage pipeline via the first set of one or more threads; and while the first set of one or more threads is executing the first instruction in the multi-stage pipeline, inserting the pre-scheduled replay operation into the multi-stage pipeline to allow a second set of one or more threads from the group of threads to execute the first instruction during the first pass through the multi-stage pipeline, wherein the first set of one or more threads is intended to access a first aspect or portion of a common resource, and the second set of one or more threads is intended to access a second aspect or portion of the common resource.

2. The method of claim 1, wherein the common resource comprises a memory cache.

3. The method of claim 1, further comprising inserting into the multi-stage pipeline an identifier corresponding to the pre-scheduled replay operation, wherein the identifier indicates the existence of one or more pre-scheduled replay operations.

4. The method of claim 1, wherein the pre-scheduled replay operation is inserted into the multi-stage pipeline serially relative to the first instruction.

5. The method of claim 1, wherein a second instruction is inserted into the multi-stage pipeline serially relative to the pre-scheduled replay operation.

6. The method of claim 1, wherein the first instruction indicates that at least one pre-scheduled replay operation should be inserted into the multi-stage pipeline.

7. The method of claim 1, wherein the pre-scheduled replay operation is required to execute the first instruction for all threads within the group of threads.

8. The method of claim 1, wherein the pre-scheduled replay operation is likely to be required to execute the first instruction for all threads within the group of threads.

9. A subsystem for pre-scheduling replay of a common resource access operation, comprising: a streaming multiprocessor configured to: receive a first instruction that is to be executed by a group of threads in a multi-stage pipeline; prior to inserting the first instruction into the multi-stage pipeline for execution, determine that a pre-scheduled replay operation should be inserted into the multi-stage pipeline; select a first set of one or more threads from the group of threads to execute the first instruction in the multi-stage pipeline; insert the first instruction into the multi-stage pipeline for execution by the first set of one or more threads during a first pass through the multi-stage pipeline; prior to inserting the pre-scheduled replay operation into the multi-stage pipeline, executing the first instruction in the multi-stage pipeline via the first set of one or more threads; and while the first set of one or more threads is executing the first instruction in the multi-stage pipeline, insert the pre-scheduled replay operation into the multi-stage pipeline to allow a second set of one or more threads from the group of threads to execute the first instruction during the first pass through the multi-stage pipeline, wherein the first set of one or more threads is intended to access a first aspect or portion of a common resource, and the second set of one or more threads is intended to access a second aspect or portion of the common resource.

10. The subsystem of claim 9, wherein the common resource comprises a memory cache.

11. The subsystem of claim 9, further comprising inserting into the multi-stage pipeline an identifier corresponding to the pre-scheduled replay operation, wherein the identifier indicates the existence of one or more pre-scheduled replay operations.

12. The subsystem of claim 9, wherein the pre-scheduled replay operation is inserted into the multi-stage pipeline serially relative to the first instruction.

13. The subsystem of claim 9, wherein a second instruction is inserted into the multi-stage pipeline serially relative to the pre-scheduled replay operation.

14. The subsystem of claim 9, wherein the first instruction indicates that at least one pre-scheduled replay operation should be inserted into the multi-stage pipeline.

15. The subsystem of claim 9, wherein the pre-scheduled replay operation is required to execute the first instruction for all threads within the group of threads.

16. The subsystem of claim 9, wherein the pre-scheduled replay operation is likely to be required to execute the first instruction for all threads within the group of threads.

17. A computing device, comprising: a subsystem that includes a streaming multiprocessor configured to: receive a first instruction that is to be executed by a group of threads in a multi-stage pipeline; prior to inserting the first instruction into the multi-stage pipeline for execution, determine that a pre-scheduled replay operation should be inserted into the multi-stage pipeline select a first set of one or more threads from the group of threads to execute the first instruction in the multi-stage pipeline; insert the first instruction into the multi-stage pipeline for execution by the first set of one or more threads during a first pass through the multi-stage pipeline; prior to inserting the pre-scheduled replay operation into the multi-stage pipeline, executing the first instruction in the multi-stage pipeline via the first set of one or more threads; and while the first set of one or more threads is executing the first instruction in the multi-stage pipeline, insert the pre-scheduled replay operation into the multi-stage pipeline to allow a second set of one or more threads from the group of threads to execute the first instruction during the first pass through the multi-stage pipeline, wherein the first set of one or more threads is intended to access a first aspect or portion of a common resource, and the second set of one or more threads is intended to access a second aspect or portion of the common resource.

18. The computing device of claim 17, wherein the first instruction indicates that at least one pre-scheduled replay operation should be inserted into the multi-stage pipeline.

19. The computing device of claim 17, wherein the pre-scheduled replay operation is required to execute the first instruction for all threads within the group of threads.

20. The computing device of claim 17, wherein the pre-scheduled replay operation is likely to be required to execute the first instruction for all threads within the group of threads.

21. The method of claim 1, wherein the second set of one or more threads remains unserviced after the first set of one or more threads executes the first instruction.
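To make the issue order in the method claims easier to follow, here is a minimal sketch of the sequence of pipeline slots they describe. It is an illustration only and carries no legal weight: the `Slot` record, the `replays_follow` count (a guess at one possible form of the identifier in claim 3), and the `issue` and `unserviced` helpers are all hypothetical names, not terms from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    op: str              # instruction occupying this pipeline slot
    threads: tuple       # thread ids serviced by this slot
    is_replay: bool      # True for a pre-scheduled replay operation
    replays_follow: int  # assumed identifier: replays queued behind this slot


def issue(op, first_set, second_set, next_op=None, next_threads=()):
    """Build the serial slot order: instruction, replay, then the next instruction."""
    slots = [
        # First instruction, flagged to show that one replay follows it.
        Slot(op, tuple(first_set), False, 1),
        # Pre-scheduled replay of the same instruction for the second set.
        Slot(op, tuple(second_set), True, 0),
    ]
    if next_op is not None:
        # A second instruction enters serially behind the replay.
        slots.append(Slot(next_op, tuple(next_threads), False, 0))
    return slots


def unserviced(group, op, slots):
    """Return the threads in `group` that no slot has yet serviced for `op`."""
    done = {t for s in slots if s.op == op for t in s.threads}
    return [t for t in group if t not in done]


slots = issue("LD", [0, 1], [2, 3], next_op="ADD", next_threads=[0, 1, 2, 3])
print([(s.op, s.threads, s.is_replay) for s in slots])
print(unserviced([0, 1, 2, 3], "LD", slots[:1]))
```

In this model the second set of threads is still unserviced after the first slot alone, and the whole group is serviced once the replay slot has also executed within the same pass.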

Assignees

Inventors

Classifications

  • Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution · CPC title

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • from multiple instruction streams, e.g. multistreaming · CPC title

  • G06F9/3861 · Primary

    Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00) · CPC title

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10152329B2 cover?
One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associ…
Who is the assignee on this patent?
The names recorded on this page are individuals rather than an organization: Fetterman Michael, Carlton Stewart Glenn, Choquette Jack Hilaire, and 10 more.
What technology area does this patent fall under?
Primary CPC classification G06F9/3861. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 11, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).