Who is the assignee on this patent?

Fetterman Michael, Carlton Stewart Glenn, Choquette Jack Hilaire, and 10 more (these are the listed inventors; no assignee is shown on this page)

What technology area does this patent fall under?

Primary CPC classification G06F9/3861. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 11, 2018 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Pre-scheduled replays of divergent operations

US10152329B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10152329-B2
Application number	US-201213370173-A
Country	US
Kind code	B2
Filing date	Feb 9, 2012
Priority date	Feb 9, 2012
Publication date	Dec 11, 2018
Grant date	Dec 11, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced. One advantage of the disclosed technique is that divergent operations requiring one or more replay operations execute with reduced latency.
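The abstract's core decision can be illustrated with a small Python sketch: how many extra passes an instruction needs when a warp's threads touch several cache lines, and which of those passes are pre-scheduled rather than sent through the replay loop. This is an illustrative model, not the patented hardware. The 128-byte line size, the fixed budget of pre-scheduled replays (`max_prescheduled`), the function names, and the labels `instruction`, `prescheduled_replay`, and `replay_loop` are all assumptions made for this example. Threads are grouped by the cache line they address. The first group is serviced by the instruction itself, the next groups by replays inserted directly behind it, and any groups beyond the budget fall back to the replay loop.

```python
CACHE_LINE_BYTES = 128  # assumed cache-line size; not specified in the abstract


def group_by_cache_line(addresses, line_bytes=CACHE_LINE_BYTES):
    """Map each distinct cache line to the IDs of the threads that address it.

    dicts preserve insertion order (Python 3.7+), so lines appear in the
    order of the first thread that touches each line.
    """
    groups = {}
    for tid, addr in enumerate(addresses):
        groups.setdefault(addr // line_bytes, []).append(tid)
    return groups


def schedule_divergent_access(addresses, max_prescheduled=1, line_bytes=CACHE_LINE_BYTES):
    """Return (kind, cache_line, thread_ids) tuples in issue order.

    Hypothetical policy: the first cache line is served by the instruction,
    up to `max_prescheduled` further lines by replays inserted directly behind
    it, and anything left over by the (slower) replay loop.
    """
    schedule = []
    for i, (line, tids) in enumerate(group_by_cache_line(addresses, line_bytes).items()):
        if i == 0:
            kind = "instruction"
        elif i <= max_prescheduled:
            kind = "prescheduled_replay"
        else:
            kind = "replay_loop"
        schedule.append((kind, line, tids))
    return schedule


if __name__ == "__main__":
    for step in schedule_divergent_access([0, 4, 128, 260]):
        print(step)
```

For example, `schedule_divergent_access([0, 4, 128, 260])` services threads 0 and 1 in the first pass, thread 2 in a pre-scheduled replay, and thread 3 through the replay loop. The addresses fall in three different 128-byte lines, and only one replay was pre-scheduled.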

First claim

Claim text as published (claims 1 to 21; claims 1, 9, and 17 are independent).

What is claimed is:

1. A computer-implemented method for pre-scheduling replay of a common resource access operation, the method comprising: receiving a first instruction that is to be executed by a group of threads in a multi-stage pipeline; prior to inserting the first instruction into the multi-stage pipeline for execution, determining that a pre-scheduled replay operation should be inserted into the multi-stage pipeline; selecting a first set of one or more threads from the group of threads to execute the first instruction in the multi-stage pipeline; inserting the first instruction into the multi-stage pipeline for execution by the first set of one or more threads during a first pass through the multi-stage pipeline; prior to inserting the pre-scheduled replay operation into the multi-stage pipeline, executing the first instruction in the multi-stage pipeline via the first set of one or more threads; and while the first set of one or more threads is executing the first instruction in the multi-stage pipeline, inserting the pre-scheduled replay operation into the multi-stage pipeline to allow a second set of one or more threads from the group of threads to execute the first instruction during the first pass through the multi-stage pipeline, wherein the first set of one or more threads is intended to access a first aspect or portion of a common resource, and the second set of one or more threads is intended to access a second aspect or portion of the common resource.

2. The method of claim 1, wherein the common resource comprises a memory cache.

3. The method of claim 1, further comprising inserting into the multi-stage pipeline an identifier corresponding to the pre-scheduled replay operation, wherein the identifier indicates the existence of one or more pre-scheduled replay operations.

4. The method of claim 1, wherein the pre-scheduled replay operation is inserted into the multi-stage pipeline serially relative to the first instruction.

5. The method of claim 1, wherein a second instruction is inserted into the multi-stage pipeline serially relative to the pre-scheduled replay operation.

6. The method of claim 1, wherein the first instruction indicates that at least one pre-scheduled replay operation should be inserted into the multi-stage pipeline.

7. The method of claim 1, wherein the pre-scheduled replay operation is required to execute the first instruction for all threads within the group of threads.

8. The method of claim 1, wherein the pre-scheduled replay operation is likely to be required to execute the first instruction for all threads within the group of threads.

9. A subsystem for pre-scheduling replay of a common resource access operation, comprising: a streaming multiprocessor configured to: receive a first instruction that is to be executed by a group of threads in a multi-stage pipeline; prior to inserting the first instruction into the multi-stage pipeline for execution, determine that a pre-scheduled replay operation should be inserted into the multi-stage pipeline; select a first set of one or more threads from the group of threads to execute the first instruction in the multi-stage pipeline; insert the first instruction into the multi-stage pipeline for execution by the first set of one or more threads during a first pass through the multi-stage pipeline; prior to inserting the pre-scheduled replay operation into the multi-stage pipeline, executing the first instruction in the multi-stage pipeline via the first set of one or more threads; and while the first set of one or more threads is executing the first instruction in the multi-stage pipeline, insert the pre-scheduled replay operation into the multi-stage pipeline to allow a second set of one or more threads from the group of threads to execute the first instruction during the first pass through the multi-stage pipeline, wherein the first set of one or more threads is intended to access a first aspect or portion of a common resource, and the second set of one or more threads is intended to access a second aspect or portion of the common resource.

10. The subsystem of claim 9, wherein the common resource comprises a memory cache.

11. The subsystem of claim 9, further comprising inserting into the multi-stage pipeline an identifier corresponding to the pre-scheduled replay operation, wherein the identifier indicates the existence of one or more pre-scheduled replay operations.

12. The subsystem of claim 9, wherein the pre-scheduled replay operation is inserted into the multi-stage pipeline serially relative to the first instruction.

13. The subsystem of claim 9, wherein a second instruction is inserted into the multi-stage pipeline serially relative to the pre-scheduled replay operation.

14. The subsystem of claim 9, wherein the first instruction indicates that at least one pre-scheduled replay operation should be inserted into the multi-stage pipeline.

15. The subsystem of claim 9, wherein the pre-scheduled replay operation is required to execute the first instruction for all threads within the group of threads.

16. The subsystem of claim 9, wherein the pre-scheduled replay operation is likely to be required to execute the first instruction for all threads within the group of threads.

17. A computing device, comprising: a subsystem that includes a streaming multiprocessor configured to: receive a first instruction that is to be executed by a group of threads in a multi-stage pipeline; prior to inserting the first instruction into the multi-stage pipeline for execution, determine that a pre-scheduled replay operation should be inserted into the multi-stage pipeline select a first set of one or more threads from the group of threads to execute the first instruction in the multi-stage pipeline; insert the first instruction into the multi-stage pipeline for execution by the first set of one or more threads during a first pass through the multi-stage pipeline; prior to inserting the pre-scheduled replay operation into the multi-stage pipeline, executing the first instruction in the multi-stage pipeline via the first set of one or more threads; and while the first set of one or more threads is executing the first instruction in the multi-stage pipeline, insert the pre-scheduled replay operation into the multi-stage pipeline to allow a second set of one or more threads from the group of threads to execute the first instruction during the first pass through the multi-stage pipeline, wherein the first set of one or more threads is intended to access a first aspect or portion of a common resource, and the second set of one or more threads is intended to access a second aspect or portion of the common resource.

18. The computing device of claim 17, wherein the first instruction indicates that at least one pre-scheduled replay operation should be inserted into the multi-stage pipeline.

19. The computing device of claim 17, wherein the pre-scheduled replay operation is required to execute the first instruction for all threads within the group of threads.

20. The computing device of claim 17, wherein the pre-scheduled replay operation is likely to be required to execute the first instruction for all threads within the group of threads.

21. The method of claim 1, wherein the second set of one or more threads remains unserviced after the first set of one or more threads executes the first instruction.
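A deliberately simplified timing model shows where the latency advantage of back-to-back insertion comes from. The model is an assumption for illustration, not something the patent specifies. Every operation spends `depth` cycles in the pipeline, and operations can issue one per cycle. Pre-scheduled replays issue immediately behind the instruction (serially, as in claims 4 and 12). A replay-loop pass can re-enter only after the previous operation has left the pipeline. `completion_cycle` and its parameters are hypothetical names.

```python
def completion_cycle(passes_needed: int, prescheduled: int, depth: int) -> int:
    """Cycle at which the last pass for one instruction leaves the pipeline.

    Hypothetical model: the instruction issues at cycle 0 and each op takes
    `depth` cycles to traverse the pipeline.
    """
    if passes_needed < 1 or prescheduled < 0 or depth < 1:
        raise ValueError("need passes_needed >= 1, prescheduled >= 0, depth >= 1")
    # The instruction plus up to `prescheduled` replays issue back to back at cycles 0..p.
    p = min(prescheduled, passes_needed - 1)
    finish = p + depth  # the last back-to-back op exits at cycle p + depth
    # Remaining passes go through the replay loop; each is assumed to re-enter only
    # after the previous op exits, so each costs a full trip through the pipeline.
    remaining = passes_needed - 1 - p
    finish += remaining * depth
    return finish


if __name__ == "__main__":
    for pre in range(3):
        print(pre, completion_cycle(passes_needed=3, prescheduled=pre, depth=10))
```

With `depth=10` and three passes needed, the model gives 30 cycles with no pre-scheduled replays, 21 with one, and 12 with two. The saving comes from overlapping the replays with the instruction's own trip through the pipeline rather than waiting for it to finish.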

Classifications

G06F9/3836
Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
G06F9/3887
controlled by a single instruction for multiple data lanes [SIMD]
G06F9/3851
from multiple instruction streams, e.g. multistreaming
G06F9/3861 (primary)
Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00)
G06F9/3888
controlled by a single instruction for multiple threads [SIMT] in parallel

Patent family

Related publications grouped by family.

Patent family ID: 48868444
