Cooperative thread array granularity context switch during trap handling

US10289418B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10289418-B2
Application number  US-201213728784-A
Country             US
Kind code           B2
Filing date         Dec 27, 2012
Priority date       Dec 27, 2012
Publication date    May 14, 2019
Grant date          May 14, 2019

Abstract

Official abstract text for this publication.

Techniques are provided for handling a trap encountered in a thread that is part of a thread array that is being executed in a plurality of execution units. In these techniques, a data structure with an identifier associated with the thread is updated to indicate that the trap occurred during the execution of the thread array. Also in these techniques, the execution units execute a trap handling routine that includes a context switch. The execution units perform this context switch for at least one of the execution units as part of the trap handling routine while allowing the remaining execution units to exit the trap handling routine before the context switch. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.
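
The flow in the abstract can be illustrated with a small simulation. This is a minimal sketch in Python, not the patented implementation: `ThreadGroup`, `handle_trap`, and the dictionary standing in for the trap data structure are hypothetical names, and a plain loop replaces the parallel execution units.

```python
from dataclasses import dataclass


@dataclass
class ThreadGroup:
    """Hypothetical stand-in for one thread group on an execution unit."""
    group_id: int
    cta_id: int
    context_saved: bool = False


def handle_trap(groups, trapping_group_id, trap_reason):
    """Simulate one pass through the trap handling routine.

    Returns (trap_record, switched_out, resumed). All names are illustrative.
    """
    trapping = next(g for g in groups if g.group_id == trapping_group_id)
    # Record that a trap occurred, keyed by an identifier tied to the thread.
    trap_record = {
        "cta_id": trapping.cta_id,
        "group_id": trapping.group_id,
        "reason": trap_reason,
    }

    switched_out, resumed = [], []
    for g in groups:  # every thread group enters the handler
        if g.cta_id == trap_record["cta_id"]:
            # Same thread array as the trapping thread: perform the context switch.
            g.context_saved = True
            switched_out.append(g.group_id)
        else:
            # Unrelated thread array: leave the handler before any context switch.
            resumed.append(g.group_id)
    return trap_record, switched_out, resumed


groups = [
    ThreadGroup(0, cta_id=7),
    ThreadGroup(1, cta_id=7),
    ThreadGroup(2, cta_id=9),
    ThreadGroup(3, cta_id=9),
]
record, switched_out, resumed = handle_trap(groups, trapping_group_id=1, trap_reason="breakpoint")
print(switched_out, resumed)  # [0, 1] [2, 3]
```

In this sketch only the two groups that share the trapping thread's array identifier have their context saved. The other two return immediately, which is the efficiency the abstract attributes to the technique.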

First claim

Opening claim text (preview).

What is claimed is:

1. A method for executing a first trapping instruction encountered by a thread executing within a processing core, the method comprising:
  determining that a first thread included in a first thread group executing within the processing core is executing the first trapping instruction, wherein the first thread group is included in a plurality of thread groups executing within the processing core;
  in response to determining that the first thread is executing the first trapping instruction, pausing execution of the first thread group for a predetermined number of instruction cycles;
  for each thread group included in the plurality of thread groups executing within the processing core, entering a trap handler routine after the first thread group has been paused for the predetermined number of instruction cycles;
  for each thread group included in the plurality of thread groups, determining whether a corresponding cooperative thread array (CTA) identifier is associated with a CTA that includes the first thread; and
  if the corresponding CTA identifier is not associated with the CTA that includes the first thread, then exiting the trap handler routine, or if the corresponding CTA identifier is associated with the CTA that includes the first thread, then:
    disabling interrupts,
    after disabling interrupts, executing one or more critical instructions to update a first entry included in a trap table;
    enabling interrupts, and
    after enabling interrupts, executing one or more operations associated with the trap handler routine prior to exiting the trap handler routine.

2. The method of claim 1, wherein if the corresponding CTA identifier is associated with the CTA that includes the first thread, then further comprising: storing a first portion of a context associated with the first thread group in a memory; prior to storing a second portion of the context associated with the first thread group, indicating that each thread group included in the plurality of thread groups may resume execution outside of the trap handler routine; and storing the second portion of the context associated with the first thread group in the memory.

3. The method of claim 2, further comprising: determining that each thread group within the CTA that includes the first thread has executed one or more operations associated with the trap handler routine; and removing each thread group within the CTA that includes the first thread from active execution in the processing core.

4. The method of claim 2, further comprising: determining that all thread groups executing within the processing core have indicated that each thread group included in the plurality of thread groups may resume execution outside of the trap handler routine; and causing the first thread group to resume execution outside of the trap handler routine.

5. The method of claim 2, further comprising: determining that the first thread group includes the thread that encountered the first trapping instruction; and executing one or more instructions prior to storing the first portion of the context.

6. The method of claim 5, wherein determining that the first thread group includes the thread that encountered the first trapping instruction comprises: retrieving an entry associated with the first thread group from a data structure comprising trap information for each thread group included in a plurality of thread groups; and determining that the first thread has updated the entry associated with the first thread group.

7. The method of claim 2, further comprising: prior to executing the one or more operations associated with the trap handler routine, waiting for a predetermined period of time; and determining that a second thread included in one of the thread groups executing a second trapping instruction.

8. The method of claim 1, wherein the first entry included in the trap table comprises any one or more of a trap reason, a location of a user-specified save routine, a location of a context buffer, and an identifier associated with the first thread.

9. The method of claim 1, wherein the predetermined number of instruction cycles is configurable via a privileged register setting.

10. The method of claim 1, further comprising, while pausing execution of the first thread group for the predetermined number of instruction cycles, determining that another thread encounters a second trapping instruction, the another thread also included in another thread group included in the plurality of thread groups executing within the processing core.

11. The method of claim 10, wherein the first trapping instruction and the second trapping instruction are processed by the trap handler routine after the first thread group is paused for the predetermined number of instruction cycles.

12. A non-transitory computer readable storage medium comprising instructions that cause a computer system to carry out a method for executing a first trapping instruction encountered by a thread executing within a processing core, comprising the steps of:
  determining that a first thread included in a first thread group executing within the processing core is executing the first trapping instruction, wherein the first thread group is included in a plurality of thread groups executing within the processing core;
  in response to determining that the first thread is executing the first trapping instruction, pausing execution of the first thread group for a predetermined number of instruction cycles;
  for each thread group included in the plurality of thread groups executing within the processing core, entering a trap handler routine after the first thread group has been paused for the predetermined number of instruction cycles;
  for each thread group included in the plurality of thread groups, determining whether a corresponding cooperative thread array (CTA) identifier is associated with a CTA that includes the first thread; and
  if the corresponding CTA identifier is not associated with the CTA that includes the first thread, then exiting the trap handler routine, or if the corresponding CTA identifier is associated with the CTA that includes the first thread, then:
    disabling interrupts,
    after disabling interrupts, executing one or more critical instructions to update a first entry included in a trap table;
    enabling interrupts, and
    after enabling interrupts, executing one or more operations associated with the trap handler routine prior to exiting the trap handler routine.

13. The non-transitory computer readable storage medium of claim 12, wherein if the corresponding CTA identifier is associated with the CTA that includes the first thread, then the method further comprises: storing a first portion of a context associated with the first thread group in a memory; prior to storing a second portion of the context associated with the first thread group, indicating that each thread group included in the plurality of thread groups may resume execution outside of the trap handler routine; and storing the second portion of the context associated with the first thread group in the memory.

14. The non-transitory computer readable storage medium of claim 13, wherein the method further comprises: determining that each thread group within the CTA that includes the first thread has executed one or more operations associated with the trap handler routine; and removing each thread group within the CTA that includes the first thread from active execution in the processing core.

15. The non-transitory computer readable storage medium of claim 13, wherein the method further comprise
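
The sequence recited in claim 1, together with the trap table fields of claim 8, the configurable pause of claim 9, and the coalescing of a second trap in claims 10 and 11, can be sketched as a simulation. This is an illustration under stated assumptions, not the claimed implementation: `TrapEntry`, `Core`, `run_trap`, and the field types are hypothetical, interrupts are modeled as a boolean flag, and the pause is modeled as a cycle count compared against trap arrival times.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrapEntry:
    """Fields claim 8 lists for a trap table entry; the types are assumptions."""
    trap_reason: str
    save_routine_addr: Optional[int] = None
    context_buffer_addr: Optional[int] = None
    thread_id: Optional[int] = None


@dataclass
class Core:
    """Hypothetical model of a processing core running several thread groups."""
    group_to_cta: dict                 # thread group id -> CTA id
    pause_cycles: int = 4              # configurable pause; the value is arbitrary
    interrupts_enabled: bool = True
    trap_table: dict = field(default_factory=dict)  # CTA id -> TrapEntry
    log: list = field(default_factory=list)

    def run_trap(self, traps):
        """traps: list of (cycle, group_id, thread_id, reason); the first is at cycle 0."""
        # Traps that arrive during the pause window share one handler pass.
        window = [t for t in traps if t[0] <= self.pause_cycles]
        trapping_ctas = {}
        for _, group_id, thread_id, reason in window:
            # Keep the first trap seen for each CTA.
            trapping_ctas.setdefault(self.group_to_cta[group_id], (thread_id, reason))

        # After the pause, every thread group enters the handler.
        for group_id, cta_id in self.group_to_cta.items():
            if cta_id not in trapping_ctas:
                self.log.append((group_id, "exit"))   # unrelated CTA: exit at once
                continue
            thread_id, reason = trapping_ctas[cta_id]
            self.interrupts_enabled = False           # disable interrupts
            self.trap_table[cta_id] = TrapEntry(reason, thread_id=thread_id)  # critical update
            self.interrupts_enabled = True            # enable interrupts
            self.log.append((group_id, "handle"))     # remaining handler work
        return len(window)


core = Core(group_to_cta={0: 7, 1: 7, 2: 9, 3: 11})
handled = core.run_trap([(0, 0, 100, "breakpoint"), (3, 3, 412, "fault"), (9, 2, 250, "late")])
print(handled, sorted(core.trap_table), core.log)
```

With a four-cycle pause, the traps at cycles 0 and 3 are processed in the same handler pass, while the trap at cycle 9 falls outside the window and is left for a later pass. Group 2 belongs to a CTA with no trap inside the window, so it exits the handler without touching the trap table.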

Assignees

Nvidia Corp

Inventors

Classifications

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00) · CPC title

  • G06F9/3851 (Primary) · from multiple instruction streams, e.g. multistreaming · CPC title

  • by interrupt, e.g. masked · CPC title

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10289418B2 cover?
Techniques are provided for handling a trap encountered in a thread that is part of a thread array that is being executed in a plurality of execution units. In these techniques, a data structure with an identifier associated with the thread is updated to indicate that the trap occurred during the execution of the thread array. Also in these techniques, the execution units execute a trap handlin…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3851. Mapped technology areas include Physics.
When was this patent published?
Publication date May 14, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).