Techniques for parallel execution

US12056494B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12056494-B2
Application number  US-202117239376-A
Country             US
Kind code           B2
Filing date         Apr 23, 2021
Priority date       Apr 23, 2021
Publication date    Aug 6, 2024
Grant date          Aug 6, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatuses, systems, and techniques to identify instructions for advanced execution. In at least one embodiment, a processor performs one or more instructions that have been identified by a compiler to be speculatively performed in parallel.
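
As a rough illustration of what speculative performance means here, the sketch below simulates a host that keeps launching loop iterations before the value deciding the loop condition has been copied back. It is a single-threaded toy model, not the patented method: `run_speculative`, `step`, `done`, and `depth` are hypothetical names chosen for this example, and `step` is assumed to be free of side effects so that discarding mis-speculated results is harmless.

```python
from collections import deque


def run_speculative(step, done, state, depth=2):
    """Iterate `state = step(state)` until `done(state)` holds, launching up to
    `depth` iterations ahead of the condition check (simulated, no real GPU)."""
    in_flight = deque()  # launched results whose condition has not been checked yet
    launched = 0
    current = state
    while True:
        # Launch ahead without waiting for the condition value (the speculation).
        while len(in_flight) < depth:
            current = step(current)
            in_flight.append(current)
            launched += 1
        # Model the copy to the host completing for the oldest launched iteration.
        result = in_flight.popleft()
        if done(result):
            # Iterations launched past the exit point were mis-speculated.
            wasted = len(in_flight)
            return result, launched, wasted


# Hypothetical usage: count up from 0 and stop at 3, speculating 2 iterations deep.
result, launched, wasted = run_speculative(lambda x: x + 1, lambda x: x >= 3, 0)
print(result, launched, wasted)
```

The trade-off this toy model exposes is the usual one for speculation: a larger `depth` hides more of the copy latency but throws away more work when the loop exits.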

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: one or more circuits to perform a compiler, wherein the compiler is to cause graphics processing unit (GPU) code to be speculatively performed based, at least in part, on whether two or more central processing unit (CPU) code branch conditions are dependent on one another.

2. The processor of claim 1, wherein one or more instructions of the GPU code have been identified to be speculatively performed in parallel by the compiler based, at least in part, on identifying copy operations, and the one or more circuits are to cause one or more GPUs to perform the identified one or more instructions based, at least in part, on receiving a command from another processor.

3. The processor of claim 1, wherein one or more instructions of the GPU code have been identified to be speculatively performed in parallel by the compiler based, at least in part, on identifying copy operations between a parallel processing unit and a host computer system, and labeling safe operations following one or more identified copy operations.

4. The processor of claim 1, wherein one or more instructions of the GPU code include extended live ranges for variables used by operations associated with instructions identified by the compiler to be speculatively performed in parallel.

5. The processor of claim 1, wherein the one or more circuits are to cause one or more GPUs to perform the GPU code after receiving one or more kernel launch commands from a host computer system.

6. The processor of claim 1, wherein the GPU code is part of a while loop.

7. The processor of claim 1, wherein the GPU code is in a set of instructions that follows a first value of a branch condition, and that does not follow a second value of the branch condition.

8. A system, comprising: one or more processors to perform a compiler, wherein the compiler is to cause graphics processing unit (GPU) code to be speculatively performed based, at least in part, on whether two or more central processing unit (CPU) code branch conditions are dependent on one another; and one or more memories to store the GPU code.

9. The system of claim 8, wherein one or more instructions of the GPU code have been identified to be speculatively performed in parallel by the compiler based, at least in part, on identifying copy operations to a host computer system.

10. The system of claim 8, wherein one or more instructions of the GPU code have been identified to be speculatively performed in parallel by the compiler based, at least in part, on finding one or more conditional branches in a representation of a computer program that uses a neural network.

11. The system of claim 8, wherein the one or more processors are to launch the GPU code for performance by one or more GPUs.

12. The system of claim 8, wherein the one or more processors are a first one or more processors, the system further comprises a second one or more processors to speculatively launch one or more instructions of the GPU code for performance by one or more GPUs, and the second one or more processors are to stop launching instructions speculatively in response to receiving a value via a copy operation that satisfies a condition preceding the one or more instructions in a representation of a computer program.

13. The system of claim 8, wherein instructions have been identified to be speculatively performed in parallel by the compiler based, at least in part, on labeling operations that are safe to be speculatively performed.

14. The system of claim 8, wherein instructions have been identified to be speculatively performed in parallel by the compiler based, at least in part, on searching a representation of a computer program for copy operations, and identifying operations that follow the copy operations that are safe to be speculatively performed.

15. The system of claim 8, wherein the GPU code is part of a while loop that implements a portion of an inferencing operation using a neural network.

16. A method, comprising: performing, by one or more processors, a compiler, wherein the compiler is to cause graphics processing unit (GPU) code to be speculatively performed based, at least in part, on whether two or more central processing unit (CPU) code branch conditions are dependent on one another.

17. The method of claim 16, wherein the GPU code has been identified to be speculatively performed by the compiler based, at least in part, on identifying operations that do not change a random state, overwrite outputs, use a signal instruction, or use a wait instruction.

18. The method of claim 16, wherein the GPU code has been identified to be speculatively performed by the compiler based, at least in part, on identifying a conditional branch and selecting a path from a plurality of paths following the conditional branch.

19. The method of claim 16, wherein one or more of the GPU code has been identified to be speculatively performed by the compiler based, at least in part, on identifying copy operations.

20. The method of claim 16, wherein the GPU code includes extended live ranges for variables used in speculatively performed operations.

21. The method of claim 16, wherein the compiler is to identify the GPU code to be speculatively performed based, at least in part, on identifying copy operations, and wherein the GPU code implements a portion of an inferencing operation using a neural network.

22. A non-transitory machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least: perform a compiler, wherein the compiler is to cause graphics processing unit (GPU) code to be speculatively performed based, at least in part, on whether two or more central processing unit (CPU) code branch conditions are dependent on one another.

23. The non-transitory machine-readable medium of claim 22, wherein the set of instructions, which if performed by the one or more processors, further cause the one or more processors to perform the compiler to at least identify one or more of instructions of the GPU code to be speculatively performed in parallel based, at least in part, on identifying copy operations between a parallel processing unit and a host computer system in a representation of a computer program.

24. The non-transitory machine-readable medium of claim 22, wherein the set of instructions, which if performed by the one or more processors, further cause the one or more processors to perform the compiler to at least identify operations following a copy operation that are safe to execute.

25. The non-transitory machine-readable medium of claim 22, wherein the set of instructions, which if performed by the one or more processors, further cause the one or more processors to at least perform the compiler to label operations that are safe to be speculatively performed.

26. The non-transitory machine-readable medium of claim 22, wherein the set of instructions, which if performed by the one or more processors, further cause the one or more processors to perform the compiler to at least label operations that are safe to be speculatively performed and extend a live range of variables associated with operations labeled safe to be speculatively performed.

27. The non-transitory machine-readable medium of claim 22, wherein the set of instructions, which if performed by the one or
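
Several dependent claims describe the compiler searching a program representation for copy operations and labeling the operations that follow as safe to perform speculatively, where safe operations are those that do not change a random state, overwrite outputs, or use signal or wait instructions. A minimal sketch of such a labeling pass might look like the following. The `Op` record, the effect names in `UNSAFE_EFFECTS`, and the rule of stopping at the first unsafe operation are all assumptions made for illustration; the claims do not specify a data structure or a stopping rule.

```python
from dataclasses import dataclass

# Effect tags that rule out speculation (names are assumptions, modeled on claim 17).
UNSAFE_EFFECTS = frozenset({"random_state", "overwrite_output", "signal", "wait"})


@dataclass
class Op:
    name: str
    kind: str = "compute"             # "compute" or "copy_to_host"
    effects: frozenset = frozenset()  # side effects this op is assumed to have
    speculative: bool = False         # set by the labeling pass


def label_speculative(ops):
    """Label ops that follow a copy to the host as speculative, stopping at the
    first op with an unsafe effect. Returns the names of the labeled ops."""
    after_copy = False
    for op in ops:
        if op.kind == "copy_to_host":
            after_copy = True          # ops from here on may run ahead of the copy
            continue
        if after_copy:
            if op.effects & UNSAFE_EFFECTS:
                after_copy = False     # assumed rule: stop labeling at an unsafe op
            else:
                op.speculative = True
    return [op.name for op in ops if op.speculative]


# Hypothetical program: only the ops between the copy and the dropout get labeled.
program = [
    Op("matmul"),
    Op("copy_flag", kind="copy_to_host"),
    Op("add"),
    Op("relu"),
    Op("dropout", effects=frozenset({"random_state"})),
    Op("scale"),
]
print(label_speculative(program))
```

A real pass would also need to extend the live ranges of variables read by the labeled operations, as claims 4, 20, and 26 mention; that step is omitted here.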

Assignees

Nvidia Corp

Inventors

Classifications

  • Supervised learning · CPC title

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • from multiple instruction streams, e.g. multistreaming · CPC title

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12056494B2 cover?
Apparatuses, systems, and techniques to identify instructions for advanced execution. In at least one embodiment, a processor performs one or more instructions that have been identified by a compiler to be speculatively performed in parallel.
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F8/4441. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 6, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).