What technology area does this patent fall under?

Primary CPC classification G06F8/4441. Mapped technology areas include Physics.

When was this patent published?

Published Aug 6, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Techniques for parallel execution

US12056494B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12056494-B2
Application number	US-202117239376-A
Country	US
Kind code	B2
Filing date	Apr 23, 2021
Priority date	Apr 23, 2021
Publication date	Aug 6, 2024
Grant date	Aug 6, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatuses, systems, and techniques to identify instructions for advanced execution. In at least one embodiment, a processor performs one or more instructions that have been identified by a compiler to be speculatively performed in parallel.
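To make "speculatively performed" concrete, here is a minimal Python sketch of a host that keeps launching loop bodies before it knows whether the loop condition still holds. It is an illustration only, not the patented implementation: `run_speculative`, `step`, `done`, and `lag` are hypothetical names, and the "GPU kernel" is an ordinary Python function. The sketch assumes the condition value reaches the host a fixed number of launches late, and that earlier results are kept alive so that work launched past the true exit point can be discarded.

```python
def run_speculative(step, done, state, lag=2, max_iters=100):
    """Simulate launching loop bodies before the loop condition is known.

    step(state) -> new state   stands in for a GPU kernel (assumed side-effect free)
    done(state) -> bool        stands in for a condition value copied back to the host
    lag                        how many launches late the host sees that value (assumption)

    Returns (final_state, wasted_launches).
    """
    # history[i] is the state after i launches. Keeping old states alive lets
    # the host fall back to the correct result once the condition arrives.
    history = [state]
    launched = 0
    while launched < max_iters:
        visible = launched - lag  # newest state whose condition the host has seen
        if visible >= 0 and done(history[visible]):
            # The loop should have ended at `visible`; later launches were
            # speculative and their results are simply dropped.
            return history[visible], launched - visible
        history.append(step(history[-1]))
        launched += 1
    # Iteration budget exhausted without observing the exit condition.
    return history[-1], 0


if __name__ == "__main__":
    final, wasted = run_speculative(lambda x: x + 1, lambda x: x >= 5, 0, lag=2)
    print(final, wasted)
```

With `lag=0` the host waits for every condition value and wastes no launches; a larger `lag` trades some discarded work for never stalling on the copy.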

First claim

Opening claim text (preview).

What is claimed is:
1. A processor comprising: one or more circuits to perform a compiler, wherein the compiler is to cause graphics processing unit (GPU) code to be speculatively performed based, at least in part, on whether two or more central processing unit (CPU) code branch conditions are dependent on one another.
2. The processor of claim 1, wherein one or more instructions of the GPU code have been identified to be speculatively performed in parallel by the compiler based, at least in part, on identifying copy operations, and the one or more circuits are to cause one or more GPUs to perform the identified one or more instructions based, at least in part, on receiving a command from another processor.
3. The processor of claim 1, wherein one or more instructions of the GPU code have been identified to be speculatively performed in parallel by the compiler based, at least in part, on identifying copy operations between a parallel processing unit and a host computer system, and labeling safe operations following one or more identified copy operations.
4. The processor of claim 1, wherein one or more instructions of the GPU code include extended live ranges for variables used by operations associated with instructions identified by the compiler to be speculatively performed in parallel.
5. The processor of claim 1, wherein the one or more circuits are to cause one or more GPUs to perform the GPU code after receiving one or more kernel launch commands from a host computer system.
6. The processor of claim 1, wherein the GPU code is part of a while loop.
7. The processor of claim 1, wherein the GPU code is in a set of instructions that follows a first value of a branch condition, and that does not follow a second value of the branch condition.
8. A system, comprising: one or more processors to perform a compiler, wherein the compiler is to cause graphics processing unit (GPU) code to be speculatively performed based, at least in part, on whether two or more central processing unit (CPU) code branch conditions are dependent on one another; and one or more memories to store the GPU code.
9. The system of claim 8, wherein one or more instructions of the GPU code have been identified to be speculatively performed in parallel by the compiler based, at least in part, on identifying copy operations to a host computer system.
10. The system of claim 8, wherein one or more instructions of the GPU code have been identified to be speculatively performed in parallel by the compiler based, at least in part, on finding one or more conditional branches in a representation of a computer program that uses a neural network.
11. The system of claim 8, wherein the one or more processors are to launch the GPU code for performance by one or more GPUs.
12. The system of claim 8, wherein the one or more processors are a first one or more processors, the system further comprises a second one or more processors to speculatively launch one or more instructions of the GPU code for performance by one or more GPUs, and the second one or more processors are to stop launching instructions speculatively in response to receiving a value via a copy operation that satisfies a condition preceding the one or more instructions in a representation of a computer program.
13. The system of claim 8, wherein instructions have been identified to be speculatively performed in parallel by the compiler based, at least in part, on labeling operations that are safe to be speculatively performed.
14. The system of claim 8, wherein instructions have been identified to be speculatively performed in parallel by the compiler based, at least in part, on searching a representation of a computer program for copy operations, and identifying operations that follow the copy operations that are safe to be speculatively performed.
15. The system of claim 8, wherein the GPU code is part of a while loop that implements a portion of an inferencing operation using a neural network.
16. A method, comprising: performing, by one or more processors, a compiler, wherein the compiler is to cause graphics processing unit (GPU) code to be speculatively performed based, at least in part, on whether two or more central processing unit (CPU) code branch conditions are dependent on one another.
17. The method of claim 16, wherein the GPU code has been identified to be speculatively performed by the compiler based, at least in part, on identifying operations that do not change a random state, overwrite outputs, use a signal instruction, or use a wait instruction.
18. The method of claim 16, wherein the GPU code has been identified to be speculatively performed by the compiler based, at least in part, on identifying a conditional branch and selecting a path from a plurality of paths following the conditional branch.
19. The method of claim 16, wherein one or more of the GPU code has been identified to be speculatively performed by the compiler based, at least in part, on identifying copy operations.
20. The method of claim 16, wherein the GPU code includes extended live ranges for variables used in speculatively performed operations.
21. The method of claim 16, wherein the compiler is to identify the GPU code to be speculatively performed based, at least in part, on identifying copy operations, and wherein the GPU code implements a portion of an inferencing operation using a neural network.
22. A non-transitory machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least: perform a compiler, wherein the compiler is to cause graphics processing unit (GPU) code to be speculatively performed based, at least in part, on whether two or more central processing unit (CPU) code branch conditions are dependent on one another.
23. The non-transitory machine-readable medium of claim 22, wherein the set of instructions, which if performed by the one or more processors, further cause the one or more processors to perform the compiler to at least identify one or more of instructions of the GPU code to be speculatively performed in parallel based, at least in part, on identifying copy operations between a parallel processing unit and a host computer system in a representation of a computer program.
24. The non-transitory machine-readable medium of claim 22, wherein the set of instructions, which if performed by the one or more processors, further cause the one or more processors to perform the compiler to at least identify operations following a copy operation that are safe to execute.
25. The non-transitory machine-readable medium of claim 22, wherein the set of instructions, which if performed by the one or more processors, further cause the one or more processors to at least perform the compiler to label operations that are safe to be speculatively performed.
26. The non-transitory machine-readable medium of claim 22, wherein the set of instructions, which if performed by the one or more processors, further cause the one or more processors to perform the compiler to at least label operations that are safe to be speculatively performed and extend a live range of variables associated with operations labeled safe to be speculatively performed.
27. The non-transitory machine-readable medium of claim 22, wherein the set of instructions, which if performed by the one or
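The dependent claims describe a compiler that searches a program representation for copy operations and labels the operations that follow as safe to speculate when they do not change a random state, overwrite outputs, or use signal or wait instructions. The Python sketch below shows one way such a labeling pass could look. It is an illustration under stated assumptions, not the patented method: the `Op` record, the `copy_d2h` and `kernel` kinds, the effect names, and `label_speculative` are all hypothetical, and the program is modeled as a flat list rather than a real compiler IR.

```python
from dataclasses import dataclass

# Hypothetical effect tags, named after the exclusions listed in the claims.
UNSAFE_EFFECTS = frozenset(
    {"changes_random_state", "overwrites_output", "signal", "wait"}
)


@dataclass
class Op:
    name: str
    kind: str                          # e.g. "copy_d2h", "kernel", "branch" (assumed kinds)
    effects: frozenset = frozenset()   # side effects this op is known to have
    speculative: bool = False          # set by the labeling pass


def label_speculative(ops):
    """Label kernels that follow a device-to-host copy and have no unsafe effects.

    Returns the names of the operations labeled safe to speculate.
    """
    seen_copy = False
    for op in ops:
        if op.kind == "copy_d2h":
            # A copy to the host is where the host would otherwise stall,
            # so operations after it become candidates for speculation.
            seen_copy = True
        elif seen_copy and op.kind == "kernel":
            op.speculative = not (op.effects & UNSAFE_EFFECTS)
    return [op.name for op in ops if op.speculative]


if __name__ == "__main__":
    program = [
        Op("matmul", "kernel"),
        Op("copy_flag", "copy_d2h"),
        Op("loop_branch", "branch"),
        Op("step", "kernel"),
        Op("dropout", "kernel", frozenset({"changes_random_state"})),
        Op("store", "kernel", frozenset({"overwrites_output"})),
    ]
    print(label_speculative(program))
```

This sketch labels each kernel independently. A real pass would also need to decide whether speculation may continue past an unsafe operation and would extend the live ranges of variables used by the labeled operations, which is omitted here.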

Assignees

Nvidia Corp

Classifications

CPC code	CPC title
G06N3/09	Supervised learning
G06N3/0895	Weakly supervised learning, e.g. semi-supervised or self-supervised learning
G06N3/0442	characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
G06F9/3851	from multiple instruction streams, e.g. multistreaming
G06F9/3888	controlled by a single instruction for multiple threads [SIMT] in parallel

Patent family

Related publications grouped by family.

View patent family 81851825

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12056494B2 cover?: Apparatuses, systems, and techniques to identify instructions for advanced execution. In at least one embodiment, a processor performs one or more instructions that have been identified by a compiler to be speculatively performed in parallel.
Who is the assignee on this patent?: Nvidia Corp