What technology area does this patent fall under?

Primary CPC classification G06F17/16. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 21, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Application programming interface to wait on matrix multiply-accumulate

US12204897B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12204897-B2
Application number	US-202218072081-A
Country	US
Kind code	B2
Filing date	Nov 30, 2022
Priority date	Nov 21, 2022
Publication date	Jan 21, 2025
Grant date	Jan 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.

First claim

Opening claim text (preview).

What is claimed is:
1. A processor comprising: one or more circuits to perform an instruction to cause one or more instructions to wait to perform one or more matrix multiply-accumulate (MMA) operations in parallel based, at least in part, on a number of the MMA operations waiting to be performed.
2. The processor of claim 1, wherein the instruction is to cause one or more threads comprising the one or more instructions to wait until one or more one or more groups of the one or more MMA operations have been performed.
3. The processor of claim 1, wherein the instruction is to cause one or more threads comprising the one or more instructions to perform one or more other instructions and, in response to the instruction, wait until one or more one or more groups of the one or more MMA operations have been performed.
4. The processor of claim 1, wherein the instruction is a wait instruction and one or more portions of the one or more MMA operations to be performed in parallel are one or more groups of asynchronous MMA operations to be performed.
5. The processor of claim 1, wherein one or more portions of the one or more MMA operations to be performed in parallel are to be indicated, at least in part, as a parameter to the instruction.
6. The processor of claim 1, wherein the one or more MMA operations have been performed if one or more results of said one or more MMA operations is stored in memory.
7. The processor of claim 1, wherein a constant integer data value is to be indicated to the instruction to determine one or more portions of the one or more MMA operations to be performed in parallel.
8. The processor of claim 1, wherein the processor is a graphics processing unit (GPU).
9. A system comprising: one or more processors to perform an instruction to cause one or more instructions to wait to perform one or more matrix multiply-accumulate (MMA) operations in parallel based, at least in part, on a number of the MMA operations waiting to be performed.
10. The system of claim 9, wherein the instruction is to cause one or more threads comprising the one or more instructions to wait until a threshold quantity of groupings of the MMA operations are pending.
11. The system of claim 9, wherein the instruction is to cause one or more threads comprising the one or more instructions to wait until one or more one or more groups of the one or more MMA operations have been performed.
12. The system of claim 9, wherein one or more portions of the one or more MMA operations to be performed in parallel are to be indicated, at least in part, as a parameter to the instruction.
13. The system of claim 9, wherein the instruction is to cause one or more threads comprising the one or more instructions to perform one or more other instructions and, in response to the instruction, wait until one or more one or more groups of the one or more MMA operations have been performed.
14. The system of claim 9, wherein the one or more processors are graphics processing units (GPUs).
15. A method comprising: performing an instruction to cause one or more instructions to wait to perform one or more matrix multiply-accumulate (MMA) operations in parallel based, at least in part, on a number of the MMA operations waiting to be performed.
16. The method of claim 15, further comprising causing, in response to the instruction, one or more threads comprising the one or more instructions to wait until a threshold quantity of groupings of the one or more MMA operations are pending.
17. The method of claim 15, further comprising causing, in response to the instruction, one or more threads comprising the one or more instructions to wait until a threshold quantity of groupings of the one or more MMA operations have been performed.
18. The method of claim 15, further comprising causing, in response to the instruction, one or more threads comprising the one or more instructions to perform one or more other instructions and, in response to the instruction, to wait until one or more portions of the one or more MMA operations to be performed in parallel have been performed.
19. The method of claim 15, wherein the one or more MMA operations are to be asynchronously performed by one or more accelerators of one or more graphics processing units (GPUs).
20. The method of claim 15, further comprising receiving, as a parameter to the instruction, a threshold value usable to identify one or more portions of the one or more MMA operations to be performed in parallel.
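
Illustrative sketch (not part of the patent text). The claims describe a wait instruction that takes an integer parameter and holds a thread until enough groups of asynchronous MMA operations have been performed. The toy Python model below may help non-experts picture that behavior. It is a single-threaded simulation under stated assumptions: the class `AsyncMmaQueue`, its methods, and the `max_pending` parameter are hypothetical names invented for this example, and "waiting" is modeled by simply performing the oldest groups on the spot. It is not the patented implementation and not an actual CUDA API.

```python
from collections import deque


def mma(a, b, c):
    """Return a*b + c for small matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) + c[i][j]
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


class AsyncMmaQueue:
    """Hypothetical model: MMA operations are issued, grouped, and performed later."""

    def __init__(self):
        self._open = []          # operations issued since the last commit
        self._pending = deque()  # committed groups that have not been performed
        self.results = {}        # stands in for memory where results are stored

    def issue(self, name, a, b, c):
        """Queue one multiply-accumulate without performing it yet."""
        self._open.append((name, a, b, c))

    def commit_group(self):
        """Close the current batch so it can be waited on as one group."""
        self._pending.append(self._open)
        self._open = []

    def pending_groups(self):
        return len(self._pending)

    def wait_group(self, max_pending):
        """Return only once at most `max_pending` groups are still pending.

        Assumption: groups complete oldest first. In this single-threaded
        model the wait is simulated by performing those groups here.
        """
        while len(self._pending) > max_pending:
            for name, a, b, c in self._pending.popleft():
                self.results[name] = mma(a, b, c)


# Usage: commit three groups, then wait until at most one is still pending.
queue = AsyncMmaQueue()
identity = [[1, 0], [0, 1]]
zeros = [[0, 0], [0, 0]]
for step in range(3):
    queue.issue(f"tile{step}", [[step, 1], [1, step]], identity, zeros)
    queue.commit_group()

queue.wait_group(1)
print(sorted(queue.results))   # names of tiles whose results are now stored
print(queue.pending_groups())  # groups still in flight
```

In this model, passing a threshold of 0 corresponds to waiting for everything that was committed, while a larger threshold lets the thread continue with newer groups still outstanding. That is one reading of the "threshold" language in claims 10, 16, 17, and 20, offered only as an aid to intuition.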

Assignees

Nvidia Corp

Classifications

G06F17/16 · Primary
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
G06F9/3834
Maintaining memory consistency · CPC title
G06F9/30087
Synchronisation or serialisation instructions · CPC title
G06F9/3009
Thread control instructions · CPC title
G06F9/3001 · Primary
Arithmetic instructions · CPC title

Patent family

Related publications grouped by family.

View patent family 91079864

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12204897B2 cover?: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.
Who is the assignee on this patent?: Nvidia Corp