What technology area does this patent fall under?

Primary CPC classification G06T1/20. Mapped technology areas include Physics.

When was this patent published?

Publication date: Aug 9, 2022 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Mixed inference using low and high precision

US11409537B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11409537-B2
Application number	US-201715819167-A
Country	US
Kind code	B2
Filing date	Nov 21, 2017
Priority date	Apr 24, 2017
Publication date	Aug 9, 2022
Grant date	Aug 9, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread (SIMT) architecture, the general-purpose graphics compute unit to simultaneously execute the first instruction and the second instruction, wherein the integer operation corresponds to a memory address calculation.

First claim

Opening claim text (preview).

What is claimed is:

1. A graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising: a discrete graphics processor circuit including multiple general-purpose graphics compute units, the discrete graphics processor circuit including: an instruction cache to store a first instruction and a second instruction, wherein the first instruction and the second instruction are single instructions, the first instruction includes four operands including two 16-bit floating-point source operands and a 32-bit floating-point source operand, the first instruction is to cause the GPU to perform a multi-dimensional mixed precision floating-point operation in response to the first instruction, the second instruction includes at least one integer operand, the second instruction is to cause the GPU to perform an integer operation in response to the second instruction, and the integer operation corresponds to an address calculation; and a plurality of general-purpose graphics compute units having a single instruction, multiple thread (SIMT) architecture, the plurality of general-purpose graphics compute units each including a first functional unit and a second functional unit, the first functional unit to execute a plurality of threads of the first instruction and the second functional unit is a compute unit configured to execute a plurality of threads of the second instruction during execution of the plurality of threads of the first instruction by the first functional unit, wherein at least one general-purpose graphics compute unit is to dynamically configure a precision for the first functional unit to execute the multi-dimensional mixed precision floating-point operation for the first instruction.

2. The GPU as in claim 1, wherein the multi-dimensional mixed precision floating-point operation is a two-dimensional matrix multiply operation and the multi-dimensional mixed precision floating-point operation is associated with a dot product operation.

3. The GPU as in claim 2, wherein the at least one integer operand is a pointer to a memory location.

4. The GPU as in claim 3, wherein to execute the multi-dimensional mixed precision floating-point operation includes to perform a multiply operation on the two 16-bit floating-point source operands and perform an add operation on a product of the multiply operation and the 32-bit floating-point source operand.

5. The GPU as in claim 1, additionally including a scheduler to schedule at least one thread of the first instruction and at least one thread of the second instruction to the at least one general-purpose graphics compute unit, wherein the at least one general-purpose graphics compute unit is to dynamically enable the second functional unit from an idle state to execute the at least one thread of the second instruction based on a computational requirement of a workload associated with the first instruction or the second instruction.

6. The GPU as in claim 5, the scheduler to independently schedule multiple threads of each of the first instruction and the second instruction.

7. The GPU as in claim 6, wherein threads of the first instruction and the second instruction have independent thread state.

8. A data processing system comprising: an add-in card coupled with a system interface of the data processing system, the add-in card including: a discrete graphics processing unit (GPU) to accelerate machine learning operations, the GPU including an instruction cache to store a first instruction and a second instruction, wherein the first instruction and the second instruction are single instructions, the first instruction includes four operands including two 16-bit floating-point source operands and a 32-bit floating-point source operand, the first instruction is to cause the GPU to perform a multi-dimensional mixed precision floating-point operation in response to the first instruction, the second instruction includes at least one integer operand, the second instruction is to cause the GPU to perform an integer operation in response to the second instruction, and the integer operation corresponds to an address calculation; a plurality of general-purpose graphics compute units included within the GPU, the plurality of general-purpose graphics compute units having a single instruction, multiple thread (SIMT) architecture, the plurality of general-purpose graphics compute units each including a first functional unit and a second functional unit, the first functional unit to execute a plurality of threads of the first instruction and the second functional unit is a compute unit configured to execute a plurality of threads of the second instruction concurrently with the execution of the plurality of threads of the first instruction by the first functional unit, wherein at least one general-purpose graphics compute unit is to dynamically configure a precision for the first functional unit to execute the multi-dimensional mixed precision floating-point operation for the thread of the first instruction; and a memory communicatively coupled with the graphics processing unit.

9. The data processing system as in claim 8, wherein the multi-dimensional mixed precision floating-point operation is a two-dimensional matrix multiply operation, the first instruction is a single instruction, GPU is to perform the multi-dimensional mixed precision floating-point operation in response to the single instruction, and the multi-dimensional mixed precision floating-point operation is associated with a dot product operation.

10. The data processing system as in claim 9, wherein the at least one integer operand is a pointer to a memory location.

11. The data processing system as in claim 10, wherein to execute the multi-dimensional mixed precision floating-point operation includes to perform a multiply operation on the two 16-bit floating-point source operands and perform an add operation on a product of the multiply operation and the 32-bit floating-point source operand.

12. The data processing system as in claim 8, the GPU additionally including a scheduler to schedule at least one thread of the first instruction and at least one thread of the second instruction to the at least one general-purpose graphics compute unit, wherein the at least one general-purpose graphics compute unit is to dynamically enable the second functional unit from an idle state to execute the at least one thread of the second instruction based on a computational requirement of a workload associated with the first instruction or the second instruction.

13. The data processing system as in claim 12, the scheduler to independently schedule multiple threads of each of the first instruction and the second instruction.

14. The data processing system as in claim 13, wherein threads of the first instruction and the second instruction have independent thread state.

15. A method of accelerating a machine-learning operation, the method comprising: decoding a first instruction and a second instruction on a graphics processing unit (GPU), the GPU including a discrete graphics processor circuit having a single instruction, multiple thread (SIMT) architecture and a plurality of SIMT multiprocessors, wherein the first instruction and the second instruction are single instructions, the first instruction includes four operands including two 16-bit floating-point source operands and a 32-bit floating-point source operand and the second instruction includes at least one integer operand; and simultaneously executing a thread of the first instruction and a thread of the second instruction on a first multiprocessor of the plurali
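The arithmetic described in claims 4 and 11 (multiply two 16-bit floating-point operands, then add the product to a 32-bit floating-point operand) can be illustrated in software. The following is a minimal Python sketch, not the patented hardware and not Intel's implementation. It assumes IEEE 754 binary16 and binary32 with round-to-nearest-even, which the claim preview does not specify. All function names (`to_fp16`, `to_fp32`, `mixed_fma`, `fp16_fma`, `flat_index`, `matmul_mixed`) are hypothetical. The `flat_index` helper stands in for the integer address calculation that the claims assign to the second instruction; here it simply runs sequentially rather than concurrently.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_fma(a: float, b: float, c: float) -> float:
    """fp16 * fp16 + fp32, with the result kept in fp32 (assumed rounding)."""
    # The product of two binary16 values is exactly representable in a double.
    product = to_fp16(a) * to_fp16(b)
    return to_fp32(product + to_fp32(c))


def fp16_fma(a: float, b: float, c: float) -> float:
    """Same operation with an fp16 accumulator, shown only for contrast."""
    return to_fp16(to_fp16(a) * to_fp16(b) + to_fp16(c))


def flat_index(row: int, col: int, row_stride: int) -> int:
    """Integer address calculation: row-major offset into a flat buffer."""
    return row * row_stride + col


def matmul_mixed(a, b, m: int, k: int, n: int):
    """Multiply an m-by-k matrix by a k-by-n matrix stored as flat lists.

    Inputs are treated as fp16; each dot product accumulates in fp32.
    """
    out = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                ia = flat_index(i, p, k)  # integer work (address)
                ib = flat_index(p, j, n)  # integer work (address)
                acc = mixed_fma(a[ia], b[ib], acc)  # floating-point work
            out[flat_index(i, j, n)] = acc
    return out
```

One reason to keep the accumulator at 32 bits: binary16 has an 11-bit significand, so a running sum held in fp16 stops growing once it reaches 2048 when each added term is 1.0, whereas an fp32 accumulator continues to count correctly. Calling `fp16_fma(1.0, 1.0, acc)` and `mixed_fma(1.0, 1.0, acc)` in a loop of 4096 iterations shows the difference.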

Assignees

Intel Corp

Inventors

Classifications

G06N3/045: Combinations of networks
G06N3/044: Recurrent networks, e.g. Hopfield networks
G06N3/0495: Quantised networks; Sparse networks; Compressed networks
G06N3/0464: Convolutional networks [CNN, ConvNet]
G06N3/09: Supervised learning

Patent family

Related publications grouped by family.

View patent family 61655684

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11409537B2 cover?: One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an intege…
Who is the assignee on this patent?: Intel Corp