Mixed inference using low and high precision

US11409537B2 · US · B2

Patent metadata
Publication number: US-11409537-B2
Application number: US-201715819167-A
Country: US
Kind code: B2
Filing date: Nov 21, 2017
Priority date: Apr 24, 2017
Publication date: Aug 9, 2022
Grant date: Aug 9, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread (SIMT) architecture, the general-purpose graphics compute unit to simultaneously execute the first instruction and the second instruction, wherein the integer operation corresponds to a memory address calculation.
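The abstract pairs a floating-point instruction with an integer instruction whose job is a memory address calculation. As a rough illustration only, the Python model below shows what the two instruction streams might compute for each SIMT lane when a group of lanes walks one row of a row-major FP16 matrix. The names `element_address` and `warp_step`, the row-major layout, the leading dimension `ld`, and the 2-byte element size are all hypothetical. The model evaluates the two streams one after the other and does not capture the concurrent execution on separate functional units that the patent claims.

```python
FP16_BYTES = 2  # assumption: FP16 elements, 2 bytes each


def element_address(base: int, row: int, col: int, ld: int,
                    elem_bytes: int = FP16_BYTES) -> int:
    """Integer address calculation (hypothetical layout):
    byte address of A[row][col] in a row-major array whose
    consecutive rows are `ld` elements apart."""
    return base + (row * ld + col) * elem_bytes


def warp_step(base: int, row: int, ld: int, width: int,
              a: list[float], x: float,
              acc: list[float]) -> tuple[list[int], list[float]]:
    """One step over `width` SIMT lanes (hypothetical model).

    Integer stream: each lane computes the address of its element.
    Floating-point stream: each lane adds a[lane] * x to its accumulator.
    Here the streams are evaluated sequentially; the claimed hardware
    would run them on different functional units at the same time.
    """
    addresses = [element_address(base, row, lane, ld) for lane in range(width)]
    results = [acc[lane] + a[lane] * x for lane in range(width)]
    return addresses, results
```

For example, with `base=0`, `row=1`, `ld=8`, and four lanes, the integer stream yields byte addresses 16, 18, 20, and 22, one 2-byte element apart.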

First claim

Opening claim text (preview).

What is claimed is:

1. A graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising: a discrete graphics processor circuit including multiple general-purpose graphics compute units, the discrete graphics processor circuit including: an instruction cache to store a first instruction and a second instruction, wherein the first instruction and the second instruction are single instructions, the first instruction includes four operands including two 16-bit floating-point source operands and a 32-bit floating-point source operand, the first instruction is to cause the GPU to perform a multi-dimensional mixed precision floating-point operation in response to the first instruction, the second instruction includes at least one integer operand, the second instruction is to cause the GPU to perform an integer operation in response to the second instruction, and the integer operation corresponds to an address calculation; and a plurality of general-purpose graphics compute units having a single instruction, multiple thread (SIMT) architecture, the plurality of general-purpose graphics compute units each including a first functional unit and a second functional unit, the first functional unit to execute a plurality of threads of the first instruction and the second functional unit is a compute unit configured to execute a plurality of threads of the second instruction during execution of the plurality of threads of the first instruction by the first functional unit, wherein at least one general-purpose graphics compute unit is to dynamically configure a precision for the first functional unit to execute the multi-dimensional mixed precision floating-point operation for the first instruction.

2. The GPU as in claim 1, wherein the multi-dimensional mixed precision floating-point operation is a two-dimensional matrix multiply operation and the multi-dimensional mixed precision floating-point operation is associated with a dot product operation.

3. The GPU as in claim 2, wherein the at least one integer operand is a pointer to a memory location.

4. The GPU as in claim 3, wherein to execute the multi-dimensional mixed precision floating-point operation includes to perform a multiply operation on the two 16-bit floating-point source operands and perform an add operation on a product of the multiply operation and the 32-bit floating-point source operand.

5. The GPU as in claim 1, additionally including a scheduler to schedule at least one thread of the first instruction and at least one thread of the second instruction to the at least one general-purpose graphics compute unit, wherein the at least one general-purpose graphics compute unit is to dynamically enable the second functional unit from an idle state to execute the at least one thread of the second instruction based on a computational requirement of a workload associated with the first instruction or the second instruction.

6. The GPU as in claim 5, the scheduler to independently schedule multiple threads of each of the first instruction and the second instruction.

7. The GPU as in claim 6, wherein threads of the first instruction and the second instruction have independent thread state.

8. A data processing system comprising: an add-in card coupled with a system interface of the data processing system, the add-in card including: a discrete graphics processing unit (GPU) to accelerate machine learning operations, the GPU including an instruction cache to store a first instruction and a second instruction, wherein the first instruction and the second instruction are single instructions, the first instruction includes four operands including two 16-bit floating-point source operands and a 32-bit floating-point source operand, the first instruction is to cause the GPU to perform a multi-dimensional mixed precision floating-point operation in response to the first instruction, the second instruction includes at least one integer operand, the second instruction is to cause the GPU to perform an integer operation in response to the second instruction, and the integer operation corresponds to an address calculation; a plurality of general-purpose graphics compute units included within the GPU, the plurality of general-purpose graphics compute units having a single instruction, multiple thread (SIMT) architecture, the plurality of general-purpose graphics compute units each including a first functional unit and a second functional unit, the first functional unit to execute a plurality of threads of the first instruction and the second functional unit is a compute unit configured to execute a plurality of threads of the second instruction concurrently with the execution of the plurality of threads of the first instruction by the first functional unit, wherein at least one general-purpose graphics compute unit is to dynamically configure a precision for the first functional unit to execute the multi-dimensional mixed precision floating-point operation for the thread of the first instruction; and a memory communicatively coupled with the graphics processing unit.

9. The data processing system as in claim 8, wherein the multi-dimensional mixed precision floating-point operation is a two-dimensional matrix multiply operation, the first instruction is a single instruction, GPU is to perform the multi-dimensional mixed precision floating-point operation in response to the single instruction, and the multi-dimensional mixed precision floating-point operation is associated with a dot product operation.

10. The data processing system as in claim 9, wherein the at least one integer operand is a pointer to a memory location.

11. The data processing system as in claim 10, wherein to execute the multi-dimensional mixed precision floating-point operation includes to perform a multiply operation on the two 16-bit floating-point source operands and perform an add operation on a product of the multiply operation and the 32-bit floating-point source operand.

12. The data processing system as in claim 8, the GPU additionally including a scheduler to schedule at least one thread of the first instruction and at least one thread of the second instruction to the at least one general-purpose graphics compute unit, wherein the at least one general-purpose graphics compute unit is to dynamically enable the second functional unit from an idle state to execute the at least one thread of the second instruction based on a computational requirement of a workload associated with the first instruction or the second instruction.

13. The data processing system as in claim 12, the scheduler to independently schedule multiple threads of each of the first instruction and the second instruction.

14. The data processing system as in claim 13, wherein threads of the first instruction and the second instruction have independent thread state.

15. A method of accelerating a machine-learning operation, the method comprising: decoding a first instruction and a second instruction on a graphics processing unit (GPU), the GPU including a discrete graphics processor circuit having a single instruction, multiple thread (SIMT) architecture and a plurality of SIMT multiprocessors, wherein the first instruction and the second instruction are single instructions, the first instruction includes four operands including two 16-bit floating-point source operands and a 32-bit floating-point source operand and the second instruction includes at least one integer operand; and simultaneously executing a thread of the first instruction and a thread of the second instruction on a first multiprocessor of the plurali…
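Claims 2 and 4 describe the arithmetic core: a multiply of two 16-bit floating-point source operands whose product is added to a 32-bit floating-point source operand, used for two-dimensional matrix multiplies built from dot products. The Python sketch below emulates that numerically with the standard `struct` module (format `e` is IEEE 754 binary16, `f` is binary32). The function names `to_fp16`, `to_fp32`, `mixed_fma`, `dot_mixed`, `dot_fp16`, and `matmul_mixed` are hypothetical, and the rounding behaviour (exact product, single rounding of the sum to FP32) is an assumption about how such a unit might behave, not a description of Intel's hardware.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (ties to even).
    Raises OverflowError if x is too large for binary16."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_fma(a: float, b: float, c: float) -> float:
    """Hypothetical mixed-precision multiply-add: FP16 a*b + FP32 c -> FP32.

    The product of two binary16 values always fits exactly in binary32,
    so only the sum is rounded. The sum is formed in double precision;
    since a double has more than 2*24+2 significand bits, rounding it to
    binary32 gives the correctly rounded FP32 result.
    """
    a16, b16, c32 = to_fp16(a), to_fp16(b), to_fp32(c)
    return to_fp32(a16 * b16 + c32)


def dot_mixed(xs, ys) -> float:
    """Dot product with FP16 inputs and an FP32 accumulator."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = mixed_fma(x, y, acc)
    return acc


def dot_fp16(xs, ys) -> float:
    """Same dot product, but the accumulator is also kept in FP16."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = to_fp16(to_fp16(x) * to_fp16(y) + acc)
    return acc


def matmul_mixed(A, B):
    """2-D matrix multiply: each output element is one mixed-precision dot product."""
    cols = list(zip(*B))  # columns of B
    return [[dot_mixed(row, col) for col in cols] for row in A]
```

Accumulating 4,096 products of 1.0 shows why the FP32 accumulator matters: `dot_mixed` returns 4096.0, while `dot_fp16` stalls at 2048.0 because 2049 is not representable in binary16 and rounds back down to 2048.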

Assignees

Intel Corp

Classifications

  • Combinations of networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11409537B2 cover?
One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an intege…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Published Aug 9, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).