Simulation Processor with Backside Look-Up Table
US-2017323042-A1 · Nov 9, 2017 · US
US11409537B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11409537-B2 |
| Application number | US-201715819167-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 21, 2017 |
| Priority date | Apr 24, 2017 |
| Publication date | Aug 9, 2022 |
| Grant date | Aug 9, 2022 |
Official abstract text for this publication.
One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread (SIMT) architecture, the general-purpose graphics compute unit to simultaneously execute the first instruction and the second instruction, wherein the integer operation corresponds to a memory address calculation.
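The abstract says only that the integer operation "corresponds to a memory address calculation". As a minimal sketch of what that integer-side work might look like for a matrix multiply, the example below computes byte addresses for operands in a row-major layout. The row-major layout, the 2-byte element size, the base addresses, and the function names `element_address` and `operand_addresses` are all illustrative assumptions; none of them come from the patent text.

```python
# Hypothetical illustration: the integer-only work that accompanies a
# floating-point matrix multiply. Layout and names are assumptions.

FP16_BYTES = 2  # assumed element size for 16-bit floating-point operands


def element_address(base, row, col, n_cols, elem_size=FP16_BYTES):
    """Byte address of element (row, col) in a row-major matrix at `base`."""
    return base + (row * n_cols + col) * elem_size


def operand_addresses(base_a, base_b, row, col, k, n):
    """Addresses of the k operand pairs needed for output element
    C[row][col] of A (m x k) times B (k x n), both stored row-major."""
    return [
        (element_address(base_a, row, i, k), element_address(base_b, i, col, n))
        for i in range(k)
    ]
```

Every value produced here is integer arithmetic on indices and pointers, which is why this kind of work could in principle proceed on a separate functional unit while the floating-point unit consumes operands fetched earlier.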
Opening claim text (preview).
What is claimed is:
1. A graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising: a discrete graphics processor circuit including multiple general-purpose graphics compute units, the discrete graphics processor circuit including: an instruction cache to store a first instruction and a second instruction, wherein the first instruction and the second instruction are single instructions, the first instruction includes four operands including two 16-bit floating-point source operands and a 32-bit floating-point source operand, the first instruction is to cause the GPU to perform a multi-dimensional mixed precision floating-point operation in response to the first instruction, the second instruction includes at least one integer operand, the second instruction is to cause the GPU to perform an integer operation in response to the second instruction, and the integer operation corresponds to an address calculation; and a plurality of general-purpose graphics compute units having a single instruction, multiple thread (SIMT) architecture, the plurality of general-purpose graphics compute units each including a first functional unit and a second functional unit, the first functional unit to execute a plurality of threads of the first instruction and the second functional unit is a compute unit configured to execute a plurality of threads of the second instruction during execution of the plurality of threads of the first instruction by the first functional unit, wherein at least one general-purpose graphics compute unit is to dynamically configure a precision for the first functional unit to execute the multi-dimensional mixed precision floating-point operation for the first instruction.
2. The GPU as in claim 1, wherein the multi-dimensional mixed precision floating-point operation is a two-dimensional matrix multiply operation and the multi-dimensional mixed precision floating-point operation is associated with a dot product operation.
3. The GPU as in claim 2, wherein the at least one integer operand is a pointer to a memory location.
4. The GPU as in claim 3, wherein to execute the multi-dimensional mixed precision floating-point operation includes to perform a multiply operation on the two 16-bit floating-point source operands and perform an add operation on a product of the multiply operation and the 32-bit floating-point source operand.
5. The GPU as in claim 1, additionally including a scheduler to schedule at least one thread of the first instruction and at least one thread of the second instruction to the at least one general-purpose graphics compute unit, wherein the at least one general-purpose graphics compute unit is to dynamically enable the second functional unit from an idle state to execute the at least one thread of the second instruction based on a computational requirement of a workload associated with the first instruction or the second instruction.
6. The GPU as in claim 5, the scheduler to independently schedule multiple threads of each of the first instruction and the second instruction.
7. The GPU as in claim 6, wherein threads of the first instruction and the second instruction have independent thread state.
8. A data processing system comprising: an add-in card coupled with a system interface of the data processing system, the add-in card including: a discrete graphics processing unit (GPU) to accelerate machine learning operations, the GPU including an instruction cache to store a first instruction and a second instruction, wherein the first instruction and the second instruction are single instructions, the first instruction includes four operands including two 16-bit floating-point source operands and a 32-bit floating-point source operand, the first instruction is to cause the GPU to perform a multi-dimensional mixed precision floating-point operation in response to the first instruction, the second instruction includes at least one integer operand, the second instruction is to cause the GPU to perform an integer operation in response to the second instruction, and the integer operation corresponds to an address calculation; a plurality of general-purpose graphics compute units included within the GPU, the plurality of general-purpose graphics compute units having a single instruction, multiple thread (SIMT) architecture, the plurality of general-purpose graphics compute units each including a first functional unit and a second functional unit, the first functional unit to execute a plurality of threads of the first instruction and the second functional unit is a compute unit configured to execute a plurality of threads of the second instruction concurrently with the execution of the plurality of threads of the first instruction by the first functional unit, wherein at least one general-purpose graphics compute unit is to dynamically configure a precision for the first functional unit to execute the multi-dimensional mixed precision floating-point operation for the thread of the first instruction; and a memory communicatively coupled with the graphics processing unit.
9. The data processing system as in claim 8, wherein the multi-dimensional mixed precision floating-point operation is a two-dimensional matrix multiply operation, the first instruction is a single instruction, GPU is to perform the multi-dimensional mixed precision floating-point operation in response to the single instruction, and the multi-dimensional mixed precision floating-point operation is associated with a dot product operation.
10. The data processing system as in claim 9, wherein the at least one integer operand is a pointer to a memory location.
11. The data processing system as in claim 10, wherein to execute the multi-dimensional mixed precision floating-point operation includes to perform a multiply operation on the two 16-bit floating-point source operands and perform an add operation on a product of the multiply operation and the 32-bit floating-point source operand.
12. The data processing system as in claim 8, the GPU additionally including a scheduler to schedule at least one thread of the first instruction and at least one thread of the second instruction to the at least one general-purpose graphics compute unit, wherein the at least one general-purpose graphics compute unit is to dynamically enable the second functional unit from an idle state to execute the at least one thread of the second instruction based on a computational requirement of a workload associated with the first instruction or the second instruction.
13. The data processing system as in claim 12, the scheduler to independently schedule multiple threads of each of the first instruction and the second instruction.
14. The data processing system as in claim 13, wherein threads of the first instruction and the second instruction have independent thread state.
15. A method of accelerating a machine-learning operation, the method comprising: decoding a first instruction and a second instruction on a graphics processing unit (GPU), the GPU including a discrete graphics processor circuit having a single instruction, multiple thread (SIMT) architecture and a plurality of SIMT multiprocessors, wherein the first instruction and the second instruction are single instructions, the first instruction includes four operands including two 16-bit floating-point source operands and a 32-bit floating-point source operand and the second instruction includes at least one integer operand; and simultaneously executing a thread of the first instruction and a thread of the second instruction on a first multiprocessor of the plurali
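Claims 4 and 11 describe the mixed precision operation as a multiply of two 16-bit floating-point sources followed by an add of the product to a 32-bit floating-point source. The following is a software sketch of that arithmetic only, not of the claimed hardware. It emulates the operand widths with Python's `struct` formats `e` (binary16) and `f` (binary32); the helper names `to_fp16`, `to_fp32`, `mixed_fma`, and `mixed_dot` are hypothetical, and the intermediate rounding (via a Python double) is an assumption that may differ from what a real functional unit does.

```python
import struct


def to_fp16(x):
    """Round to the nearest IEEE 754 binary16 value.
    Raises OverflowError if x is outside the binary16 range."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_fma(a, b, c):
    """Multiply two 16-bit sources, add a 32-bit source, return a 32-bit result."""
    # The product of two binary16 values fits exactly in a Python double.
    product = to_fp16(a) * to_fp16(b)
    # Assumption: the sum is rounded once more when narrowed to binary32.
    return to_fp32(product + to_fp32(c))


def mixed_dot(xs, ys, acc=0.0):
    """Dot product of 16-bit inputs accumulated in 32-bit precision."""
    for x, y in zip(xs, ys):
        acc = mixed_fma(x, y, acc)
    return acc
```

A short usage example shows why the wider accumulator matters: `mixed_fma(1.0, 1.0, 2048.0)` yields `2049.0`, whereas `to_fp16(2049.0)` rounds to `2048.0`, so a 16-bit accumulator would lose the added term.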
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title