Fused Multiply-Add (FMA) low functional unit
US-2017185379-A1 · Jun 29, 2017 · US
| Field | Value |
|---|---|
| Publication number | US-11461107-B2 |
| Application number | US-201816227645-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 20, 2018 |
| Priority date | Apr 24, 2017 |
| Publication date | Oct 4, 2022 |
| Grant date | Oct 4, 2022 |
Official abstract text for this publication.
One embodiment provides for a general-purpose graphics processing unit comprising a streaming multiprocessor having a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The streaming multiprocessor comprises multiple processing blocks including multiple processing cores. The processing cores include independent integer and floating-point data paths that are configurable to concurrently execute multiple independent instructions. A memory is coupled with the multiple processing blocks.
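To make the abstract's point about independent integer and floating-point data paths concrete, here is a minimal toy model rather than anything taken from the patent. It assumes every instruction takes one cycle, that the instructions in the stream have no data dependencies on each other, and that issue is in order. The names `Instr`, `cycles_shared_path`, and `cycles_independent_paths` are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: kind is 'int' or 'fp'."""
    kind: str
    name: str


def cycles_shared_path(stream):
    """Single shared data path: one instruction issues per cycle."""
    return len(stream)


def cycles_independent_paths(stream):
    """Separate int and fp paths (toy assumption: in-order issue,
    no data dependencies). Two adjacent instructions of different
    kinds issue in the same cycle; otherwise one issues alone."""
    cycles = 0
    i = 0
    while i < len(stream):
        can_pair = i + 1 < len(stream) and stream[i].kind != stream[i + 1].kind
        i += 2 if can_pair else 1
        cycles += 1
    return cycles


# Illustrative stream: integer address arithmetic interleaved with
# floating-point work, followed by two back-to-back fp instructions.
stream = [
    Instr("int", "addr_calc_0"),
    Instr("fp", "fma_0"),
    Instr("int", "addr_calc_1"),
    Instr("fp", "fma_1"),
    Instr("fp", "mul_0"),
    Instr("fp", "add_0"),
]

print(cycles_shared_path(stream))        # 6
print(cycles_independent_paths(stream))  # 4
```

In this model the two interleaved int/fp pairs each issue in one cycle, while the trailing pair of floating-point instructions cannot overlap because both need the same path. Real hardware scheduling depends on dependencies, latencies, and warp state that this sketch ignores.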
Opening claim text (preview).
What is claimed is:

1. A general-purpose graphics processing unit comprising: a streaming multiprocessor having a single instruction, multiple thread (SIMT) architecture including hardware multithreading, wherein the streaming multiprocessor comprises: a first processing block including a first processing core having a first floating-point data path and a second processing core having a first integer data path, the first integer data path independent of the first floating-point data path and the first processing block additionally including a first register file coupled with the first processing core and the second processing core, wherein the first integer data path is to enable execution of a first instruction and the first floating-point data path is to enable execution of a second instruction, the first instruction to be executed concurrently with the second instruction; a second processing block including a third processing core having a second floating-point data path and a fourth processing core having a second integer data path, the second integer data path independent of the second floating-point data path and the second processing block additionally including a second register file coupled with the third processing core and the fourth processing core, wherein the second integer data path is to enable execution of a third instruction and the second floating-point data path is to enable execution of a fourth instruction, the third instruction to be executed concurrently with the fourth instruction, and the first register file different from the second register file; and a memory coupled with the first processing block and the second processing block.

2. The general-purpose graphics processing unit as in claim 1, wherein the memory is shared between the first processing block and the second processing block.

3. The general-purpose graphics processing unit as in claim 2, wherein the memory includes a data cache accessible by at least the first processing core and the third processing core.

4. The general-purpose graphics processing unit as in claim 1, the streaming multiprocessor additionally comprising one or more hardware schedulers to schedule a first group of threads to at least the first processing core and the third processing core.

5. The general-purpose graphics processing unit as in claim 4, the one or more hardware schedulers additionally to schedule a second group of threads to at least the second processing core and the fourth processing core.

6. The general-purpose graphics processing unit as in claim 5, wherein: the first group of threads is associated with the first instruction; and the second group of threads is associated with the second instruction.

7. The general-purpose graphics processing unit as in claim 1, wherein the first processing core is to perform a 64-bit floating-point operation in response to the second instruction.

8. The general-purpose graphics processing unit as in claim 1, wherein the second processing core is to perform a 32-bit integer operation and the fourth processing core is to perform one or more 8-bit integer operations.

9. A data processing system comprising: a graphics double data rate memory; and a general-purpose graphics processor coupled with the graphics double data rate memory via two or more memory controllers, the general-purpose graphics processor comprising a hardware multithreading compute unit having a single instruction, multiple thread (SIMT) architecture, wherein the hardware multithreading compute unit comprises: a first register file and a first hardware scheduler, the first register file and the first hardware scheduler each coupled with a first processing core and a second processing core, the first processing core having a first floating-point data path and the second processing core having a first integer data path, the first integer data path independent of the first floating-point data path, wherein the first integer data path is to enable execution of a first instruction and the first floating-point data path is to enable execution of a second instruction, the first instruction to be executed concurrently with the second instruction; a second register file different from the first register file and a second hardware scheduler different from the first hardware scheduler, the second register file and the second hardware scheduler each coupled with a third processing core and a fourth processing core, the third processing core having a second floating-point data path and the fourth processing core having a second integer data path, the second integer data path independent of the second floating-point data path, wherein the second integer data path is to enable execution of a third instruction and the second floating-point data path is to enable execution of a fourth instruction, the third instruction to be executed concurrently with the fourth instruction; and an internal memory coupled with the first processing core, the second processing core, the third processing core, and the fourth processing core.

10. The data processing system as in claim 9, wherein the internal memory is to be shared between the first processing core, the second processing core, the third processing core, and the fourth processing core and the graphics double data rate memory is graphics double data rate six (GDDR6) memory.

11. The data processing system as in claim 10, wherein the internal memory includes a data cache accessible by the first processing core, the second processing core, the third processing core, and the fourth processing core.

12. The data processing system as in claim 9, wherein the first hardware scheduler is to schedule threads of a first thread group to the first processing core and threads of a second thread group to the second processing core, wherein the second hardware scheduler is to schedule threads of the first thread group to the third processing core and threads of the second thread group to the fourth processing core, and wherein execution context for threads within the first thread group and the second thread group is to be maintained on-chip during execution.

13. The data processing system as in claim 12, wherein the first processing core includes a first functional unit associated with the first floating-point data path and the second processing core includes a second functional unit associated with the first integer data path.

14. The data processing system as in claim 13, the first functional unit to perform a floating-point operation and the second functional unit to perform an integer operation independently of the first functional unit.

15. The data processing system as in claim 14, wherein the first functional unit is to perform a first floating-point operation on two 16-bit source operands and a second floating point operation on 32-bit floating-point operand.

16. The data processing system as in claim 14, wherein the second functional unit is configurable to perform one or more of an 8-bit, 16-bit, and a 32-bit integer operation.

17. The data processing system as in claim 14, wherein the third processing core includes a third functional unit to perform a 64-bit floating-point operation.

18. A method executed on a general-purpose graphics processing unit, the method comprising: scheduling a first group of threads to a first processing core via a first hardware scheduler and a third processing core via a second hardware scheduler, the first group of threads to perform a set of floating-point operations including a fused multiply-add operation, the first pro
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title