US12373911B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12373911-B2 |
| Application number | US-202318456235-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 25, 2023 |
| Priority date | Apr 28, 2017 |
| Publication date | Jul 29, 2025 |
| Grant date | Jul 29, 2025 |
Official abstract text for this publication.
One embodiment provides a general-purpose graphics processing unit comprising a dynamic precision floating-point unit including a control unit having precision tracking hardware logic to track an available number of bits of precision for computed data relative to a target precision, wherein the dynamic precision floating-point unit includes computational logic to output data at multiple precisions.
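The abstract's idea of tracking available precision against a target can be loosely illustrated in software. The sketch below is only an analogy, not the hardware logic the patent describes: the function names, the `rel_tol` threshold, and the fall-back-to-32-bit policy are all assumptions made for illustration. It tries a 16-bit result first and widens to 32 bits when the 16-bit value overflows or loses too much precision.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (raises OverflowError if too large)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def multiply_dynamic(a: float, b: float, rel_tol: float = 2.0 ** -10):
    """Hypothetical helper: return (result, bits_used) for a * b.

    The product is first computed in Python's 64-bit float as a reference.
    If the 16-bit rounding stays within rel_tol of that reference, the
    16-bit value is returned; otherwise the 32-bit rounding is returned.
    """
    reference = a * b
    try:
        low = round_to_fp16(reference)
    except OverflowError:
        # Magnitude exceeds the binary16 range, so widen the output.
        return round_to_fp32(reference), 32
    if reference == 0.0 or abs(low - reference) <= rel_tol * abs(reference):
        return low, 16
    # Too much precision lost (for example, underflow toward zero).
    return round_to_fp32(reference), 32


if __name__ == "__main__":
    print(multiply_dynamic(1.5, 2.0))      # fits comfortably in 16 bits
    print(multiply_dynamic(300.0, 300.0))  # exceeds the binary16 range
    print(multiply_dynamic(1e-5, 1e-5))    # underflows in binary16
```

A hardware unit would make this decision from exponent and mantissa state rather than by comparing against a wider reference value; the comparison here simply makes the trade-off visible.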
Opening claim text (preview).
What is claimed is:

1. A graphics processor comprising: a memory device; a compressor to compress data to be written to the memory device; and a streaming multiprocessor coupled with the memory device, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to concurrently execute multiple threads, including a first thread in parallel with a second thread, wherein the first thread is configured to process a first instruction to cause a first portion of the streaming multiprocessor to perform a floating-point operation on multiple floating-point input operands, wherein the second thread is configured to process a second instruction to cause a second portion of the streaming multiprocessor to perform an integer operation on multiple integer operands, wherein the streaming multiprocessor is to perform operations for a third instruction, the streaming multiprocessor to perform a first operation of the third instruction on 16-bit floating-point input and a second operation of the third instruction on input that includes a 32-bit floating-point input, wherein the streaming multiprocessor is to perform operations for a fourth instruction, the streaming multiprocessor to perform a third operation on 8-bit integer input and a fourth operation on input that includes a 32-bit integer input, and wherein the first operation of the fourth instruction includes a multiply and the second operation of the fourth instruction includes an accumulate.

2. The graphics processor as in claim 1, wherein the 16-bit floating-point input includes a half-precision floating-point input.

3. The graphics processor as in claim 1, wherein the multiple floating-point input operands include 64-bit data elements.

4. The graphics processor as in claim 1, wherein the first operation of the third instruction includes a multiply and the second operation of the third instruction includes an accumulate.

5. The graphics processor as in claim 1, further comprising a level-2 (L2) cache coupled with the compressor.

6. The graphics processor as in claim 5, wherein the compressor is to losslessly compress the data to be written to the memory device.

7. The graphics processor as in claim 5, wherein the compressor is to decompress data to be read from the memory device.

8. The graphics processor as in claim 6, wherein the memory device is a high-bandwidth memory (HBM) device.

9. A method comprising: decoding a first instruction via an instruction decoder of a graphics processor, the first instruction decoded into a first decoded instruction, wherein the graphics processor includes a streaming multiprocessor coupled to a memory device and a compressor to compress data to be written to the memory device, and the streaming multiprocessor includes a single instruction, multiple thread (SIMT); executing multiple threads associated with the first decoded instruction via the streaming multiprocessor, wherein the first decoded instruction causes a first portion of the streaming multiprocessor to perform a floating-point operation on multiple floating-point input operands; decoding a second instruction via the instruction decoder of the graphics processor into a second decoded instruction; executing multiple threads associated with the second decoded instruction via the streaming multiprocessor, wherein the second decoded instruction causes a second portion of the streaming multiprocessor to perform an integer operation on multiple integer operands, wherein the streaming multiprocessor is to execute a thread of the first decoded instruction in parallel with a thread of the second decoded instruction; decoding a third instruction via the instruction decoder of the graphics processor into a third decoded instruction; executing multiple threads associated with the third decoded instruction via the streaming multiprocessor, wherein the streaming multiprocessor performs a first operation of the third decoded instruction on 16-bit floating-point input and a second operation of the third decoded instruction on input that includes a 32-bit floating-point input; decoding a fourth instruction via the instruction decoder of the graphics processor into a fourth decoded instruction; and executing multiple threads associated with the fourth decoded instruction via the streaming multiprocessor, wherein the streaming multiprocessor performs the first operation of the fourth decoded instruction on 8-bit integer input and a second operation of the fourth decoded instruction on input that includes a 32-bit integer input, wherein the first operation of the fourth decoded instruction includes a multiply and the second operation of the fourth decoded instruction includes an accumulate.

10. The method as in claim 9, wherein the third instruction is a floating-point instruction, the streaming multiprocessor performs the first operation of the third decoded instruction using a first number of bits associated with a first floating-point precision and the second operation of the third decoded instruction using a second number of bits associated with a second floating-point precision.

11. The method as in claim 9, wherein the 16-bit floating-point input includes a half-precision floating-point input.

12. The method as in claim 9, wherein the streaming multiprocessor performs the first operation of the fourth decoded instruction using a first number of bits associated with a first representable range of integer values and the second operation of the fourth decoded instruction using a second number of bits associated with a second representable range of integer values.

13. The method as in claim 12, wherein the multiple floating-point input operands include 64-bit data elements.

14. The method as in claim 13, wherein the first operation of the third decoded instruction includes a multiply and the second operation of the third decoded instruction includes an accumulate.

15. The method as in claim 9, further comprising compressing data associated with the first instruction, second instruction, or third instruction before writing the data to the memory device.

16. The method as in claim 15, further comprising losslessly compressing the data associated with the first instruction, second instruction, or third instruction before writing the data to the memory device.

17. The method as in claim 9, further comprising decompressing data associated with the first instruction, second instruction, or third instruction after reading the data from the memory device.

18. A graphics processing system comprising: a system interface coupled with an interconnect fabric; a graphics memory device coupled with the interconnect fabric; a compressor to compress data to be written to the graphics memory device; and a streaming multiprocessor coupled with the graphics memory device, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to concurrently execute multiple threads, including a first thread in parallel with a second thread, wherein the first thread is configured to process a first instruction to cause a first portion of the streaming multiprocessor to perform a floating-point operation on multiple floating-point input operands, wherein the second thread is configured to process a second instruction to cause a second portion of the streaming multiprocessor to perform an integer operation on multiple integer operands, and wherein the streaming multiprocessor is
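The claim preview above is cut off at this point. As an illustration of the mixed-precision pattern recited in claims 1, 4, and 9 (a multiply on 16-bit floating-point or 8-bit integer input, followed by an accumulate involving a 32-bit value), here is a minimal software sketch. It is not the claimed hardware: the function names are hypothetical, and the two's-complement wraparound on 32-bit overflow is an assumption, since the claims do not state overflow behavior.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fp16_mul_fp32_acc(a_vals, b_vals, acc: float = 0.0) -> float:
    """Multiply 16-bit float inputs, accumulate into a 32-bit float."""
    acc = to_fp32(acc)
    for a, b in zip(a_vals, b_vals):
        # Inputs are rounded to binary16 before the multiply.
        product = to_fp16(a) * to_fp16(b)
        # The running sum is kept at binary32 precision.
        acc = to_fp32(acc + product)
    return acc


def int8_mul_int32_acc(a_vals, b_vals, acc: int = 0) -> int:
    """Multiply signed 8-bit integer inputs, accumulate into a signed 32-bit integer."""
    for a, b in zip(a_vals, b_vals):
        if not (-128 <= a <= 127 and -128 <= b <= 127):
            raise ValueError("inputs must fit in a signed 8-bit integer")
        acc += a * b
        # Assumed behavior: wrap to the signed 32-bit range on overflow.
        acc = (acc + 2 ** 31) % 2 ** 32 - 2 ** 31
    return acc


if __name__ == "__main__":
    print(fp16_mul_fp32_acc([1.5, 2.0], [2.0, 0.25]))
    print(int8_mul_int32_acc([127, -128, 5], [127, 127, -3]))
```

Accumulating at a wider width than the inputs is the point of the pattern: each product of two narrow values needs more bits than either input, and a long sum of such products would quickly exceed the narrow format's range or precision.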
Learning methods · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
using an input/output type connection, e.g. channel, I/O port · CPC title
using a common memory, e.g. mailbox · CPC title