Compute optimizations for low precision machine learning operations

US12373911B2 · US · B2

Patent metadata
Publication number: US-12373911-B2
Application number: US-202318456235-A
Country: US
Kind code: B2
Filing date: Aug 25, 2023
Priority date: Apr 28, 2017
Publication date: Jul 29, 2025
Grant date: Jul 29, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides a general-purpose graphics processing unit comprising a dynamic precision floating-point unit including a control unit having precision tracking hardware logic to track an available number of bits of precision for computed data relative to a target precision, wherein the dynamic precision floating-point unit includes computational logic to output data at multiple precisions.
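
The abstract's idea of tracking available bits of precision against a target can be illustrated in software. The following is a minimal sketch, not the patented hardware: it assumes that "available bits" can be estimated from the relative error of a low-precision result against a binary64 reference, and the names `round_to`, `bits_of_precision`, and `dynamic_multiply` are hypothetical helpers invented for this illustration.

```python
import math
import struct


def round_to(fmt: str, x: float) -> float:
    """Round a binary64 value to the format named by a struct code.

    "<e" is IEEE 754 binary16 (half), "<f" is binary32 (single).
    Values too large for the format are mapped to infinity.
    """
    try:
        return struct.unpack(fmt, struct.pack(fmt, x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def bits_of_precision(approx: float, exact: float) -> int:
    """Estimate how many leading bits of `approx` agree with `exact`.

    Assumption for this sketch: precision is measured as
    floor(-log2(relative error)), capped to the range 0..53.
    """
    if approx == exact:
        return 53
    if exact == 0 or math.isinf(approx):
        return 0
    rel_err = abs(approx - exact) / abs(exact)
    return max(0, min(53, math.floor(-math.log2(rel_err))))


def dynamic_multiply(a: float, b: float, target_bits: int):
    """Multiply at half precision if that meets the target, else at single.

    Returns (result, precision_label, available_bits).
    """
    exact = a * b  # binary64 product used as the reference value
    half = round_to("<e", exact)
    available = bits_of_precision(half, exact)
    if available >= target_bits:
        return half, "fp16", available
    single = round_to("<f", exact)
    return single, "fp32", bits_of_precision(single, exact)


if __name__ == "__main__":
    print(dynamic_multiply(1.5, 2.0, target_bits=10))
    print(dynamic_multiply(0.1, 0.3, target_bits=16))
```

In this sketch the product 1.5 × 2.0 is exactly representable in half precision, so the low-precision output is kept, while 0.1 × 0.3 loses too many bits at half precision for a 16-bit target and is emitted at single precision instead. Real hardware would not have a binary64 reference available and would track precision by other means.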

First claim

Opening claim text (preview).

What is claimed is:

1. A graphics processor comprising: a memory device; a compressor to compress data to be written to the memory device; and a streaming multiprocessor coupled with the memory device, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to concurrently execute multiple threads, including a first thread in parallel with a second thread, wherein the first thread is configured to process a first instruction to cause a first portion of the streaming multiprocessor to perform a floating-point operation on multiple floating-point input operands, wherein the second thread is configured to process a second instruction to cause a second portion of the streaming multiprocessor to perform an integer operation on multiple integer operands, wherein the streaming multiprocessor is to perform operations for a third instruction, the streaming multiprocessor to perform a first operation of the third instruction on 16-bit floating-point input and a second operation of the third instruction on input that includes a 32-bit floating-point input, wherein the streaming multiprocessor is to perform operations for a fourth instruction, the streaming multiprocessor to perform a third operation on 8-bit integer input and a fourth operation on input that includes a 32-bit integer input, and wherein the first operation of the fourth instruction includes a multiply and the second operation of the fourth instruction includes an accumulate.

2. The graphics processor as in claim 1, wherein the 16-bit floating-point input includes a half-precision floating-point input.

3. The graphics processor as in claim 1, wherein the multiple floating-point input operands include 64-bit data elements.

4. The graphics processor as in claim 1, wherein the first operation of the third instruction includes a multiply and the second operation of the third instruction includes an accumulate.

5. The graphics processor as in claim 1, further comprising a level-2 (L2) cache coupled with the compressor.

6. The graphics processor as in claim 5, wherein the compressor is to losslessly compress the data to be written to the memory device.

7. The graphics processor as in claim 5, wherein the compressor is to decompress data to be read from the memory device.

8. The graphics processor as in claim 6, wherein the memory device is a high-bandwidth memory (HBM) device.

9. A method comprising: decoding a first instruction via an instruction decoder of a graphics processor, the first instruction decoded into a first decoded instruction, wherein the graphics processor includes a streaming multiprocessor coupled to a memory device and a compressor to compress data to be written to the memory device, and the streaming multiprocessor includes a single instruction, multiple thread (SIMT); executing multiple threads associated with the first decoded instruction via the streaming multiprocessor, wherein the first decoded instruction causes a first portion of the streaming multiprocessor to perform a floating-point operation on multiple floating-point input operands; decoding a second instruction via the instruction decoder of the graphics processor into a second decoded instruction; executing multiple threads associated with the second decoded instruction via the streaming multiprocessor, wherein the second decoded instruction causes a second portion of the streaming multiprocessor to perform an integer operation on multiple integer operands, wherein the streaming multiprocessor is to execute a thread of the first decoded instruction in parallel with a thread of the second decoded instruction; decoding a third instruction via the instruction decoder of the graphics processor into a third decoded instruction; executing multiple threads associated with the third decoded instruction via the streaming multiprocessor, wherein the streaming multiprocessor performs a first operation of the third decoded instruction on 16-bit floating-point input and a second operation of the third decoded instruction on input that includes a 32-bit floating-point input; decoding a fourth instruction via the instruction decoder of the graphics processor into a fourth decoded instruction; and executing multiple threads associated with the fourth decoded instruction via the streaming multiprocessor, wherein the streaming multiprocessor performs the first operation of the fourth decoded instruction on 8-bit integer input and a second operation of the fourth decoded instruction on input that includes a 32-bit integer input, wherein the first operation of the fourth decoded instruction includes a multiply and the second operation of the fourth decoded instruction includes an accumulate.

10. The method as in claim 9, wherein the third instruction is a floating-point instruction, the streaming multiprocessor performs the first operation of the third decoded instruction using a first number of bits associated with a first floating-point precision and the second operation of the third decoded instruction using a second number of bits associated with a second floating-point precision.

11. The method as in claim 9, wherein the 16-bit floating-point input includes a half-precision floating-point input.

12. The method as in claim 9, wherein the streaming multiprocessor performs the first operation of the fourth decoded instruction using a first number of bits associated with a first representable range of integer values and the second operation of the fourth decoded instruction using a second number of bits associated with a second representable range of integer values.

13. The method as in claim 12, wherein the multiple floating-point input operands include 64-bit data elements.

14. The method as in claim 13, wherein the first operation of the third decoded instruction includes a multiply and the second operation of the third decoded instruction includes an accumulate.

15. The method as in claim 9, further comprising compressing data associated with the first instruction, second instruction, or third instruction before writing the data to the memory device.

16. The method as in claim 15, further comprising losslessly compressing the data associated with the first instruction, second instruction, or third instruction before writing the data to the memory device.

17. The method as in claim 9, further comprising decompressing data associated with the first instruction, second instruction, or third instruction after reading the data from the memory device.

18. A graphics processing system comprising: a system interface coupled with an interconnect fabric; a graphics memory device coupled with the interconnect fabric; a compressor to compress data to be written to the graphics memory device; and a streaming multiprocessor coupled with the graphics memory device, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to concurrently execute multiple threads, including a first thread in parallel with a second thread, wherein the first thread is configured to process a first instruction to cause a first portion of the streaming multiprocessor to perform a floating-point operation on multiple floating-point input operands, wherein the second thread is configured to process a second instruction to cause a second portion of the streaming multiprocessor to perform an integer operation on multiple integer operands, and wherein the streaming multiprocessor is
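
The claims describe two mixed-precision patterns: a multiply on 16-bit floating-point input with an accumulate that includes 32-bit floating-point input, and a multiply on 8-bit integer input with an accumulate that includes 32-bit integer input. The sketch below emulates both in plain Python to show the arithmetic only. The function names are hypothetical, the two's-complement wraparound on the 32-bit integer accumulator is an assumption (the claims do not say whether overflow wraps or saturates), and the floating-point path rounds once from binary64 rather than modelling any particular hardware rounding behaviour.

```python
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 binary16 (half-precision) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round to the nearest IEEE 754 binary32 (single-precision) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fp16_mul_fp32_acc(a: float, b: float, acc: float) -> float:
    """Multiply two fp16 inputs, then accumulate into an fp32 value."""
    a16, b16 = to_fp16(a), to_fp16(b)
    # The product of two 11-bit significands fits exactly in binary64.
    product = a16 * b16
    return to_fp32(product + to_fp32(acc))


def wrap_int32(x: int) -> int:
    """Wrap an integer into the signed 32-bit range (assumed behaviour)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def int8_dot_int32_acc(a_vals, b_vals, acc: int) -> int:
    """Multiply pairs of int8 inputs and accumulate into an int32 value."""
    for a, b in zip(a_vals, b_vals):
        if not (-128 <= a <= 127 and -128 <= b <= 127):
            raise ValueError("inputs must fit in a signed 8-bit integer")
        acc = wrap_int32(acc + a * b)
    return acc


if __name__ == "__main__":
    print(fp16_mul_fp32_acc(1.5, 2.0, 0.25))
    print(int8_dot_int32_acc([127, -128, 5], [127, -128, -3], 0))
```

Keeping the accumulator wider than the multiplier inputs is the point of both patterns: a single int8 product can already reach 16384, which is outside the 8-bit range, and repeated half-precision sums lose low-order bits much faster than single-precision sums do.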

Assignees

Intel Corp

Inventors

Classifications

  • Learning methods

  • Convolutional networks [CNN, ConvNet]

  • controlled by a single instruction for multiple data lanes [SIMD]

  • using an input/output type connection, e.g. channel, I/O port

  • using a common memory, e.g. mailbox

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12373911B2 cover?
One embodiment provides a general-purpose graphics processing unit comprising a dynamic precision floating-point unit including a control unit having precision tracking hardware logic to track an available number of bits of precision for computed data relative to a target precision, wherein the dynamic precision floating-point unit includes computational logic to output data at multiple precisi…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 29, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).