What technology area does this patent fall under?

The primary CPC classification is G06T1/20. Mapped technology areas include Physics.

When was this patent published?

Publication date: Apr 9, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format

US11954063B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11954063-B2
Application number	US-202318170900-A
Country	US
Kind code	B2
Filing date	Feb 17, 2023
Priority date	Mar 15, 2019
Publication date	Apr 9, 2024
Grant date	Apr 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.
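The abstract's two ideas, the BF16 layout and the multiply, add, rectified-linear-unit sequence, can be illustrated with a short Python sketch. This is an illustration under stated assumptions and not the patented hardware. The function names are hypothetical, the round-to-nearest-even conversion is one common way to reach BF16 from single precision, and rounding the final result back to BF16 is an assumption, since the abstract does not state the output format.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x (1 sign, 8 exponent, 7 mantissa bits).

    Assumption: round-to-nearest-even from float32. Values outside the
    float32 range make struct.pack raise OverflowError in this sketch.
    """
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    is_nan = (bits32 & 0x7F800000) == 0x7F800000 and (bits32 & 0x007FFFFF) != 0
    if is_nan:
        # Keep NaN a NaN by forcing a mantissa bit on.
        return ((bits32 >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(bits16: int) -> float:
    """Widen a BF16 pattern to a Python float by padding 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


def bf16_round(x: float) -> float:
    """Round x to the nearest value representable in BF16."""
    return bf16_bits_to_float(float_to_bf16_bits(x))


def mul_add_relu(a: float, b: float, c: float) -> float:
    """Compute relu(a * b + c) on BF16-rounded inputs.

    Assumption: the result is rounded back to BF16 before the ReLU.
    """
    a, b, c = bf16_round(a), bf16_round(b), bf16_round(c)
    total = bf16_round(a * b + c)  # multiply, then add
    return max(0.0, total)         # rectified linear unit
```

Because BF16 keeps the same eight-bit exponent as single precision, the conversion only shortens the mantissa. For example, `float_to_bf16_bits(1.0)` yields `0x3F80`, the upper half of the float32 pattern `0x3F800000`.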

First claim

Opening claim text (preview).

What is claimed is:
1. A graphics processing unit (GPU) comprising: a single instruction, multiple thread (SIMT) multiprocessor comprising: an instruction cache; a shared memory coupled with the instruction cache; circuitry coupled with the shared memory and the instruction cache, the circuitry including: multiple texture units; a first core including hardware to accelerate matrix operations; and a second core configured to: receive an instruction having multiple operands, wherein the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent; and process the instruction using the multiple operands, wherein to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.
2. The GPU of claim 1, wherein the SIMT multiprocessor is to execute a warp of threads in response to the instruction.
3. The GPU of claim 1, wherein the SIMT multiprocessor is to perform a parallel matrix multiply operation via the first core, the parallel matrix multiply operation performed on input having the BF16 number format.
4. The GPU of claim 1, further comprising a third core including hardware to accelerate ray tracing operations.
5. The GPU of claim 1, further comprising texture processing circuitry external to and coupled with the SIMT multiprocessor.
6. The GPU of claim 1, wherein the instruction is to cause the second core to perform a dot product operation.
7. The GPU of claim 1, wherein each of the multiple operands include a packed data type, the packed data type including multiple data elements in the BF16 number format.
8. A method comprising: fetching an instruction from an instruction cache of a graphics processing unit (GPU), the instruction a single instruction multiple data (SIMD) instruction having multiple operands and configured to use a bfloat16 (BF16) number format, wherein the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent, the GPU includes a shared memory coupled with the instruction cache and circuitry coupled with the shared memory and the instruction cache; dispatching a warp of threads to a single instruction multiple thread (SIMT) multiprocessor of the GPU in response to the instruction, wherein the SIMT multiprocessor includes multiple texture units, a first core including hardware to accelerate matrix operations, and a second core configured to execute a thread of the instruction; and processing the instruction on the second core using the multiple operands, wherein processing the instruction includes performing a multiply operation, performing an addition to a result of the multiply operation, and applying a rectified linear unit function to a result of the addition.
9. The method of claim 8, further comprising performing, via the first core, a parallel matrix multiply operation on input having the BF16 number format.
10. The method of claim 8, wherein the SIMT multiprocessor includes a third core to accelerate ray tracing operations and the method further comprises accelerating a ray tracing operation via the third core in parallel with processing the instruction.
11. The method of claim 8, further comprising performing texture processing operations via texture processing circuitry that is external to and coupled with the SIMT multiprocessor.
12. The method of claim 8, further comprising performing a dot product operation via the second core in response to the instruction.
13. The method of claim 8, wherein each of the multiple operands include a packed data type, the packed data type including multiple data elements in the BF16 number format.
14. A graphics processing system comprising: a memory device; a graphics processor coupled with the memory device, the graphics processor comprising a single instruction, multiple thread (SIMT) multiprocessor comprising an instruction cache, a shared memory coupled with the instruction cache, and circuitry coupled with the shared memory and the instruction cache, the circuitry including: multiple texture units; a first core including hardware to accelerate matrix operations; and a second core configured to receive an instruction having multiple operands including data elements A, B, C, and D and process the instruction to perform operations on the data elements, wherein the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format, the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent, and to process the instruction includes to perform an operation D=A*B+C and apply a rectified linear unit function to a result of the operation.
15. The graphics processing system of claim 14, wherein the SIMT multiprocessor is to execute a warp of threads in response to the instruction.
16. The graphics processing system of claim 14, wherein the SIMT multiprocessor is to perform a parallel matrix multiply operation via the first core, the parallel matrix multiply operation performed on input having the BF16 number format.
17. The graphics processing system of claim 14, further comprising a third core including hardware to accelerate ray tracing operations.
18. The graphics processing system of claim 14, further comprising texture processing circuitry external to and coupled with the SIMT multiprocessor.
19. The graphics processing system of claim 14, wherein the instruction is to cause the second core to perform a dot product operation.
20. The graphics processing system of claim 14, wherein each of the multiple operands include a packed data type, the packed data type including multiple data elements in the BF16 number format.
21. A parallel processing unit comprising: a first processing cluster to perform parallel processing operations, the first processing cluster including a ray tracing core to perform a ray tracing operation and a matrix processing core to perform a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, the second processing cluster including a processing core configured to process an instruction, wherein the instruction is a single instruction multiple data (SIMD) instruction having multiple operands and configured to use a bfloat16 (BF16) number format, the processing core including a floating-point unit to perform floating point operations on input included in the multiple operands, the floating point unit including a multiplier to multiply second and third input while an accumulator adds a first input with output from the multiplier.
22. The parallel processing unit of claim 21, wherein the first, second, and third input includes data in the BF16 number format.
23. The parallel processing unit of claim 21, wherein the first input includes data in a single-precision floating point format and the second and third input include data in the BF16 number format.
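The claims mention packed BF16 operands, a dot product operation, and a variant in which the accumulated input is single precision while the multiplied inputs are BF16. A minimal Python sketch of that hybrid arrangement follows. It is an assumption-laden illustration and not the claimed circuitry: packing two BF16 elements per 32-bit word, the element order inside a word, and all function names are hypothetical choices made for the example.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round-to-nearest-even conversion to a BF16 bit pattern (NaN not handled)."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return ((bits32 + 0x7FFF + ((bits32 >> 16) & 1)) >> 16) & 0xFFFF


def bf16_bits_to_float(bits16: int) -> float:
    """Widen a BF16 bit pattern to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


def to_f32(x: float) -> float:
    """Round a Python float (double) to single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pack_bf16_pair(lo: float, hi: float) -> int:
    """Pack two BF16 elements into one 32-bit word (hypothetical layout)."""
    return (float_to_bf16_bits(hi) << 16) | float_to_bf16_bits(lo)


def unpack_bf16_pair(word: int) -> tuple:
    """Return the (lo, hi) elements of a packed word as floats."""
    return (bf16_bits_to_float(word & 0xFFFF), bf16_bits_to_float(word >> 16))


def pack_bf16_pairs(values: list) -> list:
    """Pack an even-length list of floats into 32-bit words, two per word."""
    return [pack_bf16_pair(values[i], values[i + 1])
            for i in range(0, len(values), 2)]


def dot_accumulate(acc: float, packed_a: list, packed_b: list) -> float:
    """Return acc + dot(a, b) with BF16 inputs and a single-precision accumulator."""
    acc = to_f32(acc)
    for word_a, word_b in zip(packed_a, packed_b):
        for a, b in zip(unpack_bf16_pair(word_a), unpack_bf16_pair(word_b)):
            # The product of two BF16 values is exact in double precision;
            # only the running sum is rounded to single precision here.
            acc = to_f32(acc + a * b)
    return acc


a = pack_bf16_pairs([1.0, 2.0, 3.0, 4.0])
b = pack_bf16_pairs([0.5, 0.5, 0.5, 0.5])
result = dot_accumulate(1.0, a, b)  # 1.0 + (0.5 + 1.0 + 1.5 + 2.0)
```

Keeping the accumulator in single precision limits the rounding error that would build up if each partial sum were rounded to BF16's seven mantissa bits. The order in which real hardware sums and rounds the partial products may differ from this sequential loop.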

Assignees

Intel Corp

Inventors

Classifications

G06F2212/652 · Page size control
G06F2212/608 · Details relating to cache mapping
G06F2212/6028 · Prefetching based on hints or prefetch instructions
G06F2212/6026 · Prefetching based on access pattern detection, e.g. stride based prefetch
G06F2212/601 · Reconfiguration of cache memory

Patent family

Related publications grouped by family.

View patent family 70277485

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11954063B2 cover?: Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple o…
Who is the assignee on this patent?: Intel Corp