Apparatus and method for determining a sector division ratio of a shared cache memory
US-2015339229-A1 · Nov 26, 2015 · US
US11954063B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11954063-B2 |
| Application number | US-202318170900-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 17, 2023 |
| Priority date | Mar 15, 2019 |
| Publication date | Apr 9, 2024 |
| Grant date | Apr 9, 2024 |
Official abstract text for this publication.
Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.
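The number format and the multiply, add, and rectified linear unit sequence in the abstract can be illustrated with a small software model. This is a minimal sketch, not the patented hardware: it assumes the standard BF16 layout (1 sign bit, 8 exponent bits, 7 mantissa bits, i.e. the upper half of an IEEE 754 single-precision value), assumes round-to-nearest-even when narrowing, and handles finite inputs only. The function names `f32_to_bf16_bits`, `bf16_bits_to_f32`, `bf16_round`, and `mul_add_relu` are hypothetical and do not come from the patent.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Return a 16-bit BF16 pattern for a finite value x.

    Assumption: round-to-nearest-even on the float32 bit pattern.
    NaN and infinity are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(b: int) -> float:
    """Widen a BF16 pattern to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def bf16_round(x: float) -> float:
    """Round x to the nearest value representable in BF16."""
    return bf16_bits_to_f32(f32_to_bf16_bits(x))


def mul_add_relu(a: float, b: float, c: float) -> float:
    """Model D = relu(A * B + C) with BF16 inputs and a BF16 result.

    The intermediate arithmetic uses Python floats, which is an
    assumption of this model, not a statement about the hardware.
    """
    a, b, c = bf16_round(a), bf16_round(b), bf16_round(c)
    d = a * b + c          # multiply, then add to the product
    return bf16_round(max(d, 0.0))  # rectified linear unit


if __name__ == "__main__":
    one = f32_to_bf16_bits(1.0)
    print(hex(one), "exponent field:", (one >> 7) & 0xFF)
    print(mul_add_relu(2.0, 3.0, 1.0))
    print(mul_add_relu(2.0, 3.0, -10.0))
```

Because BF16 keeps the same eight-bit exponent as single precision, widening is a simple shift; only the mantissa loses precision when narrowing.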
Opening claim text (preview).
What is claimed is:
1. A graphics processing unit (GPU) comprising: a single instruction, multiple thread (SIMT) multiprocessor comprising: an instruction cache; a shared memory coupled with the instruction cache; circuitry coupled with the shared memory and the instruction cache, the circuitry including: multiple texture units; a first core including hardware to accelerate matrix operations; and a second core configured to: receive an instruction having multiple operands, wherein the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent; and process the instruction using the multiple operands, wherein to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.
2. The GPU of claim 1, wherein the SIMT multiprocessor is to execute a warp of threads in response to the instruction.
3. The GPU of claim 1, wherein the SIMT multiprocessor is to perform a parallel matrix multiply operation via the first core, the parallel matrix multiply operation performed on input having the BF16 number format.
4. The GPU of claim 1, further comprising a third core including hardware to accelerate ray tracing operations.
5. The GPU of claim 1, further comprising texture processing circuitry external to and coupled with the SIMT multiprocessor.
6. The GPU of claim 1, wherein the instruction is to cause the second core to perform a dot product operation.
7. The GPU of claim 1, wherein each of the multiple operands include a packed data type, the packed data type including multiple data elements in the BF16 number format.
8. A method comprising: fetching an instruction from an instruction cache of a graphics processing unit (GPU), the instruction a single instruction multiple data (SIMD) instruction having multiple operands and configured to use a bfloat16 (BF16) number format, wherein the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent, the GPU includes a shared memory coupled with the instruction cache and circuitry coupled with the shared memory and the instruction cache; dispatching a warp of threads to a single instruction multiple thread (SIMT) multiprocessor of the GPU in response to the instruction, wherein the SIMT multiprocessor includes multiple texture units, a first core including hardware to accelerate matrix operations, and a second core configured to execute a thread of the instruction; and processing the instruction on the second core using the multiple operands, wherein processing the instruction includes performing a multiply operation, performing an addition to a result of the multiply operation, and applying a rectified linear unit function to a result of the addition.
9. The method of claim 8, further comprising performing, via the first core, a parallel matrix multiply operation on input having the BF16 number format.
10. The method of claim 8, wherein the SIMT multiprocessor includes a third core to accelerate ray tracing operations and the method further comprises accelerating a ray tracing operation via the third core in parallel with processing the instruction.
11. The method of claim 8, further comprising performing texture processing operations via texture processing circuitry that is external to and coupled with the SIMT multiprocessor.
12. The method of claim 8, further comprising performing a dot product operation via the second core in response to the instruction.
13. The method of claim 8, wherein each of the multiple operands include a packed data type, the packed data type including multiple data elements in the BF16 number format.
14. A graphics processing system comprising: a memory device; a graphics processor coupled with the memory device, the graphics processor comprising a single instruction, multiple thread (SIMT) multiprocessor comprising an instruction cache, a shared memory coupled with the instruction cache, and circuitry coupled with the shared memory and the instruction cache, the circuitry including: multiple texture units; a first core including hardware to accelerate matrix operations; and a second core configured to receive an instruction having multiple operands including data elements A, B, C, and D and process the instruction to perform operations on the data elements, wherein the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format, the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent, and to process the instruction includes to perform an operation D=A*B+C and apply a rectified linear unit function to a result of the operation.
15. The graphics processing system of claim 14, wherein the SIMT multiprocessor is to execute a warp of threads in response to the instruction.
16. The graphics processing system of claim 14, wherein the SIMT multiprocessor is to perform a parallel matrix multiply operation via the first core, the parallel matrix multiply operation performed on input having the BF16 number format.
17. The graphics processing system of claim 14, further comprising a third core including hardware to accelerate ray tracing operations.
18. The graphics processing system of claim 14, further comprising texture processing circuitry external to and coupled with the SIMT multiprocessor.
19. The graphics processing system of claim 14, wherein the instruction is to cause the second core to perform a dot product operation.
20. The graphics processing system of claim 14, wherein each of the multiple operands include a packed data type, the packed data type including multiple data elements in the BF16 number format.
21. A parallel processing unit comprising: a first processing cluster to perform parallel processing operations, the first processing cluster including a ray tracing core to perform a ray tracing operation and a matrix processing core to perform a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, the second processing cluster including a processing core configured to process an instruction, wherein the instruction is a single instruction multiple data (SIMD) instruction having multiple operands and configured to use a bfloat16 (BF16) number format, the processing core including a floating-point unit to perform floating point operations on input included in the multiple operands, the floating point unit including a multiplier to multiply second and third input while an accumulator adds a first input with output from the multiplier.
22. The parallel processing unit of claim 21, wherein the first, second, and third input includes data in the BF16 number format.
23. The parallel processing unit of claim 21, wherein the first input includes data in a single-precision floating point format and the second and third input include data in the BF16 number format.
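The packed operands, the dot product operation, and the mixed-precision accumulation recited in the dependent claims and in claims 21 to 23 can be pictured with a short software model. This is a sketch under stated assumptions, not the claimed circuitry: it assumes two BF16 elements packed into one 32-bit word, narrows to BF16 by simple truncation, and keeps the accumulator (the first input) in single precision. All names (`pack2`, `unpack2`, `dot_accumulate`, and the helpers) are hypothetical.

```python
import struct


def to_bf16_bits(x: float) -> int:
    """Narrow a finite value to a BF16 pattern by truncating float32 bits.

    Assumption: truncation is used here for brevity; real hardware
    may round differently.
    """
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


def from_bf16_bits(b: int) -> float:
    """Widen a BF16 pattern by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def to_f32(x: float) -> float:
    """Round a Python float to single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pack2(lo: float, hi: float) -> int:
    """Pack two BF16 elements into one 32-bit word (assumed layout)."""
    return (to_bf16_bits(hi) << 16) | to_bf16_bits(lo)


def unpack2(word: int) -> tuple[float, float]:
    """Return the (low, high) BF16 elements of a packed word."""
    return from_bf16_bits(word & 0xFFFF), from_bf16_bits(word >> 16)


def dot_accumulate(acc: float, a_word: int, b_word: int) -> float:
    """Add the dot product of two packed BF16 pairs to an FP32 accumulator.

    acc plays the role of the first input; the packed words supply the
    second and third inputs that feed the multiplier.
    """
    a0, a1 = unpack2(a_word)
    b0, b1 = unpack2(b_word)
    return to_f32(to_f32(acc) + a0 * b0 + a1 * b1)


if __name__ == "__main__":
    a = pack2(1.0, 2.0)
    b = pack2(3.0, 4.0)
    print(hex(a), dot_accumulate(0.5, a, b))
```

Keeping the accumulator in single precision while the multiplicands stay in BF16 corresponds to the arrangement in claim 23; using BF16 for all three inputs, as in claim 22, would only change how `acc` is rounded.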
Page size control · CPC title
Details relating to cache mapping · CPC title
Prefetching based on hints or prefetch instructions · CPC title
Prefetching based on access pattern detection, e.g. stride based prefetch · CPC title
Reconfiguration of cache memory · CPC title