Instruction execution in graphics processor shader programs
US-2020082491-A1 · Mar 12, 2020 · US
US11568591B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11568591-B2 |
| Application number | US-202016996208-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 18, 2020 |
| Priority date | Dec 28, 2018 |
| Publication date | Jan 31, 2023 |
| Grant date | Jan 31, 2023 |
Official abstract text for this publication.
An apparatus and method to execute ray tracing instructions. For example, one embodiment of an apparatus comprises execution circuitry to execute a dequantize instruction to convert a plurality of quantized data values to a plurality of dequantized data values, the dequantize instruction including a first source operand to identify a plurality of packed quantized data values in a source register and a destination operand to identify a destination register in which to store a plurality of packed dequantized data values, wherein the execution circuitry is to convert each packed quantized data value in the source register to a floating point value, to multiply the floating point value by a first value to generate a first product and to add the first product to a second value to generate a dequantized data value, and to store the dequantized data value in a packed data element location in the destination register.
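The per-element arithmetic the abstract describes (convert to floating point, multiply by a first value, add a second value, store in the matching destination element) can be sketched as follows. This is a minimal illustration only, not the patented circuitry. The function name `dequantize_packed`, the parameter names `scale` and `offset`, and the use of Python lists to stand in for packed registers are all assumptions made for the example.

```python
from typing import List, Sequence


def dequantize_packed(src: Sequence[int], scale: float, offset: float) -> List[float]:
    """Illustrative model of a packed dequantize operation.

    `src` stands in for the packed quantized values in the source register.
    `scale` and `offset` are hypothetical names for the abstract's
    "first value" and "second value".
    """
    dst = [0.0] * len(src)              # stand-in for the destination register
    for lane, quantized in enumerate(src):
        as_float = float(quantized)     # convert the quantized value to floating point
        product = as_float * scale      # multiply by the first value
        dst[lane] = product + offset    # add the second value, store in the same lane
    return dst


if __name__ == "__main__":
    print(dequantize_packed([0, 128, 255], scale=0.5, offset=-1.0))
```

Each output element occupies the same position as its input element, mirroring the abstract's "packed data element location in the destination register".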
Opening claim text (preview).
What is claimed is:

1. A graphics processing unit, comprising: a plurality of multi-core groups, wherein a multi-core group comprises: a plurality of graphics cores to process one or more shader programs; a plurality of tensor cores, apart from the plurality of graphics cores, to perform matrix operations including matrix multiplication operations for neural network training and inferencing; one or more ray tracing cores, apart from the plurality of graphics cores and the plurality of tensor cores, to perform all ray tracing operations to save the plurality of graphics cores from overloading, wherein a ray tracing core includes a first set of specialized circuitry for performing bounding box tests and a second set of specialized circuitry for performing the ray-triangle intersection tests, and wherein the ray tracing core independently performs all calculations for bounding box tests and ray traversal and intersection of a ray; a cache shared among the plurality of graphics cores, the plurality of tensor cores, and the one or more ray tracing cores; and a set of register files to store operand values; and wherein execution circuitry of at least one of the graphics cores, tensor cores, and ray tracing cores is to execute a first instruction to select a minimum value from a plurality of threads including a first operand identifying values within the plurality of threads, the execution is to perform the operation of: returning a minimum value from values within a set of threads, the set of threads selected from the plurality of threads based on a mask.

2. The graphics processing unit of claim 1, wherein the mask comprises one bit associated with each thread, wherein a first bit value indicates that a corresponding thread is included in the set of threads, and a second bit value indicates that the thread is not included in the set of threads.

3. The graphics processing unit of claim 1, wherein each of the values associated with the plurality of threads comprises an integer.

4. The graphics processing unit of claim 1, wherein the set of threads are synchronized.

5. The graphics processing unit of claim 1, wherein the execution circuitry of at least one of the graphics cores, tensor cores, and ray tracing cores to execute a second instruction to select a maximum value from the plurality of threads, the second instruction including a second operand identifying values within the plurality of threads to perform the operation of: returning a maximum value from the values within the set of threads, the set of threads selected from the plurality of threads based on the mask.

6. The graphics processing unit of claim 1, wherein performing the ray tracing operations comprises: generating rays for traversal through a graphics scene; constructing a hierarchical acceleration data structure comprising a plurality of hierarchically arranged nodes; and traversing one or more of the rays through the hierarchical acceleration data structure and intersecting the one or more rays with primitives contained within the hierarchically arranged nodes.

7. A graphics processing unit, comprising: a plurality of multi-core groups, wherein a multi-core group comprises: a plurality of graphics cores to process one or more shader programs; a plurality of tensor cores, apart from the plurality of graphics cores, to perform matrix operations including matrix multiplication operations for neural network training and inferencing; one or more ray tracing cores, apart from the plurality of graphics cores and the plurality of tensor cores, to perform all ray tracing operations to save the plurality of graphics cores from overloading, wherein a ray tracing core includes a first set of specialized circuitry for performing bounding box tests and a second set of specialized circuitry for performing the ray-triangle intersection tests, and wherein the ray tracing core independently performs all calculations for bounding box tests and ray traversal and intersection of a ray; a cache shared among the plurality of graphics cores, the plurality of tensor cores, and the one or more ray tracing cores; and a set of register files to store operand values; and wherein execution circuitry of at least one of the graphics cores, tensor cores, and ray tracing cores is to execute a first instruction to select a maximum value from the plurality of threads, the second instruction including a second operand identifying values within the plurality of threads to perform the operation of: returning a maximum value from values within the set of threads, the set of threads selected from the plurality of threads based on the mask.

8. The graphics processing unit of claim 7, wherein the mask comprises one bit associated with each thread, wherein a first bit value indicates that a corresponding thread is included in the set of threads, and a second bit value indicates that the thread is not included in the set of threads.

9. The graphics processing unit of claim 7, wherein each of the values associated with the plurality of threads comprises an integer.

10. The graphics processing unit of claim 7, wherein the set of threads are synchronized.

11. The graphics processing unit of claim 7, wherein the execution circuitry of at least one of the graphics cores, tensor cores, and ray tracing cores to execute a second instruction to select a minimum value from a plurality of threads, the first instruction including a first operand identifying values within the plurality of threads, the execution is to perform the operation of: returning a minimum value from the values within a set of threads, the set of threads selected from the plurality of threads based on a mask.

12. The graphics processing unit of claim 7, wherein performing the ray tracing operations comprises: generating rays for traversal through a graphics scene; constructing a hierarchical acceleration data structure comprising a plurality of hierarchically arranged nodes; and traversing one or more of the rays through the hierarchical acceleration data structure and intersecting the one or more rays with primitives contained within the hierarchically arranged nodes.
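The masked minimum and maximum selection recited in the independent claims, together with the one-bit-per-thread mask of claims 2 and 8 and the integer values of claims 3 and 9, can be modeled in software roughly as follows. This is a hedged sketch of the behavior, not the claimed hardware. The function names, the choice of bit value 1 to mean "included", the mapping of thread index to bit position, and the error raised for an empty mask are all assumptions, since the claims do not specify them.

```python
from typing import List, Sequence


def select_lanes(values: Sequence[int], mask: int) -> List[int]:
    """Return the values of the threads whose mask bit is set.

    Assumption: bit `i` of `mask` corresponds to thread `i`, and a bit
    value of 1 means the thread is included in the set.
    """
    return [v for lane, v in enumerate(values) if (mask >> lane) & 1]


def masked_min(values: Sequence[int], mask: int) -> int:
    """Minimum over the per-thread integer values selected by `mask`."""
    selected = select_lanes(values, mask)
    if not selected:
        # Assumption: the claims do not define the empty-mask case.
        raise ValueError("mask selects no threads")
    return min(selected)


def masked_max(values: Sequence[int], mask: int) -> int:
    """Maximum over the per-thread integer values selected by `mask`."""
    selected = select_lanes(values, mask)
    if not selected:
        raise ValueError("mask selects no threads")
    return max(selected)


if __name__ == "__main__":
    per_thread = [7, 3, 9, 1]   # one integer value per thread
    mask = 0b0111               # threads 0, 1 and 2 participate
    print(masked_min(per_thread, mask), masked_max(per_thread, mask))
```

In this model the value held by thread 3 is ignored because its mask bit is clear, so the reduction sees only 7, 3 and 9.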
Memory management · CPC title
Physics · mapped topic
Bandwidth reduction · CPC title
Image coding (bandwidth or redundancy reduction for static pictures H04N1/41; coding or decoding of static colour picture signals H04N1/64; methods or arrangements for coding, decoding, compressing or decompressing digital video signals H04N19/00) · CPC title
Ray-tracing · CPC title