Apparatus and method for ray tracing instruction processing and execution

US12236519B2 · US · B2

Patent metadata
Publication number: US-12236519-B2
Application number: US-202218090810-A
Country: US
Kind code: B2
Filing date: Dec 29, 2022
Priority date: Dec 28, 2018
Publication date: Feb 25, 2025
Grant date: Feb 25, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method to execute ray tracing instructions. For example, one embodiment of an apparatus comprises execution circuitry to execute a dequantize instruction to convert a plurality of quantized data values to a plurality of dequantized data values, the dequantize instruction including a first source operand to identify a plurality of packed quantized data values in a source register and a destination operand to identify a destination register in which to store a plurality of packed dequantized data values, wherein the execution circuitry is to convert each packed quantized data value in the source register to a floating point value, to multiply the floating point value by a first value to generate a first product and to add the first product to a second value to generate a dequantized data value, and to store the dequantized data value in a packed data element location in the destination register.
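
The arithmetic the abstract describes (convert each packed quantized value to floating point, multiply by a first value, add a second value, store in the matching destination element) can be illustrated with a minimal software sketch. This is not the patented hardware instruction. The function name `dequantize` and the parameter names `scale` and `offset` are hypothetical labels for the abstract's "first value" and "second value", and Python lists stand in for the source and destination registers.

```python
from typing import List


def dequantize(packed: List[int], scale: float, offset: float) -> List[float]:
    """Illustrative model of the per-element dequantize arithmetic.

    packed : stand-in for the packed quantized values in the source register
    scale  : hypothetical name for the abstract's "first value"
    offset : hypothetical name for the abstract's "second value"
    """
    result = []
    for q in packed:
        as_float = float(q)          # convert the quantized value to floating point
        product = as_float * scale   # multiply by the first value
        result.append(product + offset)  # add the second value, store in the same slot
    return result


# Four 8-bit quantized values mapped back to a floating point range.
print(dequantize([0, 64, 128, 255], scale=0.5, offset=-1.0))
# prints [-1.0, 31.0, 63.0, 126.5]
```

Each output element depends only on the input element at the same position, which is why the operation suits a packed (SIMD) register layout.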

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a Peripheral Component Interconnect Express interface; a set of memory controllers; a plurality of multi-core groups coupled to the Peripheral Component Interconnect Express interface and the set of memory controllers, wherein a multi-core group within the plurality of multi-core groups comprises: a plurality of graphics cores to process one or more shader programs; a plurality of tensor cores, apart from the plurality of graphics cores, to perform matrix operations; a ray tracing core, apart from the plurality of graphics cores and the plurality of tensor cores, to perform bounding volume hierarchy (BVH) operations and triangle intersection operations; a first cache shared among the plurality of graphics cores, the plurality of tensor cores, and the ray tracing core; and a set of register files to store operand values; wherein execution circuitry of at least one of the plurality of graphics cores, the plurality of tensor cores, and the ray tracing core is to execute a first instruction to select a minimum value from a plurality of threads, the first instruction including a first operand identifying values within the plurality of threads, and wherein the execution is to perform: returning a minimum value from values within a set of threads, the set of threads selected from the plurality of threads based on a mask; and a second cache shared by the plurality of multi-core groups.

2. The system of claim 1, wherein the system further comprises: a set of processor cores coupled to the plurality of multi-core groups.

3. The system of claim 1, wherein the system further comprises: an interface to couple to an external memory device.

4. The system of claim 1, wherein the mask comprises one bit associated with each thread, wherein a first bit value indicates that a corresponding thread is included in the set of threads, and a second bit value indicates that the corresponding thread is not included in the set of threads.

5. The system of claim 1, wherein each of the values associated with the plurality of threads comprises an integer.

6. The system of claim 1, wherein the set of threads are synchronized.

7. The system of claim 1, wherein the execution circuitry of at least one of the plurality of graphics cores, the plurality of tensor cores, and the ray tracing core to execute a second instruction including a second operand specifying the values associated with the plurality of threads to perform: returning a maximum value from the values associated with the set of threads, the set of threads selected from the plurality of threads based on the mask.

8. The system of claim 1, wherein performing the BVH operations and triangle intersection operations comprises: generating rays for traversal through a graphics scene; constructing a hierarchical acceleration data structure comprising a plurality of hierarchically arranged nodes; and traversing one or more of the rays through the hierarchical acceleration data structure and intersecting the one or more rays with primitives contained within the plurality of hierarchically arranged nodes.

9. A system comprising: a Peripheral Component Interconnect Express interface; a set of memory controllers; a plurality of multi-core groups coupled to the Peripheral Component Interconnect Express interface and the set of memory controllers, wherein a multi-core group within the plurality of multi-core groups comprises: a plurality of graphics cores to process one or more shader programs; a plurality of tensor cores, apart from the plurality of graphics cores, to perform matrix operations; a ray tracing core, apart from the plurality of graphics cores and the plurality of tensor cores, to perform bounding volume hierarchy (BVH) operations and triangle intersection operations; a first cache shared among the plurality of graphics cores, the plurality of tensor cores, and the ray tracing core; and a set of register files to store operand values; wherein execution circuitry of at least one of the plurality of graphics cores, the plurality of tensor cores, and the ray tracing core is to execute a first instruction to select a maximum value from a plurality of threads, the first instruction including a first operand identifying values within the plurality of threads, and wherein the execution is to perform: returning a maximum value from values within a set of threads, the set of threads selected from the plurality of threads based on a mask; and a second cache shared by the plurality of multi-core groups.

10. The system of claim 9, wherein the system further comprises: a set of processor cores coupled to the plurality of multi-core groups.

11. The system of claim 9, wherein the system further comprises: an interface to couple to an external memory device.

12. The system of claim 9, wherein the mask comprises one bit associated with each thread, wherein a first bit value indicates that a corresponding thread is included in the set of threads, and a second bit value indicates that the corresponding thread is not included in the set of threads.

13. The system of claim 9, wherein each of the values associated with the plurality of threads comprises an integer.

14. The system of claim 9, wherein the set of threads are synchronized.

15. The system of claim 9, wherein the execution circuitry of at least one of the plurality of graphics cores, the plurality of tensor cores, and the ray tracing core to execute a second instruction including a second operand specifying the values associated with the plurality of threads to perform: returning a minimum value from the values associated with the set of threads, the set of threads selected from the plurality of threads based on the mask.

16. The system of claim 9, wherein performing the BVH operations and triangle intersection operations comprises: generating rays for traversal through a graphics scene; constructing a hierarchical acceleration data structure comprising a plurality of hierarchically arranged nodes; and traversing one or more of the rays through the hierarchical acceleration data structure and intersecting the one or more rays with primitives contained within the plurality of hierarchically arranged nodes.

17. A graphics processing unit, comprising: a plurality of multi-core groups, wherein a multi-core group within the plurality of multi-core groups comprises: a plurality of graphics cores to process one or more shader programs; a plurality of tensor cores, apart from the plurality of graphics cores, to perform matrix operations; a ray tracing core, apart from the plurality of graphics cores and the plurality of tensor cores, to perform bounding volume hierarchy (BVH) operations and triangle intersection operations; a first cache shared among the plurality of graphics cores, the plurality of tensor cores, and the ray tracing core; and a set of register files to store operand values; wherein execution circuitry of at least one of the plurality of graphics cores, the plurality of tensor cores, and the ray tracing core is to execute a first instruction to select a minimum value from a plurality of threads, the first instruction including a first operand identifying values within the plurality of threads, and wherein the execution is to perform: returning a minimum value from values within a set of threads, the set of threads selected from the plurality of threads based on a mask.

18. The graphics processing unit of claim 17, wherein the graphics processing unit furt
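
The masked minimum and maximum selection recited in claims 1, 4, 7, and 9 can be modeled in software as follows. This is a minimal sketch under stated assumptions, not the claimed execution circuitry: the names `masked_reduce`, `wave_min`, and `wave_max` are hypothetical, a bit value of 1 is assumed to mean "thread included" (the claims only say "a first bit value"), and returning `None` for an empty mask is an arbitrary choice the claims do not address.

```python
from typing import Callable, List, Optional


def masked_reduce(values: List[int], mask: int,
                  pick: Callable[[List[int]], int]) -> Optional[int]:
    """Apply `pick` to the values of the threads selected by `mask`.

    values[i] is the per-thread integer value held by thread i.
    Bit i of `mask` selects thread i (assumption: 1 = included, 0 = excluded).
    """
    active = [v for lane, v in enumerate(values) if (mask >> lane) & 1]
    # Assumption: with no thread selected there is nothing to return.
    return pick(active) if active else None


def wave_min(values: List[int], mask: int) -> Optional[int]:
    """Minimum over the masked set of threads (claims 1 and 15)."""
    return masked_reduce(values, mask, min)


def wave_max(values: List[int], mask: int) -> Optional[int]:
    """Maximum over the masked set of threads (claims 7 and 9)."""
    return masked_reduce(values, mask, max)


# Four threads; the mask 0b0111 selects threads 0, 1 and 2 only.
per_thread = [7, 3, 9, 1]
print(wave_min(per_thread, 0b0111))  # prints 3 (thread 3's value 1 is masked out)
print(wave_max(per_thread, 0b0111))  # prints 9
```

The example shows why the mask matters: the global minimum is 1, but thread 3 is excluded, so the masked minimum is 3.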

Assignees

Intel Corp

Inventors

Classifications

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • using a mask · CPC title

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • Denoising; Smoothing · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12236519B2 cover?
An apparatus and method to execute ray tracing instructions. For example, one embodiment of an apparatus comprises execution circuitry to execute a dequantize instruction to convert a plurality of quantized data values to a plurality of dequantized data values, the dequantize instruction including a first source operand to identify a plurality of packed quantized data values in a source registe…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T15/06. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 25, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).