Apparatus and method for ray tracing instruction processing and execution

US11568591B2 · US · B2

Patent metadata
Publication number: US-11568591-B2
Application number: US-202016996208-A
Country: US
Kind code: B2
Filing date: Aug 18, 2020
Priority date: Dec 28, 2018
Publication date: Jan 31, 2023
Grant date: Jan 31, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method to execute ray tracing instructions. For example, one embodiment of an apparatus comprises execution circuitry to execute a dequantize instruction to convert a plurality of quantized data values to a plurality of dequantized data values, the dequantize instruction including a first source operand to identify a plurality of packed quantized data values in a source register and a destination operand to identify a destination register in which to store a plurality of packed dequantized data values, wherein the execution circuitry is to convert each packed quantized data value in the source register to a floating point value, to multiply the floating point value by a first value to generate a first product and to add the first product to a second value to generate a dequantized data value, and to store the dequantized data value in a packed data element location in the destination register.
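The dequantize operation described in the abstract (convert each packed quantized value to floating point, multiply by a first value, then add a second value) can be illustrated in software. This is a minimal sketch, not the patented circuitry: the function name `dequantize` and the parameter names `scale` and `offset` are hypothetical stand-ins for the abstract's "first value" and "second value", and a Python list stands in for the packed source and destination registers.

```python
from typing import List, Sequence


def dequantize(packed: Sequence[int], scale: float, offset: float) -> List[float]:
    """Element-wise dequantization of a packed group of quantized values.

    For each element: convert to float, multiply by `scale` (the "first
    value"), then add `offset` (the "second value"). Names are assumptions.
    """
    result = []
    for q in packed:
        as_float = float(q)           # convert the quantized value to floating point
        product = as_float * scale    # first product
        result.append(product + offset)  # dequantized value for this element
    return result


# Hypothetical example: four 8-bit quantized values in one "source register".
src = [0, 64, 128, 255]
dst = dequantize(src, scale=0.5, offset=-1.0)
print(dst)  # [-1.0, 31.0, 63.0, 126.5]
```

Each output element lands in the same position as its input element, mirroring the abstract's "packed data element location in the destination register".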

First claim

Opening claim text (preview).

What is claimed is:

1. A graphics processing unit, comprising: a plurality of multi-core groups, wherein a multi-core group comprises: a plurality of graphics cores to process one or more shader programs; a plurality of tensor cores, apart from the plurality of graphics cores, to perform matrix operations including matrix multiplication operations for neural network training and inferencing; one or more ray tracing cores, apart from the plurality of graphics cores and the plurality of tensor cores, to perform all ray tracing operations to save the plurality of graphics cores from overloading, wherein a ray tracing core includes a first set of specialized circuitry for performing bounding box tests and a second set of specialized circuitry for performing the ray-triangle intersection tests, and wherein the ray tracing core independently performs all calculations for bounding box tests and ray traversal and intersection of a ray; a cache shared among the plurality of graphics cores, the plurality of tensor cores, and the one or more ray tracing cores; and a set of register files to store operand values; and wherein execution circuitry of at least one of the graphics cores, tensor cores, and ray tracing cores is to execute a first instruction to select a minimum value from a plurality of threads including a first operand identifying values within the plurality of threads, the execution is to perform the operation of: returning a minimum value from values within a set of threads, the set of threads selected from the plurality of threads based on a mask.

2. The graphics processing unit of claim 1, wherein the mask comprises one bit associated with each thread, wherein a first bit value indicates that a corresponding thread is included in the set of threads, and a second bit value indicates that the thread is not included in the set of threads.

3. The graphics processing unit of claim 1, wherein each of the values associated with the plurality of threads comprises an integer.

4. The graphics processing unit of claim 1, wherein the set of threads are synchronized.

5. The graphics processing unit of claim 1, wherein the execution circuitry of at least one of the graphics cores, tensor cores, and ray tracing cores to execute a second instruction to select a maximum value from the plurality of threads, the second instruction including a second operand identifying values within the plurality of threads to perform the operation of: returning a maximum value from the values within the set of threads, the set of threads selected from the plurality of threads based on the mask.

6. The graphics processing unit of claim 1, wherein performing the ray tracing operations comprises: generating rays for traversal through a graphics scene; constructing a hierarchical acceleration data structure comprising a plurality of hierarchically arranged nodes; and traversing one or more of the rays through the hierarchical acceleration data structure and intersecting the one or more rays with primitives contained within the hierarchically arranged nodes.

7. A graphics processing unit, comprising: a plurality of multi-core groups, wherein a multi-core group comprises: a plurality of graphics cores to process one or more shader programs; a plurality of tensor cores, apart from the plurality of graphics cores, to perform matrix operations including matrix multiplication operations for neural network training and inferencing; one or more ray tracing cores, apart from the plurality of graphics cores and the plurality of tensor cores, to perform all ray tracing operations to save the plurality of graphics cores from overloading, wherein a ray tracing core includes a first set of specialized circuitry for performing bounding box tests and a second set of specialized circuitry for performing the ray-triangle intersection tests, and wherein the ray tracing core independently performs all calculations for bounding box tests and ray traversal and intersection of a ray; a cache shared among the plurality of graphics cores, the plurality of tensor cores, and the one or more ray tracing cores; and a set of register files to store operand values; and wherein execution circuitry of at least one of the graphics cores, tensor cores, and ray tracing cores is to execute a first instruction to select a maximum value from the plurality of threads, the second instruction including a second operand identifying values within the plurality of threads to perform the operation of: returning a maximum value from values within the set of threads, the set of threads selected from the plurality of threads based on the mask.

8. The graphics processing unit of claim 7, wherein the mask comprises one bit associated with each thread, wherein a first bit value indicates that a corresponding thread is included in the set of threads, and a second bit value indicates that the thread is not included in the set of threads.

9. The graphics processing unit of claim 7, wherein each of the values associated with the plurality of threads comprises an integer.

10. The graphics processing unit of claim 7, wherein the set of threads are synchronized.

11. The graphics processing unit of claim 7, wherein the execution circuitry of at least one of the graphics cores, tensor cores, and ray tracing cores to execute a second instruction to select a minimum value from a plurality of threads, the first instruction including a first operand identifying values within the plurality of threads, the execution is to perform the operation of: returning a minimum value from the values within a set of threads, the set of threads selected from the plurality of threads based on a mask.

12. The graphics processing unit of claim 7, wherein performing the ray tracing operations comprises: generating rays for traversal through a graphics scene; constructing a hierarchical acceleration data structure comprising a plurality of hierarchically arranged nodes; and traversing one or more of the rays through the hierarchical acceleration data structure and intersecting the one or more rays with primitives contained within the hierarchically arranged nodes.
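The masked minimum and maximum operations recited in the claims can be sketched in a few lines. This is an illustration only, under stated assumptions: the function names `masked_min` and `masked_max` are hypothetical, a list index stands in for a thread ID, a set bit (1) is assumed to mean "thread included" (the claims only say "a first bit value"), and returning `None` for an empty mask is a choice made here, since the claims do not specify that case.

```python
from typing import List, Optional, Sequence


def _selected(values: Sequence[int], mask: int) -> List[int]:
    """Keep the value of thread i only if bit i of `mask` is set (assumed 1 = included)."""
    return [v for i, v in enumerate(values) if (mask >> i) & 1]


def masked_min(values: Sequence[int], mask: int) -> Optional[int]:
    """Minimum over the threads selected by `mask`; None if no thread is selected."""
    chosen = _selected(values, mask)
    return min(chosen) if chosen else None


def masked_max(values: Sequence[int], mask: int) -> Optional[int]:
    """Maximum over the threads selected by `mask`; None if no thread is selected."""
    chosen = _selected(values, mask)
    return max(chosen) if chosen else None


# Hypothetical example: four threads holding integer values (claims 3 and 9).
per_thread = [7, 3, 9, 1]
mask = 0b0111  # threads 0, 1 and 2 participate; thread 3 is excluded

print(masked_min(per_thread, mask))  # 3
print(masked_max(per_thread, mask))  # 9
```

Because thread 3 is masked out, its value of 1 does not affect the minimum. In hardware all selected threads would receive the result at once; the sequential loop here only models the selection rule.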

Assignees

Intel Corp

Inventors

Classifications

  • Memory management · CPC title

  • Physics · mapped topic

  • Bandwidth reduction · CPC title

  • Image coding (bandwidth or redundancy reduction for static pictures H04N1/41; coding or decoding of static colour picture signals H04N1/64; methods or arrangements for coding, decoding, compressing or decompressing digital video signals H04N19/00) · CPC title

  • G06T15/06 · Primary

    Ray-tracing · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11568591B2 cover?
An apparatus and method to execute ray tracing instructions. For example, one embodiment of an apparatus comprises execution circuitry to execute a dequantize instruction to convert a plurality of quantized data values to a plurality of dequantized data values, the dequantize instruction including a first source operand to identify a plurality of packed quantized data values in a source registe…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T15/06. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 31, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).