Apparatus and method for throttling a ray tracing pipeline

US12541908B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12541908-B2
Application number: US-202418587761-A
Country: US
Kind code: B2
Filing date: Feb 26, 2024
Priority date: Mar 16, 2020
Publication date: Feb 3, 2026
Grant date: Feb 3, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatus and method for stack throttling. For example, one embodiment of an apparatus comprises: execution circuitry comprising a plurality of functional units to execute a plurality of ray shaders and generate a plurality of primary rays and a corresponding plurality of ray messages; a first in first out (FIFO) buffer to queue the ray messages generated by the EUs; a cache to store one or more of the plurality of primary rays; a memory-backed stack to store a first subset of the plurality of ray messages in a corresponding plurality of entries; memory-backed stack management circuitry to either store a second subset of the plurality of ray messages to the memory-backed stack, or to temporarily store one or more of the second subset of the plurality of ray messages to a memory subsystem based, at least in part, on a number of entries currently occupied by ray messages in the memory-backed stack; and ray traversal circuitry to read a next ray message from the memory-backed stack, retrieve a next primary ray identified by the ray message from the cache or a memory subsystem, and perform traversal operations on the next primary ray.
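The occupancy-based placement described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the names `RayMessage`, `MemoryBackedStack`, `num_entries`, and `overflow` are hypothetical, the memory subsystem is stood in for by an in-process queue, and the refill-on-pop policy is one possible choice that the abstract does not specify.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RayMessage:
    ray_ptr: int  # hypothetical handle to the ray data held in cache or memory


class MemoryBackedStack:
    """Toy model: a fixed number of on-chip entries plus an overflow area."""

    def __init__(self, num_entries: int) -> None:
        if num_entries < 1:
            raise ValueError("num_entries must be at least 1")
        self.num_entries = num_entries
        self.entries = []        # on-chip stack entries, newest last
        self.overflow = deque()  # stand-in for the memory subsystem (assumption)

    def push(self, msg: RayMessage) -> None:
        # Placement depends only on how many entries are currently occupied.
        if len(self.entries) < self.num_entries:
            self.entries.append(msg)
        else:
            self.overflow.append(msg)  # temporarily parked off chip

    def pop(self) -> Optional[RayMessage]:
        # The traversal side reads the next message from the stack.
        if not self.entries:
            return None
        msg = self.entries.pop()
        # An entry was just freed, so bring one parked message back (assumed policy).
        if self.overflow:
            self.entries.append(self.overflow.popleft())
        return msg


# FIFO buffer of queued ray messages, drained into the stack.
fifo = deque(RayMessage(ray_ptr=i) for i in range(3))
stack = MemoryBackedStack(num_entries=2)
while fifo:
    stack.push(fifo.popleft())
```

With two on-chip entries and three queued messages, the third message is parked in the overflow area until a pop frees an entry.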

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a plurality of functional units to generate a plurality of rays and a plurality of ray messages corresponding to the plurality of rays, wherein a ray message corresponding to a ray within the plurality of rays includes a pointer to the ray and data which ray traversal circuitry uses to read and process the ray; a buffer to queue the plurality of ray messages generated by the plurality of functional units; a cache to store one or more of the plurality of rays; a storage to store one or more ray messages of the plurality of ray messages in a corresponding plurality of entries, wherein a determination of storing a ray message to the storage is based on a number of entries currently occupied by ray messages in the storage; and the ray traversal circuitry to read a next ray message from the storage, retrieve a next ray identified by the next ray message and perform traversal operations on the next ray; wherein shader dispatch to the plurality of functional units is based on ray message occupancy in the buffer or the storage.

2. The apparatus of claim 1, wherein the ray traversal circuitry is to perform the traversal operations on rays that are obtained in alternative ray banks to be alternated based on a clock cycle.

3. The apparatus of claim 2, wherein the ray traversal circuitry is to track untraversed rays in each of the alternative ray banks.

4. The apparatus of claim 1, further comprising: a ray compactor coupled to the plurality of functional units and to pack the plurality of ray messages within message slots to be sent to the buffer.

5. The apparatus of claim 1, wherein older ray messages in the storage are to be stored off to a memory subsystem based on occupancy of the storage.

6. The apparatus of claim 1, wherein the shader dispatch is throttled based on the ray message occupancy in the storage or the buffer being over one or more threshold values.

7. The apparatus of claim 6, wherein the one or more threshold values comprise a first maximum storage value indicating a maximum number of entries to be accessible to the plurality of functional units to execute a plurality of shaders.

8. The apparatus of claim 6, wherein the one or more threshold values further comprise a first minimum storage value indicating a minimum number of entries to be accessible to the plurality of functional units to execute a plurality of shaders.

9. A method comprising: generating, by a plurality of functional units, a plurality of rays and a plurality of ray messages corresponding to the plurality of rays, wherein a ray message corresponding to a ray within the plurality of rays includes a pointer to the ray and data which ray traversal circuitry uses to read and process the ray; queuing to a buffer the plurality of ray messages generated by the plurality of functional units; storing to a cache one or more of the plurality of rays; storing to a storage one or more ray messages of the plurality of ray messages in a corresponding plurality of entries, wherein a determination of storing a ray message to the storage is based on a number of entries currently occupied by ray messages in the storage; reading a next ray message from the storage; retrieving a next ray identified by the next ray message; and performing, by the ray traversal circuitry, traversal operations on the next ray, wherein shader dispatch to the plurality of functional units is based on ray message occupancy in one or more of the buffer and the storage.

10. The method of claim 9, wherein the traversal operations are performed on rays that are obtained in alternative ray banks to be alternated based on a clock cycle.

11. The method of claim 9, wherein the plurality of ray messages are packed within message slots to be sent to the buffer.

12. The method of claim 9, wherein older ray messages in the storage are to be stored off to a memory subsystem based on occupancy of the storage.

13. The method of claim 9, wherein the shader dispatch is throttled based on the ray message occupancy in the storage or the buffer being over one or more threshold values.

14. A non-transitory machine-readable medium having program code stored thereon which, when executed by a machine, causes the machine to perform: generating, by a plurality of functional units, a plurality of rays and a plurality of ray messages corresponding to the plurality of rays, wherein a ray message corresponding to a ray within the plurality of rays includes a pointer to the ray and data which ray traversal circuitry uses to read and process the ray; queuing to a buffer the plurality of ray messages generated by the plurality of functional units; storing to a cache one or more of the plurality of rays; storing to a storage one or more ray messages of the plurality of ray messages in a corresponding plurality of entries, wherein a determination of storing a ray message to the storage is based on a number of entries currently occupied by ray messages in the storage; reading a next ray message from the storage; retrieving a next ray identified by the next ray message; and performing, by the ray traversal circuitry, traversal operations on the next ray, wherein shader dispatch to the plurality of functional units is based on ray message occupancy in one or more of the buffer and the storage.

15. The non-transitory machine-readable medium of claim 14, wherein the traversal operations are performed by the ray traversal circuitry on rays that are obtained in alternative ray banks to be alternated based on a clock cycle.

16. The non-transitory machine-readable medium of claim 15, wherein the ray traversal circuitry is to track untraversed rays in each of the alternative ray banks.

17. The non-transitory machine-readable medium of claim 14, wherein the plurality of ray messages are packed within message slots to be sent to the buffer.

18. The non-transitory machine-readable medium of claim 14, wherein older ray messages in the storage are to be stored off to a memory subsystem based on occupancy of the storage.

19. The non-transitory machine-readable medium of claim 14, wherein the shader dispatch is throttled based on the ray message occupancy in the storage or the buffer being over one or more threshold values.

20. The non-transitory machine-readable medium of claim 19, wherein the one or more threshold values comprise a first maximum storage value indicating a maximum number of entries to be accessible to the plurality of functional units to execute a plurality of shaders.
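The threshold-based throttling of shader dispatch recited in the claims (for example claims 6 to 8) might be modeled roughly as follows. This is a hedged sketch, not the claimed circuitry: the claims do not say how the maximum and minimum values are combined, so the hysteresis behavior here (pause at the upper value, resume at the lower one) is an assumption, as are the names `DispatchThrottle`, `high_watermark`, and `low_watermark`.

```python
class DispatchThrottle:
    """Toy occupancy-based gate for shader dispatch (illustrative only)."""

    def __init__(self, high_watermark: int, low_watermark: int) -> None:
        if low_watermark > high_watermark:
            raise ValueError("low_watermark must not exceed high_watermark")
        self.high = high_watermark  # assumed: pause dispatch at or above this
        self.low = low_watermark    # assumed: resume dispatch at or below this
        self.throttled = False

    def update(self, buffer_occupancy: int, storage_occupancy: int) -> bool:
        """Return True if new shaders may be dispatched this step."""
        # Either the buffer or the storage being too full triggers throttling.
        occupancy = max(buffer_occupancy, storage_occupancy)
        if occupancy >= self.high:
            self.throttled = True
        elif occupancy <= self.low:
            self.throttled = False
        # Between the two values the previous state is kept (hysteresis).
        return not self.throttled


throttle = DispatchThrottle(high_watermark=8, low_watermark=2)
decisions = [
    throttle.update(0, 0),  # idle pipeline: dispatch allowed
    throttle.update(3, 8),  # storage hits the upper value: dispatch paused
    throttle.update(5, 5),  # still between the values: stays paused
    throttle.update(1, 2),  # drained to the lower value: dispatch resumes
]
```

Keeping two separate values avoids toggling dispatch on and off every cycle when occupancy hovers around a single threshold, which is why this sketch uses hysteresis rather than one cutoff.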

Assignees

Intel Corp

Inventors

Classifications

  • G06T15/06 · Primary

    Ray-tracing · CPC title

  • G06T15/005 · Primary

    General purpose rendering architectures · CPC title

  • Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • Memory management · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12541908B2 cover?
Apparatus and method for stack throttling. For example, one embodiment of an apparatus comprises: execution circuitry comprising a plurality of functional units to execute a plurality of ray shaders and generate a plurality of primary rays and a corresponding plurality of ray messages; a first in first out (FIFO) buffer to queue the ray messages generated by the EUs; a cache to store one or mor…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T15/06. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 3, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).