US12541908B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12541908-B2 |
| Application number | US-202418587761-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 26, 2024 |
| Priority date | Mar 16, 2020 |
| Publication date | Feb 3, 2026 |
| Grant date | Feb 3, 2026 |
Official abstract text for this publication.
Apparatus and method for stack throttling. For example, one embodiment of an apparatus comprises: execution circuitry comprising a plurality of functional units to execute a plurality of ray shaders and generate a plurality of primary rays and a corresponding plurality of ray messages; a first in first out (FIFO) buffer to queue the ray messages generated by the EUs; a cache to store one or more of the plurality of primary rays; a memory-backed stack to store a first subset of the plurality of ray messages in a corresponding plurality of entries; memory-backed stack management circuitry to either store a second subset of the plurality of ray messages to the memory-backed stack, or to temporarily store one or more of the second subset of the plurality of ray messages to a memory subsystem based, at least in part, on a number of entries currently occupied by ray messages in the memory-backed stack; and ray traversal circuitry to read a next ray message from the memory-backed stack, retrieve a next primary ray identified by the ray message from the cache or a memory subsystem, and perform traversal operations on the next primary ray.
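The occupancy-based store-or-spill decision in the abstract can be pictured with a small software model. This is only an illustrative sketch, not the patented circuitry: the names `RayMessage`, `MemoryBackedStack`, `traverse_next`, the `capacity` parameter, and the dictionaries standing in for the ray cache and memory subsystem are all hypothetical. The abstract does not state the order in which entries are drained, so the sketch assumes oldest-first purely for simplicity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RayMessage:
    ray_id: int  # hypothetical stand-in for the pointer to the ray data


class MemoryBackedStack:
    """Toy model: a fixed number of entries backed by a spill area."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # assumed number of resident entries
        self.entries = deque()    # resident ray messages, oldest on the left
        self.spilled = deque()    # stand-in for the memory subsystem

    def push(self, msg: RayMessage) -> str:
        # The decision depends only on how many entries are occupied now.
        if len(self.entries) < self.capacity:
            self.entries.append(msg)
            return "stack"
        self.spilled.append(msg)
        return "memory"

    def pop(self) -> RayMessage:
        msg = self.entries.popleft()
        # Refill the freed entry from the spill area if anything is waiting.
        if self.spilled:
            self.entries.append(self.spilled.popleft())
        return msg


def traverse_next(stack, ray_cache, ray_memory):
    """Read the next message, then fetch its ray from cache or memory."""
    msg = stack.pop()
    ray = ray_cache.get(msg.ray_id)
    if ray is None:
        ray = ray_memory[msg.ray_id]  # cache miss: fall back to memory
    return ray


stack = MemoryBackedStack(capacity=2)
placements = [stack.push(RayMessage(i)) for i in range(4)]
```

With a capacity of two, the first two messages land in the stack and the next two are spilled; popping a message then pulls the oldest spilled message back into the freed entry.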
Opening claim text (preview).
What is claimed is:

1. An apparatus comprising: a plurality of functional units to generate a plurality of rays and a plurality of ray messages corresponding to the plurality of rays, wherein a ray message corresponding to a ray within the plurality of rays includes a pointer to the ray and data which ray traversal circuitry uses to read and process the ray; a buffer to queue the plurality of ray messages generated by the plurality of functional units; a cache to store one or more of the plurality of rays; a storage to store one or more ray messages of the plurality of ray messages in a corresponding plurality of entries, wherein a determination of storing a ray message to the storage is based on a number of entries currently occupied by ray messages in the storage; and the ray traversal circuitry to read a next ray message from the storage, retrieve a next ray identified by the next ray message and perform traversal operations on the next ray; wherein shader dispatch to the plurality of functional units is based on ray message occupancy in the buffer or the storage.

2. The apparatus of claim 1, wherein the ray traversal circuitry is to perform the traversal operations on rays that are obtained in alternative ray banks to be alternated based on a clock cycle.

3. The apparatus of claim 2, wherein the ray traversal circuitry is to track untraversed rays in each of the alternative ray banks.

4. The apparatus of claim 1, further comprising: a ray compactor coupled to the plurality of functional units and to pack the plurality of ray messages within message slots to be sent to the buffer.

5. The apparatus of claim 1, wherein older ray messages in the storage are to be stored off to a memory subsystem based on occupancy of the storage.

6. The apparatus of claim 1, wherein the shader dispatch is throttled based on the ray message occupancy in the storage or the buffer being over one or more threshold values.

7. The apparatus of claim 6, wherein the one or more threshold values comprise a first maximum storage value indicating a maximum number of entries to be accessible to the plurality of functional units to execute a plurality of shaders.

8. The apparatus of claim 6, wherein the one or more threshold values further comprise a first minimum storage value indicating a minimum number of entries to be accessible to the plurality of functional units to execute a plurality of shaders.

9. A method comprising: generating, by a plurality of functional units, a plurality of rays and a plurality of ray messages corresponding to the plurality of rays, wherein a ray message corresponding to a ray within the plurality of rays includes a pointer to the ray and data which ray traversal circuitry uses to read and process the ray; queuing to a buffer the plurality of ray messages generated by the plurality of functional units; storing to a cache one or more of the plurality of rays; storing to a storage one or more ray messages of the plurality of ray messages in a corresponding plurality of entries, wherein a determination of storing a ray message to the storage is based on a number of entries currently occupied by ray messages in the storage; reading a next ray message from the storage; retrieving a next ray identified by the next ray message; and performing, by the ray traversal circuitry, traversal operations on the next ray, wherein shader dispatch to the plurality of functional units is based on ray message occupancy in one or more of the buffer and the storage.

10. The method of claim 9, wherein the traversal operations are performed on rays that are obtained in alternative ray banks to be alternated based on a clock cycle.

11. The method of claim 9, wherein the plurality of ray messages are packed within message slots to be sent to the buffer.

12. The method of claim 9, wherein older ray messages in the storage are to be stored off to a memory subsystem based on occupancy of the storage.

13. The method of claim 9, wherein the shader dispatch is throttled based on the ray message occupancy in the storage or the buffer being over one or more threshold values.

14. A non-transitory machine-readable medium having program code stored thereon which, when executed by a machine, causes the machine to perform: generating, by a plurality of functional units, a plurality of rays and a plurality of ray messages corresponding to the plurality of rays, wherein a ray message corresponding to a ray within the plurality of rays includes a pointer to the ray and data which ray traversal circuitry uses to read and process the ray; queuing to a buffer the plurality of ray messages generated by the plurality of functional units; storing to a cache one or more of the plurality of rays; storing to a storage one or more ray messages of the plurality of ray messages in a corresponding plurality of entries, wherein a determination of storing a ray message to the storage is based on a number of entries currently occupied by ray messages in the storage; reading a next ray message from the storage; retrieving a next ray identified by the next ray message; and performing, by the ray traversal circuitry, traversal operations on the next ray, wherein shader dispatch to the plurality of functional units is based on ray message occupancy in one or more of the buffer and the storage.

15. The non-transitory machine-readable medium of claim 14, wherein the traversal operations are performed by the ray traversal circuitry on rays that are obtained in alternative ray banks to be alternated based on a clock cycle.

16. The non-transitory machine-readable medium of claim 15, wherein the ray traversal circuitry is to track untraversed rays in each of the alternative ray banks.

17. The non-transitory machine-readable medium of claim 14, wherein the plurality of ray messages are packed within message slots to be sent to the buffer.

18. The non-transitory machine-readable medium of claim 14, wherein older ray messages in the storage are to be stored off to a memory subsystem based on occupancy of the storage.

19. The non-transitory machine-readable medium of claim 14, wherein the shader dispatch is throttled based on the ray message occupancy in the storage or the buffer being over one or more threshold values.

20. The non-transitory machine-readable medium of claim 19, wherein the one or more threshold values comprise a first maximum storage value indicating a maximum number of entries to be accessible to the plurality of functional units to execute a plurality of shaders.
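The throttling condition recited in claims 6, 13, and 19 (dispatch is throttled when ray message occupancy in the storage or the buffer is over a threshold) can be sketched in software. This is a hedged illustration only: the class `DispatchThrottle`, its parameter names, and in particular the hysteresis behavior (throttle above a maximum value, resume at or below a minimum value) are assumptions made for the example. The claims name maximum and minimum values but do not spell out this exact policy.

```python
class DispatchThrottle:
    """Hypothetical occupancy-based gate for shader dispatch."""

    def __init__(self, max_entries: int, min_entries: int):
        if min_entries > max_entries:
            raise ValueError("min_entries must not exceed max_entries")
        self.max_entries = max_entries  # assumed upper threshold
        self.min_entries = min_entries  # assumed lower threshold
        self.throttled = False

    def may_dispatch(self, buffer_occupancy: int, storage_occupancy: int) -> bool:
        # Either structure being over the threshold is enough to throttle.
        occupancy = max(buffer_occupancy, storage_occupancy)
        if occupancy > self.max_entries:
            self.throttled = True
        elif occupancy <= self.min_entries:
            # Assumed hysteresis: resume only once occupancy has drained.
            self.throttled = False
        return not self.throttled


throttle = DispatchThrottle(max_entries=8, min_entries=2)
decisions = [
    throttle.may_dispatch(3, 5),  # below the maximum: dispatch allowed
    throttle.may_dispatch(9, 0),  # buffer over the maximum: throttled
    throttle.may_dispatch(5, 5),  # between thresholds: stays throttled
    throttle.may_dispatch(1, 2),  # drained to the minimum: dispatch resumes
]
```

Using two thresholds rather than one avoids rapid toggling when occupancy hovers near a single limit; a single-threshold gate would satisfy the same claim language with `min_entries` equal to `max_entries`.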
Ray-tracing · CPC title
General purpose rendering architectures · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Memory management · CPC title