Variable width bounding volume hierarchy nodes
US-2023206542-A1 · Jun 29, 2023 · US
US12045928B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12045928-B2 |
| Application number | US-202217665341-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 4, 2022 |
| Priority date | Feb 4, 2022 |
| Publication date | Jul 23, 2024 |
| Grant date | Jul 23, 2024 |
Official abstract text for this publication.
Systems and techniques are provided for enhancing operations of a ray tracing processor. For instance, a process can include obtaining one or more nodes of an acceleration data structure. Each node of the one or more nodes includes the same number of bytes. The node(s) can be stored in a cache associated with a ray tracing processor. Each of the stored node(s) are cache line-aligned with the cache associated with the ray tracing processor. A first stored node of the stored node(s) can be provided to the ray tracing processor and processed by the ray tracing processor during a first clock cycle of the ray tracing processor. A second stored node of the stored node(s) can be provided to the ray tracing processor and processed by the ray tracing processor during a second clock cycle of the ray tracing processor.
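The constant-size, cache line-aligned node layout described in the abstract can be illustrated with a minimal C++ sketch. The 64-byte size comes from claim 13; everything else here (the `Node` fields, the `NodeKind` tag, and the `lines_touched` and `check_one_line_per_node` helpers) is a hypothetical name chosen for illustration and is not taken from the patent. The payload is left as opaque bytes because this excerpt does not describe how triangles or bounding volumes are packed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, matching the node size in claim 13.
constexpr std::size_t kCacheLineBytes = 64;

// Hypothetical tag distinguishing internal nodes from leaf nodes.
enum class NodeKind : std::uint8_t { Internal = 0, Leaf = 1 };

// Every node has the same size and alignment regardless of its kind.
struct alignas(kCacheLineBytes) Node {
    NodeKind kind;                              // internal or leaf
    std::uint8_t count;                         // boxes or triangles held
    std::uint8_t payload[kCacheLineBytes - 2];  // opaque packed contents
};

static_assert(sizeof(Node) == kCacheLineBytes, "node must fill one line");
static_assert(alignof(Node) == kCacheLineBytes, "node must be line-aligned");

// Number of cache lines covered by `bytes` bytes starting at address `p`.
inline std::size_t lines_touched(const void* p, std::size_t bytes) {
    const auto a = reinterpret_cast<std::uintptr_t>(p);
    return (a + bytes - 1) / kCacheLineBytes - a / kCacheLineBytes + 1;
}

// Checks that each node in an array occupies exactly one cache line.
inline void check_one_line_per_node(const Node* nodes, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        assert(lines_touched(&nodes[i], sizeof(Node)) == 1u);
    }
}
```

Because the node size equals the line size and the type is aligned to it, fetching any node reads exactly one line. By contrast, a 48-byte node starting at byte offset 48 would straddle two lines, which is the situation a fixed, line-sized layout avoids.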
Opening claim text (preview).
What is claimed is:
1. A method of ray tracing, the method comprising: obtaining one or more nodes of an acceleration data structure, wherein each node of the one or more nodes is a constant-sized node including a constant number of bytes; storing the one or more nodes in a cache associated with a ray tracing processor, wherein each of the one or more stored nodes are cache line-aligned with the cache associated with the ray tracing processor; providing a first stored node of the one or more stored nodes for processing by a ray-node intersection logic unit of the ray tracing processor, wherein the ray-node intersection logic unit includes a shared floating point arithmetic logic unit (ALU), and wherein the ray-node intersection logic unit uses a first configuration of the shared floating point ALU to determine two or more ray-triangle intersections corresponding to the first stored node within a first clock cycle of the ray tracing processor; and providing a second stored node of the one or more stored nodes for processing by the ray-node intersection logic unit of the ray tracing processor, wherein the ray-node intersection logic unit uses a second configuration of the shared floating point ALU to determine four or more ray-bounding volume intersections corresponding to the second stored node within a second clock cycle of the ray tracing processor, wherein the first clock cycle and the second clock cycle are consecutive clock cycles.
2. The method of claim 1, wherein: the ray-node intersection logic unit of the ray tracing processor is configured to determine the two or more ray-triangle intersections based on two or more triangles included in the first stored node; and the ray-node intersection logic unit of the ray tracing processor is configured to determine the four or more ray-bounding volume intersections based on four or more bounding volumes included in the second stored node.
3. The method of claim 2, wherein: the two or more ray-triangle intersections based on the first stored node are determined during the first clock cycle of the ray tracing processor; and the four or more ray-bounding volume intersections based on the second stored node are determined during the second clock cycle of the ray tracing processor.
4. The method of claim 1, wherein: the first stored node comprises a leaf node of the acceleration data structure and includes a first quantity of geometric primitives of the acceleration data structure; the second stored node comprises an internal node of the acceleration data structure and includes a second quantity of bounding volumes of the acceleration data structure; and the first stored node is cache line-aligned with a first cache line of the cache associated with the ray tracing processor and the second stored node is cache line-aligned with a second cache line of the cache associated with the ray tracing processor.
5. The method of claim 4, wherein the second quantity is twice as large as the first quantity.
6. The method of claim 1, wherein the ray-node intersection logic unit of the ray tracing processor further includes two or more ray-triangle logic units, and wherein the ray-node intersection logic unit uses the two or more ray-triangle logic units and the first configuration of the shared floating point ALU to determine the two or more ray-triangle intersections.
7. The method of claim 1, wherein the ray-node intersection logic unit of the ray tracing processor further includes four or more ray-bounding volume logic units, and wherein the ray-node intersection logic unit uses the four or more ray-bounding volume logic units and the second configuration of the shared floating point ALU to determine the four or more ray-bounding volume intersections.
8. The method of claim 1, wherein: the first stored node is a bounding volume hierarchy (BVH) node associated with two or more triangles corresponding to the two or more ray-triangle intersections; and the two or more triangles are stored in the BVH node.
9. The method of claim 8, wherein the BVH node stores the two or more triangles as respective sets of coordinates associated with vertices of the two or more triangles.
10. The method of claim 1, wherein the cache associated with the ray tracing processor is a graphics processing unit (GPU) cache.
11. The method of claim 1, wherein the cache associated with the ray tracing processor is a level 0 (L0) cache of the ray tracing processor.
12. The method of claim 1, wherein: a number of bytes included in each node of the one or more nodes is equal to the constant number of bytes; and each node of the one or more nodes is cache line-aligned with the cache associated with ray tracing processor based on the constant number of bytes being equal to a number of bytes included in a cache line of the cache associated with the ray tracing processor.
13. The method of claim 12, wherein each node of the one or more nodes is 64 bytes.
14. The method of claim 1, wherein the ray tracing processor is a ray tracing unit (RTU).
15. An apparatus for ray tracing, comprising: a memory; and one or more processors coupled to the memory, the one or more processors configured to: obtain one or more nodes of an acceleration data structure wherein each node of the one or more nodes is a constant-sized node including a constant number of bytes; store the one or more nodes in a cache associated with a ray tracing processor, wherein each of the one or more stored nodes are cache line-aligned with the cache associated with the ray tracing processor; provide a first stored node of the one or more stored nodes for processing by a ray-node intersection logic unit of the ray tracing processor, wherein the ray-node intersection logic unit includes a shared floating point arithmetic logic unit (ALU), and wherein the ray-node intersection logic unit uses a first configuration of the shared floating point ALU to determine two or more ray-triangle intersections corresponding to the first stored node within a first clock cycle of the ray tracing processor; and provide a second stored node of the one or more stored nodes for processing by the ray-node intersection logic unit of the ray tracing processor, wherein the ray-node intersection logic unit uses a second configuration of the shared floating point ALU to determine four or more ray-bounding volume intersections corresponding to the second stored node within a second clock cycle of the ray tracing processor, wherein the first clock cycle and the second clock cycle are consecutive clock cycles.
16. The apparatus of claim 15, wherein the one or more processors are configured to: use the ray-node intersection logic unit to determine the two or more ray-triangle intersections based on two or more triangles included in the first stored node; and use the ray-node intersection logic unit to determine the four or more ray-bounding volume intersections based on four or more bounding volumes included in the second stored node.
17. The apparatus of claim 16, wherein: the two or more ray-triangle intersections based on the first stored node are determined during the first clock cycle of the ray tracing processor; and the four or more ray-bounding volume intersections based on the second stored node are determined during the second clock cycle of the ray tracing processor.
18. The apparatus of claim 15, wherein: the first stored node comprises a leaf node of the acceleration data structure and includes a first quantity of geometric primitives of the acceleration data structure;
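As a rough software analogy for the two processing modes in claims 1 and 15, the following C++17 sketch dispatches on node type: a leaf node is tested against two triangles, an internal node against four bounding boxes. It is a functional model only. It does not model the shared floating point ALU, its configurations, or per-cycle timing, and it works on already-decoded nodes rather than the packed 64-byte format. The claims do not name the intersection algorithms, so the Möller-Trumbore triangle test and the slab box test are substituted here as common choices. All type and function names (`LeafNode`, `InternalNode`, `process_node`, and so on) are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <variant>

struct Vec3 { float x, y, z; };
inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Ray { Vec3 origin, dir; };
struct Triangle { Vec3 v0, v1, v2; };
struct Box { Vec3 lo, hi; };

// Hypothetical decoded node contents: 2 triangles or 4 boxes (claims 1, 5).
struct LeafNode { std::array<Triangle, 2> tris; };
struct InternalNode { std::array<Box, 4> boxes; };
using DecodedNode = std::variant<LeafNode, InternalNode>;

// Moller-Trumbore ray-triangle test (an assumed, standard algorithm).
inline bool hit_triangle(const Ray& r, const Triangle& t) {
    const Vec3 e1 = sub(t.v1, t.v0), e2 = sub(t.v2, t.v0);
    const Vec3 p = cross(r.dir, e2);
    const float det = dot(e1, p);
    if (std::fabs(det) < 1e-8f) return false;  // ray parallel to triangle
    const float inv = 1.0f / det;
    const Vec3 s = sub(r.origin, t.v0);
    const float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = cross(s, e1);
    const float v = dot(r.dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    return dot(e2, q) * inv >= 0.0f;  // hit must lie in front of the origin
}

// Slab ray-box test (an assumed, standard algorithm).
inline bool hit_box(const Ray& r, const Box& b) {
    const float o[3] = {r.origin.x, r.origin.y, r.origin.z};
    const float d[3] = {r.dir.x, r.dir.y, r.dir.z};
    const float lo[3] = {b.lo.x, b.lo.y, b.lo.z};
    const float hi[3] = {b.hi.x, b.hi.y, b.hi.z};
    float tmin = 0.0f;
    float tmax = std::numeric_limits<float>::infinity();
    for (int a = 0; a < 3; ++a) {
        assert(lo[a] <= hi[a]);  // precondition: well-formed box
        if (d[a] == 0.0f) {      // ray parallel to this slab
            if (o[a] < lo[a] || o[a] > hi[a]) return false;
            continue;
        }
        float t0 = (lo[a] - o[a]) / d[a];
        float t1 = (hi[a] - o[a]) / d[a];
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
    }
    return tmin <= tmax;
}

// Returns a bitmask: bit i is set when element i of the node is hit.
inline unsigned process_node(const Ray& ray, const DecodedNode& node) {
    unsigned mask = 0;
    if (const auto* leaf = std::get_if<LeafNode>(&node)) {
        for (std::size_t i = 0; i < leaf->tris.size(); ++i)
            if (hit_triangle(ray, leaf->tris[i])) mask |= 1u << i;
    } else {
        const auto& inner = std::get<InternalNode>(node);
        for (std::size_t i = 0; i < inner.boxes.size(); ++i)
            if (hit_box(ray, inner.boxes[i])) mask |= 1u << i;
    }
    return mask;
}
```

In the claimed hardware, both paths run through the same ray-node intersection logic unit on consecutive clock cycles. Here the single `process_node` entry point stands in for that shared unit, and the branch stands in for the choice of configuration.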
Processor architectures; Processor configuration, e.g. pipelining · CPC title
General purpose rendering architectures · CPC title
Ray-tracing · CPC title