Fine-grained parallel traversal for ray tracing

US9305392B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9305392-B2
Application numberUS-201213714284-A
CountryUS
Kind codeB2
Filing dateDec 13, 2012
Priority dateDec 13, 2012
Publication dateApr 5, 2016
Grant dateApr 5, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are disclosed for tracing a ray within a parallel processing unit. A first thread receives a ray or a ray segment for tracing and identifies a first node within an acceleration structure associated with the ray, where the first node is associated with a volume of space traversed by the ray. The thread identifies the child nodes of the first node, where each child node is associated with a different sub-volume of space, and each sub-volume is associated with a corresponding ray segment. The thread determines that two or more nodes are associated with sub-volumes of space that intersect the ray segment. The thread selects one of these nodes for processing by the first thread and another for processing by a second thread. One advantage of the disclosed technique is that the threads in a thread group perform ray tracing more efficiently in that idle time is reduced.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for tracing a ray within a parallel processor, the method comprising: receiving, at a first thread of the parallel processor, at least a portion of a first ray for tracing; identifying a first node within an acceleration structure associated with the at least a portion of the ray, wherein the first node is associated with a volume of space traversed by the ray; identifying a plurality of nodes that comprise child nodes of the first node; selecting a second node of the plurality of nodes for processing by the first thread; selecting a third node of the plurality of nodes for processing by a second thread of the parallel processor; determining that the second thread is currently processing at least a portion of a second ray; in response to the determining that the second thread is currently processing the at least the portion of the second ray, placing an entry associated with the third node into a first data structure; and causing the second thread to retrieve the entry associated with the third node from the first data structure when the second thread has completed processing the at least the portion of the second ray. 2. The method of claim 1 , wherein the first data structure comprises a stack that is local to the first thread. 3. The method of claim 2 , further comprising: selecting a fourth node of the plurality of nodes for processing; determining that a number of threads are currently available to process the fourth node; and placing an entry associated with the fourth node into a second data structure; wherein the second data structure comprises a pooled work buffer that is shared among a plurality of threads comprising the first thread and the second thread. 4. The method of claim 1 , wherein the first data structure comprises a pooled work buffer that is shared among a plurality of threads comprising the first thread and the second thread. 5. The method of claim 4 , further comprising: selecting a fourth node of the plurality of nodes for processing; determining that the first data structure lacks sufficient space to store an entry associated with the fourth node; and placing an entry associated with the fourth node into a second data structure; wherein the second data structure is a stack that is local to the first thread. 6. The method of claim 1 , further comprising: determining that at least one thread in a plurality of threads comprising the first thread and the second thread has discovered a ray segment that intersects a graphics object between a first point and a second point; and causing each thread in the plurality of threads to terminate execution. 7. The method of claim 1 , further comprising: retrieving a first value from a storage location that represents a first intersection point at which the at least a portion of the ray intersects a first graphics object; determining that a ray segment intersects a second graphics object at a second intersection point; determining that the second intersection point is closer to a surface plane than the first intersection point; and causing a second value that represents the second intersection point to be stored in the storage location. 8. The method of claim 1 , further comprising: determining that a first ray segment intersects a graphics object at an intersection point; determining that a third thread of the parallel processor is processing a fourth node that corresponds to a second ray segment, wherein the second ray segment is more distant from a surface plane than the intersection point; and causing the third thread to terminate execution. 9. The method of claim 8 , further comprising causing the third thread to receive a fifth node that corresponds to a third ray segment for processing, wherein the third ray segment is closer to the surface plane than the intersection point. 10. A non-transitory computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to trace a ray within a parallel processor, by performing the steps of: receiving, at a first thread of the parallel processor, at least a portion of a first ray for tracing; identifying a first node within an acceleration structure associated with the at least a portion of the ray, wherein the first node is associated with a volume of space traversed by the ray; identifying a plurality of nodes that comprise child nodes of the first node, wherein each node within the plurality of nodes is associated with a different sub-volume of space within the volume of space, and wherein each sub-volume of space is associated with a corresponding ray segment within the at least a portion of the ray; determining that two or more nodes within the plurality of nodes are associated with sub-volumes of space that intersect the at least a portion of the ray; selecting a second node that comprises one node of the two or more nodes for processing by the first thread; selecting a third node that comprises another node of the two or more nodes for processing by a second thread of the parallel processor; determining that the second thread is currently processing at least a portion of a second ray; in response to the determining that the second thread is currently processing the at least the portion of the second ray, placing an entry associated with the third node into a first data structure; and causing the second thread to retrieve the entry associated with the third node from the first data structure when the second thread has completed processing the at least the portion of the second ray. 11. The non-transitory computer-readable storage medium of claim 10 , wherein the first data structure comprises a stack that is local to the first thread. 12. The non-transitory computer-readable storage medium of claim 11 , wherein the processor is further configured to perform the steps of: selecting a fourth node that comprises yet another node of the two or more nodes for processing; determining that a number of threads are currently available to process the fourth node; and placing an entry associated with the fourth node into a second data structure; wherein the second data structure comprises a pooled work buffer that is shared among a plurality of threads comprising the first thread and the second thread. 13. The non-transitory computer-readable storage medium of claim 10 , wherein the first data structure comprises a pooled work buffer that is shared among a plurality of threads comprising the first thread and the second thread. 14. The non-transitory computer-readable storage medium of claim 13 , wherein the processor is further configured to perform the steps of: selecting a fourth node that comprises yet another node of the two or more nodes for processing; determining that the first data structure lacks sufficient space to store an entry associated with the fourth node; and placing an entry associated with the fourth node into a second data structure; wherein the second data structure is a stack that is local to the first thread. 15. The non-transitory computer-readable storage medium of claim 10 , wherein the processor is further configured to perform the steps of: determining that at least one thread in a plurality of threads comprising the first thread and the second thread has discovered a ray segment that intersects a graphics object between a first point and a second point; and causing each thread in the plurality of threads to terminate execution. 16. The non-transitory computer-readable storage medium of claim 10 , wherein the proce

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9305392B2 cover?
Techniques are disclosed for tracing a ray within a parallel processing unit. A first thread receives a ray or a ray segment for tracing and identifies a first node within an acceleration structure associated with the ray, where the first node is associated with a volume of space traversed by the ray. The thread identifies the child nodes of the first node, where each child node is associated w…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06T15/06. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Apr 05 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).