Multi-GPU frame rendering

US10402937B2 · US · B2

Patent metadata
Publication number: US-10402937-B2
Application number: US-201715857330-A
Country: US
Kind code: B2
Filing date: Dec 28, 2017
Priority date: Dec 28, 2017
Publication date: Sep 3, 2019
Grant date: Sep 3, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for rendering graphics frames allocates rendering work to multiple graphics processing units (GPUs) that are configured to allow access to pages of data stored in locally attached memory of a peer GPU. The method includes the steps of generating, by a first GPU coupled to a first memory circuit, one or more first memory access requests to render a first primitive for a first frame, where at least one of the first memory access requests targets a first page of data that physically resides within a second memory circuit coupled to a second GPU. The first GPU requests the first page of data through a first data link coupling the first GPU to the second GPU and a register circuit within the first GPU accumulates an access request count for the first page of data. The first GPU notifies a driver that the access request count has reached a specified threshold.
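The counting-and-notify mechanism described in the abstract, together with the copy step recited in the first claim, can be illustrated with a small software model. This is only a sketch under stated assumptions: the class names `Gpu` and `Driver`, the 64 KiB `PAGE_SIZE`, the threshold value, and the dictionary standing in for the register circuit are all hypothetical and do not come from the patent, which describes hardware counters rather than Python objects.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 64 * 1024  # assumed page size; the patent text here does not give one


@dataclass
class Driver:
    """Hypothetical driver that records which remote pages became hot."""
    hot_pages: list = field(default_factory=list)

    def notify(self, gpu_id, page):
        # Record each (GPU, page) pair once so a single copy can be scheduled.
        if (gpu_id, page) not in self.hot_pages:
            self.hot_pages.append((gpu_id, page))


@dataclass
class Gpu:
    """Hypothetical model of the first GPU and its access counters."""
    gpu_id: int
    driver: Driver
    threshold: int
    local_pages: set = field(default_factory=set)
    counters: dict = field(default_factory=dict)  # page -> remote access count

    def access(self, address):
        page = address // PAGE_SIZE
        if page in self.local_pages:
            return "local"
        # The page resides in the peer GPU's memory, so it is requested over
        # the data link and the access request count for it is accumulated.
        count = self.counters.get(page, 0) + 1
        self.counters[page] = count
        if count == self.threshold:
            self.driver.notify(self.gpu_id, page)
        return "remote"

    def execute_copy(self, page):
        # Stand-in for the copy command: the page now also lives locally.
        self.local_pages.add(page)
        self.counters.pop(page, None)


def render_frame(gpu, addresses):
    """Replay the memory accesses made while rendering one frame."""
    return [gpu.access(a) for a in addresses]


driver = Driver()
gpu0 = Gpu(gpu_id=0, driver=driver, threshold=3)
addresses = [5 * PAGE_SIZE + i * 16 for i in range(4)]  # four reads of page 5

frame1 = render_frame(gpu0, addresses)   # every access goes to the peer GPU
for gpu_id, page in driver.hot_pages:    # driver issues copies between frames
    gpu0.execute_copy(page)
frame2 = render_frame(gpu0, addresses)   # same accesses now hit the local copy
```

In this model the first frame pays the remote-access cost and crosses the threshold on the third read; after the copy, the second frame's accesses to the same page are served from local memory, which is the behavior the claim describes for "the second frame".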

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: generating, by a first graphics processing unit (GPU) coupled to a first memory circuit, one or more first memory access requests in connection with rendering a first primitive for a first frame, wherein at least one of the first memory access requests targets a first page of data that physically resides within a second memory circuit coupled to a second GPU; requesting, by the first GPU, the first page of data through a first data link coupling the first GPU to the second GPU; accumulating, by a register circuit within the first GPU, an access request count for the first page of data; notifying a driver, by the first GPU, that the access request count has reached a specified threshold; receiving, by the first GPU, a first copy command to copy the first page of data from the second memory circuit through the first data link to produce a copy of the first page of data within the first memory circuit before the first GPU accesses the first page of data in connection with rendering the first primitive for a second frame; executing, by the first GPU, the first copy command; and generating, by the first GPU, one or more second memory access requests in connection with rendering the first primitive for the second frame, wherein at least one of the second memory access requests targets the copy of the first page of data within the first memory circuit.

2. The method of claim 1, wherein the first page of data is stored in a compressed format within the second memory circuit and the copy of the first page of data is stored in the compressed format within the first memory circuit.

3. The method of claim 2, wherein the first page of data is copied through the first data link in the compressed format.

4. The method of claim 1, wherein a first command stream specifies a first rendering pass for the first frame and a second command stream specifies a first rendering pass for a second frame, and the notifying occurs during the first rendering pass for the first frame.

5. The method of claim 1, further comprising, prior to generating the one or more first memory access requests: receiving, by the first GPU, the first primitive; and determining, by a clipping circuit within the first GPU, that a location for the first primitive intersects a first region of the first frame that is assigned to the first GPU.

6. The method of claim 1, wherein requesting the first page of data through the first data link comprises determining the first page of data resides within a first remote aperture mapped to the second GPU.

7. The method of claim 6, wherein a memory management unit determines that the first page resides within the first remote aperture.

8. The method of claim 1, wherein the one or more first memory access requests comprise an atomic access operation performed on data residing within the second memory circuit.

9. The method of claim 1, further comprising, prior to generating the one or more first memory access requests: receiving, by the first GPU, the first primitive; and determining, by prepended shader instructions, that a first cooperative thread array comprising the first primitive will execute on the first GPU.

10. The method of claim 1, wherein the one or more first memory access requests each include a memory address; and the register circuit is configured to increment the access request count when the memory address is within a programmable address range for the register circuit.

11. The method of claim 1, wherein the first frame is divided into rectangular regions and adjacent rectangular regions sharing a common edge are assigned alternately to the first GPU and the second GPU.

12. The method of claim 11, wherein the rectangular regions assigned to the first GPU form a checkerboard pattern.

13. A system, comprising: a first graphics processing unit (GPU) coupled to a first memory circuit configured to: generate one or more first memory access requests in connection with rendering a first primitive for a first frame, wherein at least one of the first memory access requests targets a first page of data that physically resides within a second memory circuit coupled to a second GPU; request the first page of data through a first data link coupling the first GPU to the second GPU; accumulate, by a register circuit within the first GPU, an access request count for the first page of data; notify a driver that the access request count has reached a specified threshold; receive a first copy command to copy the first page of data from the second memory circuit through the first data link to produce a copy of the first page of data within the first memory circuit before the first GPU accesses the first page of data in connection with rendering the first primitive for a second frame; execute the first copy command; and generate one or more second memory access requests in connection with rendering the first primitive for the second frame, wherein at least one of the second memory access requests targets the copy of the first page of data within the first memory circuit.

14. The system of claim 13, the first GPU further configured to: receive the first primitive; and determine, by a clipping circuit within the first GPU, that a screen-space location for the first primitive intersects a first region of the first frame that is assigned to the first GPU.

15. The system of claim 13, further comprising a cache subsystem configured to coalesce two or more of the first memory access requests into one request.

16. A non-transitory, computer-readable storage medium storing instructions that, when executed by a first graphics processing unit (GPU) coupled to a first memory circuit, cause the first GPU to: generate one or more first memory access requests in connection with rendering a first primitive for a first frame, wherein at least one of the first memory access requests targets a first page of data that physically resides within a second memory circuit coupled to a second GPU; request the first page of data through a first data link coupling the first GPU to the second GPU; accumulate, by a register circuit within the first GPU, an access request count for the first page of data; notify a driver by the first GPU that the access request count has reached a specified threshold; receive a first copy command to copy the first page of data from the second memory circuit through the first data link to produce a copy of the first page of data within the first memory circuit before the first GPU accesses the first page of data in connection with rendering the first primitive for a second frame; execute the first copy command; and generate one or more second memory access requests in connection with rendering the first primitive for the second frame, wherein at least one of the second memory access requests targets the copy of the first page of data within the first memory circuit.
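The region assignment in claims 11 and 12, and the region-intersection test in claims 5 and 14, can be pictured with a short sketch. It rests on several assumptions that are not in the patent text: square tiles of a fixed `tile_size`, integer pixel coordinates, exactly two GPUs numbered 0 and 1, and an axis-aligned bounding box used as a conservative stand-in for what a clipping circuit would compute. The function names are hypothetical.

```python
def owner_of_tile(tile_x, tile_y):
    """Checkerboard assignment: tiles that share an edge get different GPUs."""
    return (tile_x + tile_y) % 2


def tiles_for_bbox(x0, y0, x1, y1, tile_size):
    """Yield (tile_x, tile_y) for every tile the box [x0, x1) x [y0, y1) overlaps.

    Assumes a non-empty box with non-negative integer coordinates.
    """
    first_tx, last_tx = x0 // tile_size, (x1 - 1) // tile_size
    first_ty, last_ty = y0 // tile_size, (y1 - 1) // tile_size
    for ty in range(first_ty, last_ty + 1):
        for tx in range(first_tx, last_tx + 1):
            yield tx, ty


def gpus_for_primitive(bbox, tile_size):
    """Return the set of GPUs whose assigned regions the primitive may touch."""
    return {owner_of_tile(tx, ty) for tx, ty in tiles_for_bbox(*bbox, tile_size)}


TILE = 64  # assumed tile edge length in pixels

inside_one_tile = gpus_for_primitive((10, 10, 50, 50), TILE)   # tile (0, 0) only
spans_two_tiles = gpus_for_primitive((10, 10, 100, 50), TILE)  # tiles (0, 0), (1, 0)
in_second_tile = gpus_for_primitive((70, 10, 100, 50), TILE)   # tile (1, 0) only
```

A primitive that falls entirely inside one tile is rendered by a single GPU, while one that straddles a tile edge is picked up by both, each GPU handling only the part inside its own regions. A bounding box can report a tile that the primitive's actual shape does not cover, so this test is conservative rather than exact.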

Assignees

Nvidia Corp

Inventors

Classifications

  • Processor architectures; Processor configuration, e.g. pipelining

  • Image or video data

  • for multiprocessing or multitasking

  • using page tables, e.g. page table structures

  • G06T1/60 (Primary): Memory management

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10402937B2 cover?
A method for rendering graphics frames allocates rendering work to multiple graphics processing units (GPUs) that are configured to allow access to pages of data stored in locally attached memory of a peer GPU. The method includes the steps of generating, by a first GPU coupled to a first memory circuit, one or more first memory access requests to render a first primitive for a first frame, whe…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/60. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 3, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).