Apparatus and method for managing data bias in a graphics processing architecture
US-2018293690-A1 · Oct 11, 2018 · US
| Field | Value |
|---|---|
| Publication number | US-10402937-B2 |
| Application number | US-201715857330-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 28, 2017 |
| Priority date | Dec 28, 2017 |
| Publication date | Sep 3, 2019 |
| Grant date | Sep 3, 2019 |
Official abstract text for this publication.
A method for rendering graphics frames allocates rendering work to multiple graphics processing units (GPUs) that are configured to allow access to pages of data stored in locally attached memory of a peer GPU. The method includes the steps of generating, by a first GPU coupled to a first memory circuit, one or more first memory access requests to render a first primitive for a first frame, where at least one of the first memory access requests targets a first page of data that physically resides within a second memory circuit coupled to a second GPU. The first GPU requests the first page of data through a first data link coupling the first GPU to the second GPU and a register circuit within the first GPU accumulates an access request count for the first page of data. The first GPU notifies a driver that the access request count has reached a specified threshold.
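The counting-and-notification step in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class name `AccessCounter`, the 4 KiB page size, the per-page dictionary, and the `notify_driver` callback are all hypothetical. The programmable address range follows the idea in claim 10.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes; the source does not give one


@dataclass
class AccessCounter:
    """Hypothetical software model of the register circuit in the first GPU."""
    range_start: int                           # start of the programmable address range
    range_end: int                             # end of the range (exclusive)
    threshold: int                             # count at which the driver is notified
    notify_driver: Callable[[int, int], None]  # assumed callback: (page, count)
    counts: Dict[int, int] = field(default_factory=dict)

    def record(self, address: int) -> None:
        """Count one remote access if its address falls inside the range."""
        if not (self.range_start <= address < self.range_end):
            return  # outside the programmed range: not counted
        page = address // PAGE_SIZE
        self.counts[page] = self.counts.get(page, 0) + 1
        # Notify exactly once, at the moment the count reaches the threshold.
        if self.counts[page] == self.threshold:
            self.notify_driver(page, self.counts[page])


# Usage: three accesses to the same remote page trigger one notification.
notifications: List[Tuple[int, int]] = []
counter = AccessCounter(
    range_start=0x10000,
    range_end=0x20000,
    threshold=3,
    notify_driver=lambda page, count: notifications.append((page, count)),
)
for addr in (0x10000, 0x10008, 0x30000, 0x10FF0, 0x11000):
    counter.record(addr)
```

In this model the driver callback would be the natural place to schedule the copy command described in the claims, so that the page is local before the next frame is rendered.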
Opening claim text (preview).
What is claimed is:
1. A method, comprising: generating, by a first graphics processing unit (GPU) coupled to a first memory circuit, one or more first memory access requests in connection with rendering a first primitive for a first frame, wherein at least one of the first memory access requests targets a first page of data that physically resides within a second memory circuit coupled to a second GPU; requesting, by the first GPU, the first page of data through a first data link coupling the first GPU to the second GPU; accumulating, by a register circuit within the first GPU, an access request count for the first page of data; notifying a driver, by the first GPU, that the access request count has reached a specified threshold; receiving, by the first GPU, a first copy command to copy the first page of data from the second memory circuit through the first data link to produce a copy of the first page of data within the first memory circuit before the first GPU accesses the first page of data in connection with rendering the first primitive for a second frame; executing, by the first GPU, the first copy command; and generating, by the first GPU, one or more second memory access requests in connection with rendering the first primitive for the second frame, wherein at least one of the second memory access requests targets the copy of the first page of data within the first memory circuit.
2. The method of claim 1, wherein the first page of data is stored in a compressed format within the second memory circuit and the copy of the first page of data is stored in the compressed format within the first memory circuit.
3. The method of claim 2, wherein the first page of data is copied through the first data link in the compressed format.
4. The method of claim 1, wherein a first command stream specifies a first rendering pass for the first frame and a second command stream specifies a first rendering pass for a second frame, and the notifying occurs during the first rendering pass for the first frame.
5. The method of claim 1, further comprising, prior to generating the one or more first memory access requests: receiving, by the first GPU, the first primitive; and determining, by a clipping circuit within the first GPU, that a location for the first primitive intersects a first region of the first frame that is assigned to the first GPU.
6. The method of claim 1, wherein requesting the first page of data through the first data link comprises determining the first page of data resides within a first remote aperture mapped to the second GPU.
7. The method of claim 6, wherein a memory management unit determines that the first page resides within the first remote aperture.
8. The method of claim 1, wherein the one or more first memory access requests comprise an atomic access operation performed on data residing within the second memory circuit.
9. The method of claim 1, further comprising, prior to generating the one or more first memory access requests: receiving, by the first GPU, the first primitive; and determining, by prepended shader instructions, that a first cooperative thread array comprising the first primitive will execute on the first GPU.
10. The method of claim 1, wherein the one or more first memory access requests each include a memory address; and the register circuit is configured to increment the access request count when the memory address is within a programmable address range for the register circuit.
11. The method of claim 1, wherein the first frame is divided into rectangular regions and adjacent rectangular regions sharing a common edge are assigned alternately to the first GPU and the second GPU.
12. The method of claim 11, wherein the rectangular regions assigned to the first GPU form a checkerboard pattern.
13. A system, comprising: a first graphics processing unit (GPU) coupled to a first memory circuit configured to: generate one or more first memory access requests in connection with rendering a first primitive for a first frame, wherein at least one of the first memory access requests targets a first page of data that physically resides within a second memory circuit coupled to a second GPU; request the first page of data through a first data link coupling the first GPU to the second GPU; accumulate, by a register circuit within the first GPU, an access request count for the first page of data; notify a driver that the access request count has reached a specified threshold; receive a first copy command to copy the first page of data from the second memory circuit through the first data link to produce a copy of the first page of data within the first memory circuit before the first GPU accesses the first page of data in connection with rendering the first primitive for a second frame; execute the first copy command; and generate one or more second memory access requests in connection with rendering the first primitive for the second frame, wherein at least one of the second memory access requests targets the copy of the first page of data within the first memory circuit.
14. The system of claim 13, the first GPU further configured to: receive the first primitive; and determine, by a clipping circuit within the first GPU, that a screen-space location for the first primitive intersects a first region of the first frame that is assigned to the first GPU.
15. The system of claim 13, further comprising a cache subsystem configured to coalesce two or more of the first memory access requests into one request.
16. A non-transitory, computer-readable storage medium storing instructions that, when executed by a first graphics processing unit (GPU) coupled to a first memory circuit, cause the first GPU to: generate one or more first memory access requests in connection with rendering a first primitive for a first frame, wherein at least one of the first memory access requests targets a first page of data that physically resides within a second memory circuit coupled to a second GPU; request the first page of data through a first data link coupling the first GPU to the second GPU; accumulate, by a register circuit within the first GPU, an access request count for the first page of data; notify a driver by the first GPU that the access request count has reached a specified threshold; receive a first copy command to copy the first page of data from the second memory circuit through the first data link to produce a copy of the first page of data within the first memory circuit before the first GPU accesses the first page of data in connection with rendering the first primitive for a second frame; execute the first copy command; and generate one or more second memory access requests in connection with rendering the first primitive for the second frame, wherein at least one of the second memory access requests targets the copy of the first page of data within the first memory circuit.
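The region assignment in claims 11 and 12, together with the intersection test in claims 5 and 14, can be sketched as follows. This is an illustrative model only: the function names, the square tile size, the GPU indices 0 and 1, and the use of an axis-aligned bounding box as a conservative stand-in for the clipping circuit's intersection test are all assumptions, not details taken from the claims.

```python
from typing import Set, Tuple


def owner_gpu(tile_x: int, tile_y: int) -> int:
    """Return 0 or 1 so that tiles sharing an edge alternate between two GPUs."""
    return (tile_x + tile_y) % 2


def gpus_for_primitive(bbox: Tuple[int, int, int, int], tile_size: int) -> Set[int]:
    """Return the GPUs whose tiles a primitive's bounding box touches.

    bbox is (x0, y0, x1, y1) in pixels with inclusive bounds. A bounding box
    is conservative: it may report a GPU whose tile the primitive only
    approaches without actually covering.
    """
    x0, y0, x1, y1 = bbox
    owners: Set[int] = set()
    for ty in range(y0 // tile_size, y1 // tile_size + 1):
        for tx in range(x0 // tile_size, x1 // tile_size + 1):
            owners.add(owner_gpu(tx, ty))
    return owners


# Usage: with 64-pixel tiles, a small primitive inside tile (0, 0) belongs to
# GPU 0 only, while one straddling the edge at x = 64 involves both GPUs.
inside_one_tile = gpus_for_primitive((0, 0, 10, 10), 64)
straddling_edge = gpus_for_primitive((60, 0, 70, 10), 64)
```

Because neighbouring tiles go to different GPUs, a primitive that crosses a tile edge is processed by both, which is one situation where a GPU may read pages that reside in its peer's memory.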
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Image or video data · CPC title
for multiprocessing or multitasking · CPC title
using page tables, e.g. page table structures · CPC title
Memory management · CPC title