Graphics processing units with power management and latency reduction
US-2022207813-A1 · Jun 30, 2022 · US
US12436705B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12436705-B2 |
| Application number | US-202117358914-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 25, 2021 |
| Priority date | Jun 25, 2021 |
| Publication date | Oct 7, 2025 |
| Grant date | Oct 7, 2025 |
Official abstract text for this publication.
An apparatus to facilitate a dynamically scalable and partitioned copy engine is disclosed. The apparatus includes a processor comprising copy engine hardware circuitry to facilitate copying surface data in memory and comprising: a plurality of copy front-end hardware circuitry to generate a plurality of surface data sub-blocks, wherein a number of the plurality of copy front-end hardware circuitry corresponds to a number of partitions configured for the processor, with each partition associated with a single copy front-end hardware circuitry; a plurality of copy back-end hardware circuitry to operate in parallel to process the plurality of surface data sub-blocks to perform memory accesses, wherein subsets of the plurality of copy back-end hardware circuitry are each associated with the single copy front-end hardware circuitry associated with each partition; and a connectivity matrix hardware circuitry to communicably connect the plurality of copy front-end hardware circuitry to the plurality of copy back-end hardware circuitry.
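The division of labor in the abstract can be illustrated in software. The following is a minimal sketch only, not the patented hardware: it assumes a surface can be modeled as a flat byte buffer, uses a thread pool to stand in for the parallel copy back-ends, and the names `split_into_sub_blocks`, `copy_surface`, `num_back_ends`, and `sub_block_size` are hypothetical and do not appear in the publication.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_sub_blocks(length, sub_block_size):
    """Front-end role: divide `length` bytes into (offset, size) sub-blocks."""
    return [(offset, min(sub_block_size, length - offset))
            for offset in range(0, length, sub_block_size)]


def copy_surface(src, dst, num_back_ends, sub_block_size):
    """Copy `src` into `dst`, with worker threads standing in for back-ends.

    Assumption: each sub-block covers a disjoint byte range, so the workers
    never write to the same location.
    """
    if len(dst) < len(src):
        raise ValueError("destination is smaller than source")

    def copy_block(block):
        offset, size = block
        dst[offset:offset + size] = src[offset:offset + size]

    with ThreadPoolExecutor(max_workers=num_back_ends) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(copy_block,
                      split_into_sub_blocks(len(src), sub_block_size)))
    return dst
```

For example, `copy_surface(data, bytearray(len(data)), num_back_ends=4, sub_block_size=4096)` would copy `data` in 4 KiB pieces across four workers. The connectivity matrix has no direct counterpart here; the thread pool's work queue loosely plays that role.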
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, by a copy engine hardware circuitry of a graphics processor, configuration information for a graphics processor, the configuration information comprising a number of partitions of the processor and a size of each of the partitions; configuring a number of copy front-ends of the copy engine hardware circuitry based on the number of partitions, wherein each partition is assigned a copy front-end; configuring a number of copy back-ends of the copy engine hardware circuitry based on the size of each of the partitions, wherein each copy front-end is assigned one or more of the copy back-ends, and wherein each of the copy front-ends and the one or more of the copy back-ends hardware circuitry that is assigned to the single each of the copy front-ends is to form a copy engine building block of the copy engine hardware circuitry; configuring sub-networks of a connectivity matrix, wherein each set of copy front-end and corresponding one or more copy back-ends is communicably coupled via one of the sub-networks; and responsive to one of the partitions being reconfigured, resetting the copy engine building block corresponding to the one of the partitions being reconfigured to an unassigned state by resetting the copy front-end hardware circuitry associated with the one of the partitions and draining to an idle state the subset of copy back-end hardware circuitry corresponding to the copy front-end hardware circuitry being reset.
2. The method of claim 1, wherein the configuration information is received using a register of the graphics processor assigned to the copy engine hardware circuitry.
3. The method of claim 1, wherein the partitions comprises at least one of hard partitions of the graphics processor, virtual machines (VMs) hosted on the graphics processor, containers hosted on the graphics processor, or a logical resource partition of the processor.
4. The method of claim 1, wherein the copy front-end comprises copy front-end hardware circuitry to divide surface data from a source location in memory to generate a plurality of surface data sub-blocks.
5. The method of claim 4, wherein the one or more copy back-ends comprise copy back-end hardware circuitry to operate in parallel to process the plurality of surface data sub-blocks to perform memory accesses.
6. The method of claim 1, wherein the one or more of the copy back-ends assigned to each copy front-end is determined based on a maximum copy bandwidth supported by the partition associated with the each copy front-end.
7. A processor comprising: copy engine hardware circuitry to facilitate copying surface data in memory and comprising: a plurality of copy front-end hardware circuitry to generate a plurality of surface data sub-blocks from the surface data, wherein a number of the plurality of copy front-end hardware circuitry corresponds to a number of partitions configured for the processor, with each partition associated with a single copy front-end hardware circuitry of the plurality of copy front-end hardware circuitry; a plurality of copy back-end hardware circuitry to operate in parallel to process the plurality of surface data sub-blocks to perform memory accesses, wherein subsets of the plurality of copy back-end hardware circuitry are each associated with the single copy front-end hardware circuitry associated with each partition, and wherein each set of the single copy front-end hardware circuitry and one of the subsets of the copy back-end hardware circuitry that is assigned to the single copy front-end hardware circuitry is to form a copy engine building block of the copy engine hardware circuitry; and a connectivity matrix hardware circuitry to communicably connect the plurality of copy front-end hardware circuitry to the plurality of copy back-end hardware circuitry; wherein responsive to one of the partitions being reconfigured, the copy engine building block corresponding to the one of the partitions being reconfigured is reset to an unassigned state by resetting the copy front-end hardware circuitry associated with the one of the partitions and draining to an idle state the subset of copy back-end hardware circuitry corresponding to the copy front-end hardware circuitry being reset.
8. The processor of claim 7, wherein the partitions comprises at least one of hard partitions of the processor, virtual machines (VMs) hosted on the processor, containers hosted on the processor, or a logical resource partition of the processor.
9. The processor of claim 7, wherein a number of the copy back-end hardware circuitry in each of the subsets is determined based on a maximum copy bandwidth supported by the partition corresponding to the single copy front-end hardware circuitry.
10. The processor of claim 7, wherein the connectivity matrix hardware circuitry comprises controller circuitry and a plurality of crossbar circuitry.
11. The processor of claim 7, wherein the connectivity matrix hardware circuitry comprises a plurality of subnetworks each used to connect each of the copy front-end hardware circuitry to the subset of the copy back-end hardware circuitry that is assigned to the single copy front-end hardware circuitry.
12. The processor of claim 7, wherein the processor comprises a graphics processing unit (GPU).
13. The processor of claim 7, wherein the processor is at least one of a single instruction multiple data (SIMD) machine or a single instruction multiple thread (SIMT) machine.
14. A system comprising: a memory to store surface data in a source location; and copy engine hardware circuitry to facilitate copying the surface data from the source location in the memory to a destination location in the memory and comprising: a plurality of copy front-end hardware circuitry to generate a plurality of surface data sub-blocks from the surface data, wherein a number of the plurality of copy front-end hardware circuitry corresponds to a number of partitions configured for a processor, with each partition associated with a single copy front-end hardware circuitry of the plurality of copy front-end hardware circuitry; a plurality of copy back-end hardware circuitry to operate in parallel to process the plurality of surface data sub-blocks to perform memory accesses, wherein subsets of the plurality of copy back-end hardware circuitry are each associated with the single copy front-end hardware circuitry associated with each partition, and wherein each set of the single copy front-end hardware circuitry and one of the subsets of the copy back-end hardware circuitry that is assigned to the single copy front-end hardware circuitry is to form a copy engine building block of the copy engine hardware circuitry; and a connectivity matrix hardware circuitry to communicably connect the plurality of copy front-end hardware circuitry to the plurality of copy back-end hardware circuitry; wherein responsive to one of the partitions being reconfigured, the copy engine building block corresponding to the one of the partitions being reconfigured is reset to an unassigned state by resetting the copy front-end hardware circuitry associated with the one of the partitions and draining to an idle state the subset of copy back-end hardware circuitry corresponding to the copy front-end hardware circuitry being reset.
15. The system of claim 14, wherein the partitions comprises at least one of hard partitions of the processor, virtual machines (VMs) hosted on the processor, containers hosted on the processor, or a logical resource partition of the processor.
16. The system of claim 14, where
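The configuration and reset steps recited in claim 1 can be sketched as a small software model. This is an illustration under stated assumptions rather than the claimed circuitry: the `BuildingBlock` class, the `configure` and `reset_for_reconfiguration` functions, and the `back_ends_per_unit` scaling rule are all hypothetical, and each front-end's list of back-ends stands in for one sub-network of the connectivity matrix.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BuildingBlock:
    """One copy front-end plus the copy back-ends assigned to it."""
    front_end: int
    back_ends: List[int]
    partition: Optional[str]  # None models the "unassigned" state


def configure(partition_sizes: Dict[str, int], total_back_ends: int,
              back_ends_per_unit: int = 1) -> List[BuildingBlock]:
    """Create one block per partition; back-end count scales with size.

    `back_ends_per_unit` is an assumed scaling rule, not from the claims.
    """
    blocks = []
    next_back_end = 0
    for front_end, (name, size) in enumerate(partition_sizes.items()):
        count = size * back_ends_per_unit
        if next_back_end + count > total_back_ends:
            raise ValueError("not enough copy back-ends for the partitions")
        back_ends = list(range(next_back_end, next_back_end + count))
        next_back_end += count
        blocks.append(BuildingBlock(front_end, back_ends, name))
    return blocks


def reset_for_reconfiguration(blocks: List[BuildingBlock],
                              partition: str) -> BuildingBlock:
    """Return only the matching block to the unassigned state."""
    for block in blocks:
        if block.partition == partition:
            # Hardware would reset the front-end and drain its back-ends
            # to idle here; this model only records the state change.
            block.partition = None
            return block
    raise KeyError(partition)
```

Calling `configure({"A": 2, "B": 1}, total_back_ends=4)` yields two blocks, and `reset_for_reconfiguration(blocks, "A")` then unassigns only the block for partition A, leaving partition B's block intact.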
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
Divergence aspects · CPC title
using bus bridges (G06F13/4022 takes precedence) · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Single storage device · CPC title