Dynamically scalable and partitioned copy engine

US12436705B2 · US · B2

Patent metadata
Publication number: US-12436705-B2
Application number: US-202117358914-A
Country: US
Kind code: B2
Filing date: Jun 25, 2021
Priority date: Jun 25, 2021
Publication date: Oct 7, 2025
Grant date: Oct 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus to facilitate a dynamically scalable and partitioned copy engine is disclosed. The apparatus includes a processor comprising copy engine hardware circuitry to facilitate copying surface data in memory and comprising: a plurality of copy front-end hardware circuitry to generate a plurality of surface data sub-blocks, wherein a number of the plurality of copy front-end hardware circuitry corresponds to a number of partitions configured for the processor, with each partition associated with a single copy front-end hardware circuitry; a plurality of copy back-end hardware circuitry to operate in parallel to process the plurality of surface data sub-blocks to perform memory accesses, wherein subsets of the plurality of copy back-end hardware circuitry are each associated with the single copy front-end hardware circuitry associated with each partition; and a connectivity matrix hardware circuitry to communicably connect the plurality of copy front-end hardware circuitry to the plurality of copy back-end hardware circuitry.
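
As an informal illustration of the data path in the abstract, the following sketch models a copy front-end that divides a surface into sub-blocks and a set of copy back-ends that process those sub-blocks in parallel. It is a software analogy only, not the patented hardware: the names `split_into_sub_blocks` and `copy_surface`, the sub-block size, and the use of a thread pool to stand in for back-end circuitry are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_sub_blocks(length, sub_block_size):
    """Front-end role (hypothetical): divide `length` bytes into (offset, size) sub-blocks."""
    return [
        (offset, min(sub_block_size, length - offset))
        for offset in range(0, length, sub_block_size)
    ]


def copy_surface(src, dst, num_back_ends, sub_block_size=4):
    """Back-end role (hypothetical): workers copy disjoint sub-blocks from src to dst."""

    def copy_block(block):
        offset, size = block
        # Each sub-block covers a disjoint range, so workers never overlap.
        dst[offset:offset + size] = src[offset:offset + size]

    # One worker per modeled back-end; the pool hands out sub-blocks as workers free up.
    with ThreadPoolExecutor(max_workers=num_back_ends) as pool:
        list(pool.map(copy_block, split_into_sub_blocks(len(src), sub_block_size)))
    return dst


src = bytearray(b"surface data in memory")
dst = bytearray(len(src))
copy_surface(src, dst, num_back_ends=3)
```

In this model, raising `num_back_ends` corresponds loosely to assigning more back-ends to a front-end; the abstract does not specify sub-block sizes or scheduling, so those details here are placeholders.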

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, by a copy engine hardware circuitry of a graphics processor, configuration information for a graphics processor, the configuration information comprising a number of partitions of the processor and a size of each of the partitions; configuring a number of copy front-ends of the copy engine hardware circuitry based on the number of partitions, wherein each partition is assigned a copy front-end; configuring a number of copy back-ends of the copy engine hardware circuitry based on the size of each of the partitions, wherein each copy front-end is assigned one or more of the copy back-ends, and wherein each of the copy front-ends and the one or more of the copy back-ends hardware circuitry that is assigned to the single each of the copy front-ends is to form a copy engine building block of the copy engine hardware circuitry; configuring sub-networks of a connectivity matrix, wherein each set of copy front-end and corresponding one or more copy back-ends is communicably coupled via one of the sub-networks; and responsive to one of the partitions being reconfigured, resetting the copy engine building block corresponding to the one of the partitions being reconfigured to an unassigned state by resetting the copy front-end hardware circuitry associated with the one of the partitions and draining to an idle state the subset of copy back-end hardware circuitry corresponding to the copy front-end hardware circuitry being reset.

2. The method of claim 1, wherein the configuration information is received using a register of the graphics processor assigned to the copy engine hardware circuitry.

3. The method of claim 1, wherein the partitions comprises at least one of hard partitions of the graphics processor, virtual machines (VMs) hosted on the graphics processor, containers hosted on the graphics processor, or a logical resource partition of the processor.

4. The method of claim 1, wherein the copy front-end comprises copy front-end hardware circuitry to divide surface data from a source location in memory to generate a plurality of surface data sub-blocks.

5. The method of claim 4, wherein the one or more copy back-ends comprise copy back-end hardware circuitry to operate in parallel to process the plurality of surface data sub-blocks to perform memory accesses.

6. The method of claim 1, wherein the one or more of the copy back-ends assigned to each copy front-end is determined based on a maximum copy bandwidth supported by the partition associated with the each copy front-end.

7. A processor comprising: copy engine hardware circuitry to facilitate copying surface data in memory and comprising: a plurality of copy front-end hardware circuitry to generate a plurality of surface data sub-blocks from the surface data, wherein a number of the plurality of copy front-end hardware circuitry corresponds to a number of partitions configured for the processor, with each partition associated with a single copy front-end hardware circuitry of the plurality of copy front-end hardware circuitry; a plurality of copy back-end hardware circuitry to operate in parallel to process the plurality of surface data sub-blocks to perform memory accesses, wherein subsets of the plurality of copy back-end hardware circuitry are each associated with the single copy front-end hardware circuitry associated with each partition, and wherein each set of the single copy front-end hardware circuitry and one of the subsets of the copy back-end hardware circuitry that is assigned to the single copy front-end hardware circuitry is to form a copy engine building block of the copy engine hardware circuitry; and a connectivity matrix hardware circuitry to communicably connect the plurality of copy front-end hardware circuitry to the plurality of copy back-end hardware circuitry; wherein responsive to one of the partitions being reconfigured, the copy engine building block corresponding to the one of the partitions being reconfigured is reset to an unassigned state by resetting the copy front-end hardware circuitry associated with the one of the partitions and draining to an idle state the subset of copy back-end hardware circuitry corresponding to the copy front-end hardware circuitry being reset.

8. The processor of claim 7, wherein the partitions comprises at least one of hard partitions of the processor, virtual machines (VMs) hosted on the processor, containers hosted on the processor, or a logical resource partition of the processor.

9. The processor of claim 7, wherein a number of the copy back-end hardware circuitry in each of the subsets is determined based on a maximum copy bandwidth supported by the partition corresponding to the single copy front-end hardware circuitry.

10. The processor of claim 7, wherein the connectivity matrix hardware circuitry comprises controller circuitry and a plurality of crossbar circuitry.

11. The processor of claim 7, wherein the connectivity matrix hardware circuitry comprises a plurality of subnetworks each used to connect each of the copy front-end hardware circuitry to the subset of the copy back-end hardware circuitry that is assigned to the single copy front-end hardware circuitry.

12. The processor of claim 7, wherein the processor comprises a graphics processing unit (GPU).

13. The processor of claim 7, wherein the processor is at least one of a single instruction multiple data (SIMD) machine or a single instruction multiple thread (SIMT) machine.

14. A system comprising: a memory to store surface data in a source location; and copy engine hardware circuitry to facilitate copying the surface data from the source location in the memory to a destination location in the memory and comprising: a plurality of copy front-end hardware circuitry to generate a plurality of surface data sub-blocks from the surface data, wherein a number of the plurality of copy front-end hardware circuitry corresponds to a number of partitions configured for a processor, with each partition associated with a single copy front-end hardware circuitry of the plurality of copy front-end hardware circuitry; a plurality of copy back-end hardware circuitry to operate in parallel to process the plurality of surface data sub-blocks to perform memory accesses, wherein subsets of the plurality of copy back-end hardware circuitry are each associated with the single copy front-end hardware circuitry associated with each partition, and wherein each set of the single copy front-end hardware circuitry and one of the subsets of the copy back-end hardware circuitry that is assigned to the single copy front-end hardware circuitry is to form a copy engine building block of the copy engine hardware circuitry; and a connectivity matrix hardware circuitry to communicably connect the plurality of copy front-end hardware circuitry to the plurality of copy back-end hardware circuitry; wherein responsive to one of the partitions being reconfigured, the copy engine building block corresponding to the one of the partitions being reconfigured is reset to an unassigned state by resetting the copy front-end hardware circuitry associated with the one of the partitions and draining to an idle state the subset of copy back-end hardware circuitry corresponding to the copy front-end hardware circuitry being reset.

15. The system of claim 14, wherein the partitions comprises at least one of hard partitions of the processor, virtual machines (VMs) hosted on the processor, containers hosted on the processor, or a logical resource partition of the processor.

16. The system of claim 14, where
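
The configuration and reset steps recited in claim 1 can be pictured with a small bookkeeping model. This is a hedged sketch, not the claimed circuitry: `CopyEngine`, `BuildingBlock`, `configure`, `sub_networks`, and `reset_partition` are hypothetical names, the "size" of a partition is represented directly as a back-end count, and draining is modeled by zeroing an in-flight counter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BuildingBlock:
    """One copy front-end plus the back-ends assigned to it (hypothetical model)."""
    front_end: int
    back_ends: List[int]
    partition: Optional[str] = None  # None models the "unassigned" state
    in_flight: Dict[int, int] = field(default_factory=dict)  # pending work per back-end


class CopyEngine:
    def __init__(self, total_back_ends: int) -> None:
        self.total_back_ends = total_back_ends
        self.blocks: List[BuildingBlock] = []

    def configure(self, back_ends_per_partition: Dict[str, int]) -> None:
        """Assign one front-end per partition and a back-end subset sized per partition."""
        if sum(back_ends_per_partition.values()) > self.total_back_ends:
            raise ValueError("not enough copy back-ends for this configuration")
        self.blocks = []
        next_back_end = 0
        for front_end, (name, count) in enumerate(back_ends_per_partition.items()):
            ids = list(range(next_back_end, next_back_end + count))
            next_back_end += count
            self.blocks.append(BuildingBlock(front_end, ids, name, {i: 0 for i in ids}))

    def sub_networks(self) -> Dict[int, List[int]]:
        """Each front-end reaches only its own back-ends (one sub-network per block)."""
        return {block.front_end: block.back_ends for block in self.blocks}

    def reset_partition(self, name: str) -> BuildingBlock:
        """On reconfiguration: drain the block's back-ends to idle and mark it unassigned."""
        for block in self.blocks:
            if block.partition == name:
                for back_end in block.back_ends:
                    block.in_flight[back_end] = 0  # stands in for draining to idle
                block.partition = None
                return block
        raise KeyError(name)


engine = CopyEngine(total_back_ends=8)
engine.configure({"vm0": 2, "vm1": 4})
engine.blocks[1].in_flight[3] = 5  # pretend back-end 3 has pending work
released = engine.reset_partition("vm1")
```

Only the block belonging to the reconfigured partition is touched in this model; the block for `vm0` keeps its assignment, which mirrors the per-partition reset described in the claim.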

Assignees

Inventors

Classifications

  • controlled by a single instruction for multiple threads [SIMT] in parallel

  • Divergence aspects

  • using bus bridges (G06F13/4022 takes precedence)

  • controlled by a single instruction for multiple data lanes [SIMD]

  • Single storage device

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12436705B2 cover?
An apparatus to facilitate a dynamically scalable and partitioned copy engine is disclosed. The apparatus includes a processor comprising copy engine hardware circuitry to facilitate copying surface data in memory and comprising: a plurality of copy front-end hardware circuitry to generate a plurality of surface data sub-blocks, wherein a number of the plurality of copy front-end hardware circu…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F3/065. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 7, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).