Distributed copy engine

US10901647B2 · US · B2

Patent metadata
Publication number: US-10901647-B2
Application number: US-201916358463-A
Country: US
Kind code: B2
Filing date: Mar 19, 2019
Priority date: Mar 19, 2019
Publication date: Jan 26, 2021
Grant date: Jan 26, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus to facilitate copying surface data is disclosed. The apparatus includes copy engine hardware to receive a command to access surface data from a source location in memory to a destination location in the memory, divide the surface data into a plurality of surface data sub-blocks, process the surface data sub-blocks to calculate virtual addresses to which accesses to the memory are to be performed and perform the memory accesses.
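The abstract describes splitting a surface into sub-blocks and working out the virtual addresses each sub-block touches. Claims 1 and 9 add that a cacheline count is computed from each sub-block's size while the sub-blocks are generated. The patent text on this page does not give a sub-block size, surface layout, or cacheline size, so the Python sketch below is illustrative only. It assumes a pitch-linear surface, 64-byte cachelines, and cacheline-aligned base address and pitch. The names `SubBlock`, `divide_surface`, and `cacheline_addresses` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Iterator, List

CACHELINE_BYTES = 64  # assumed cacheline size; not specified in the patent text above


@dataclass(frozen=True)
class SubBlock:
    x: int                # left edge, in bytes from the start of each row
    y: int                # first row of the sub-block
    width: int            # width in bytes
    height: int           # number of rows
    cacheline_count: int  # cachelines covered, computed when the sub-block is generated


def divide_surface(width_bytes: int, height: int, block_w: int, block_h: int) -> List[SubBlock]:
    """Split a pitch-linear surface into sub-blocks of at most block_w x block_h."""
    if block_w % CACHELINE_BYTES != 0:
        raise ValueError("block_w must be a multiple of the cacheline size")
    blocks = []
    for y in range(0, height, block_h):
        h = min(block_h, height - y)                  # bottom edge may be shorter
        for x in range(0, width_bytes, block_w):
            w = min(block_w, width_bytes - x)         # right edge may be narrower
            lines_per_row = -(-w // CACHELINE_BYTES)  # ceiling division
            blocks.append(SubBlock(x, y, w, h, lines_per_row * h))
    return blocks


def cacheline_addresses(block: SubBlock, base_va: int, pitch: int) -> Iterator[int]:
    """Yield the virtual address of every cacheline the sub-block touches."""
    if base_va % CACHELINE_BYTES or pitch % CACHELINE_BYTES:
        raise ValueError("base address and pitch must be cacheline-aligned")
    for row in range(block.y, block.y + block.height):
        row_start = base_va + row * pitch + block.x
        for offset in range(0, block.width, CACHELINE_BYTES):
            yield row_start + offset


blocks = divide_surface(width_bytes=200, height=3, block_w=128, block_h=2)
print([b.cacheline_count for b in blocks])  # [4, 4, 2, 2]
print([hex(a) for a in cacheline_addresses(blocks[1], base_va=0x1000, pitch=256)])
# ['0x1080', '0x10c0', '0x1180', '0x11c0']
```

Because the count is derived from the sub-block's dimensions when the sub-block is created, it always matches the number of addresses the walker later emits for that sub-block. Nothing has to be counted a second time at dispatch.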

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus to facilitate copying surface data comprising: copy engine hardware to receive an access command to access surface data from a source location in memory to copy to a destination location in the memory, including: a central copy engine to receive the access command, divide the surface data to generate a plurality of surface data sub-blocks and calculate a cacheline count for each of the plurality of surface data sub-blocks to indicate a number of cachelines in corresponding surface data sub-block; wherein the cacheline count is generated based on a size of the corresponding surface data sub-block during generation of the plurality of surface data sub-blocks; a plurality of sub-copy engines to operate in parallel to process the plurality of surface data sub-blocks to perform memory accesses, wherein each sub-copy engine comprises count hardware to maintain a pending cacheline count; and a scheduler to receive the plurality of surface data sub-blocks from the central copy engine and schedule the plurality of surface data sub-blocks for parallel processing at the plurality of sub-copy engines, wherein the scheduler selects a sub-copy engine having a lowest pending cacheline count to schedule a surface data sub-block.

2. The apparatus of claim 1, wherein the central copy engine comprises: a sub-block generator to divide the surface data into the plurality of surface data sub-blocks; and a queue to queue the plurality of surface data sub-blocks for transmission to the plurality of sub-copy engines.

3. The apparatus of claim 2, wherein the central copy engine further comprises a command processor to receive one or more access command packets, interpret commands included in the one or more access command packets and generate parameters to perform access operations.

4. The apparatus of claim 1, wherein each of the plurality of sub-copy engines comprises a source sub-block walker to transmit surface data requests associated with surface data sub-blocks to a memory.

5. The apparatus of claim 4, wherein each of the plurality of sub-copy engines comprises a destination sub-block walker to transmit surface data write requests associated with surface data sub-blocks to the memory.

6. The apparatus of claim 5, wherein each of the plurality of sub-copy engines further comprises dependency handling logic to handle ordering of write requests dependent on out of order return of requested cacheline reads.

7. The apparatus of claim 6, wherein each of the plurality of sub-copy engines further comprises a dependency enable bit and an identifier.

8. The apparatus of claim 7, wherein each sub-copy engine broadcasts the identifier to the other sub-copy engines.

9. A method to facilitate copying surface data, comprising: receiving a command to access surface data from a source location in memory to a destination location in the memory; dividing the surface data into a plurality of surface data sub-blocks; calculating a cacheline count for each of the plurality of surface data sub-blocks to indicate a number of cachelines in corresponding surface data sub-block; wherein the cacheline count is calculated based on a size of the corresponding surface data sub-block during generation of the plurality of surface data sub-blocks; scheduling the plurality of surface data sub-blocks for processing at a plurality of sub-copy engines, wherein the scheduler selects a sub-copy engine having a lowest pending cacheline count to schedule a surface data sub-block; processing the plurality of surface data sub-blocks at the plurality of sub-copy engines to calculate virtual addresses to which accesses to the memory are to be performed; and performing the memory access.

10. The method of claim 9, wherein receiving the command comprises: receiving one or more access command packets; interpreting commands included in the one or more access command packets; and generating parameters to perform access operations.

11. The method of claim 9, further comprising scheduling the plurality of surface data sub-blocks for processing at a plurality of sub-copy engines.

12. A system to facilitate copying surface data, comprising: a memory to store surface data; and copy engine hardware to receive a access command to access surface data from a source location in memory to copy to a destination location in the memory, including: a central copy engine to receive the access command, divide the surface data to generate a plurality of surface data sub-blocks and calculate a cacheline count for each of the plurality of surface data sub-blocks to indicate a number of cachelines in corresponding surface data sub-block; wherein the cacheline count is generated based on a size of the corresponding surface data sub-block during generation of the plurality of surface data sub-blocks; a plurality of sub-copy engines to operate in parallel to process the plurality of surface data sub-blocks to perform the memory accesses, wherein each sub-copy engine comprises count hardware to maintain a pending cacheline count; and a scheduler to receive the plurality of surface data sub-blocks from the central copy engine and schedule the plurality of surface data sub-blocks for parallel processing at the plurality of sub-copy engines, wherein the scheduler selects a sub-copy engine having a lowest pending cacheline count to schedule a surface data sub-block.

13. The system of claim 12, wherein the central copy engine comprises: a sub-block generator to divide the surface data into the plurality of surface data sub-blocks; and a queue to queue the plurality of surface data sub-blocks for transmission to the plurality of sub-copy engines.

14. The system of claim 13, wherein the central copy engine further comprises a command processor to receive one or more access command packets, interpret commands included in the one or more access command packets and generate parameters to perform access operations.

15. The system of claim 12, wherein each of the plurality of sub-copy engines further comprises: a source sub-block walker to transmit surface data requests associated with plurality of surface data sub-blocks to a memory; and a destination sub-block walker to transmit surface data write requests associated with plurality of surface data sub-blocks to the memory.

16. The system of claim 15, wherein each of the plurality of sub-copy engines further comprises dependency handling logic to handle ordering of write requests dependent on out of order return of requested cacheline reads.
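Claims 1, 9, and 12 have the scheduler send each sub-block to the sub-copy engine with the lowest pending cacheline count. Each engine keeps that count in its own count hardware. The Python sketch below models this policy with plain integers. Several details are assumptions, not claim language:

* The class and method names (`SubCopyEngine`, `Scheduler`, `schedule`, `retire`) are hypothetical.
* Ties are broken toward the lowest engine id.
* The pending count drops as an engine retires cachelines. The claims say only that the count is maintained, not how it is updated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubCopyEngine:
    engine_id: int
    pending_cachelines: int = 0  # stands in for the per-engine "count hardware"


class Scheduler:
    def __init__(self, num_engines: int) -> None:
        self.engines: List[SubCopyEngine] = [SubCopyEngine(i) for i in range(num_engines)]

    def schedule(self, cacheline_count: int) -> int:
        """Assign a sub-block to the engine with the fewest pending cachelines."""
        # Tie-break on engine_id is an assumption; the claims do not specify one.
        target = min(self.engines, key=lambda e: (e.pending_cachelines, e.engine_id))
        target.pending_cachelines += cacheline_count
        return target.engine_id

    def retire(self, engine_id: int, cachelines: int) -> None:
        """Assumed update path: an engine reports cachelines it has finished."""
        engine = self.engines[engine_id]
        if cachelines > engine.pending_cachelines:
            raise ValueError("retiring more cachelines than are pending")
        engine.pending_cachelines -= cachelines


sched = Scheduler(num_engines=2)
print([sched.schedule(c) for c in [4, 4, 2, 2, 8]])  # [0, 1, 0, 1, 0]
print([e.pending_cachelines for e in sched.engines])  # [14, 6]
```

Balancing on cachelines rather than on sub-block count matters because edge sub-blocks can be much smaller than interior ones. A round-robin count of sub-blocks would ignore that difference.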

Assignees

Intel Corp

Classifications

  • Energy efficient computing, e.g. low power processors, power management or thermal management · CPC title

  • with dedicated cache, e.g. instruction or stack · CPC title

  • with multilevel cache hierarchies · CPC title

  • Command handling arrangements, e.g. command buffers, queues, command scheduling · CPC title

  • Improving or facilitating administration, e.g. storage management · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10901647B2 cover?
An apparatus to facilitate copying surface data is disclosed. The apparatus includes copy engine hardware to receive a command to access surface data from a source location in memory to a destination location in the memory, divide the surface data into a plurality of surface data sub-blocks, process the surface data sub-blocks to calculate virtual addresses to which accesses to the memory are t…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Published Jan 26, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).