Adaptive memory address scanning based on surface format for graphics processing
US-2016321774-A1 · Nov 3, 2016 · US
US10901647B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10901647-B2 |
| Application number | US-201916358463-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 19, 2019 |
| Priority date | Mar 19, 2019 |
| Publication date | Jan 26, 2021 |
| Grant date | Jan 26, 2021 |
Official abstract text for this publication.
An apparatus to facilitate copying surface data is disclosed. The apparatus includes copy engine hardware to receive a command to access surface data from a source location in memory to a destination location in the memory, divide the surface data into a plurality of surface data sub-blocks, process the surface data sub-blocks to calculate virtual addresses to which accesses to the memory are to be performed and perform the memory accesses.
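The division step in the abstract can be illustrated with a minimal sketch. It is not the patented hardware. It assumes a linear (untiled) surface, a 64-byte cacheline, and a cacheline count equal to the sub-block size divided by the cacheline size, rounded up. The names `SubBlock`, `divide_surface`, and `linear_address` are hypothetical and do not appear in the publication.

```python
from dataclasses import dataclass

CACHELINE_BYTES = 64  # assumption: the publication text shown here states no cacheline size


@dataclass(frozen=True)
class SubBlock:
    x: int           # left edge, in pixels
    y: int           # top edge, in rows
    width: int       # in pixels
    height: int      # in rows
    cachelines: int  # cacheline count derived from the sub-block size


def divide_surface(width, height, bytes_per_pixel, block_w, block_h):
    """Split a surface into sub-blocks of at most block_w x block_h pixels."""
    blocks = []
    for y in range(0, height, block_h):
        for x in range(0, width, block_w):
            w = min(block_w, width - x)    # clip at the right edge
            h = min(block_h, height - y)   # clip at the bottom edge
            size_bytes = w * h * bytes_per_pixel
            # Ceiling division: a partly used cacheline still counts as one.
            count = -(-size_bytes // CACHELINE_BYTES)
            blocks.append(SubBlock(x, y, w, h, count))
    return blocks


def linear_address(base, pitch, bytes_per_pixel, x, y):
    """Virtual address of pixel (x, y) in a linear surface (assumed layout)."""
    return base + y * pitch + x * bytes_per_pixel
```

For example, `divide_surface(100, 50, 4, 64, 32)` yields four sub-blocks, with smaller cacheline counts for the clipped blocks at the right and bottom edges.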
Opening claim text (preview).
What is claimed is:

1. An apparatus to facilitate copying surface data comprising: copy engine hardware to receive an access command to access surface data from a source location in memory to copy to a destination location in the memory, including: a central copy engine to receive the access command, divide the surface data to generate a plurality of surface data sub-blocks and calculate a cacheline count for each of the plurality of surface data sub-blocks to indicate a number of cachelines in corresponding surface data sub-block; wherein the cacheline count is generated based on a size of the corresponding surface data sub-block during generation of the plurality of surface data sub-blocks; a plurality of sub-copy engines to operate in parallel to process the plurality of surface data sub-blocks to perform memory accesses, wherein each sub-copy engine comprises count hardware to maintain a pending cacheline count; and a scheduler to receive the plurality of surface data sub-blocks from the central copy engine and schedule the plurality of surface data sub-blocks for parallel processing at the plurality of sub-copy engines, wherein the scheduler selects a sub-copy engine having a lowest pending cacheline count to schedule a surface data sub-block.

2. The apparatus of claim 1, wherein the central copy engine comprises: a sub-block generator to divide the surface data into the plurality of surface data sub-blocks; and a queue to queue the plurality of surface data sub-blocks for transmission to the plurality of sub-copy engines.

3. The apparatus of claim 2, wherein the central copy engine further comprises a command processor to receive one or more access command packets, interpret commands included in the one or more access command packets and generate parameters to perform access operations.

4. The apparatus of claim 1, wherein each of the plurality of sub-copy engines comprises a source sub-block walker to transmit surface data requests associated with surface data sub-blocks to a memory.

5. The apparatus of claim 4, wherein each of the plurality of sub-copy engines comprises a destination sub-block walker to transmit surface data write requests associated with surface data sub-blocks to the memory.

6. The apparatus of claim 5, wherein each of the plurality of sub-copy engines further comprises dependency handling logic to handle ordering of write requests dependent on out of order return of requested cacheline reads.

7. The apparatus of claim 6, wherein each of the plurality of sub-copy engines further comprises a dependency enable bit and an identifier.

8. The apparatus of claim 7, wherein each sub-copy engine broadcasts the identifier to the other sub-copy engines.

9. A method to facilitate copying surface data, comprising: receiving a command to access surface data from a source location in memory to a destination location in the memory; dividing the surface data into a plurality of surface data sub-blocks; calculating a cacheline count for each of the plurality of surface data sub-blocks to indicate a number of cachelines in corresponding surface data sub-block; wherein the cacheline count is calculated based on a size of the corresponding surface data sub-block during generation of the plurality of surface data sub-blocks; scheduling the plurality of surface data sub-blocks for processing at a plurality of sub-copy engines, wherein the scheduler selects a sub-copy engine having a lowest pending cacheline count to schedule a surface data sub-block; processing the plurality of surface data sub-blocks at the plurality of sub-copy engines to calculate virtual addresses to which accesses to the memory are to be performed; and performing the memory access.

10. The method of claim 9, wherein receiving the command comprises: receiving one or more access command packets; interpreting commands included in the one or more access command packets; and generating parameters to perform access operations.

11. The method of claim 9, further comprising scheduling the plurality of surface data sub-blocks for processing at a plurality of sub-copy engines.

12. A system to facilitate copying surface data, comprising: a memory to store surface data; and copy engine hardware to receive a access command to access surface data from a source location in memory to copy to a destination location in the memory, including: a central copy engine to receive the access command, divide the surface data to generate a plurality of surface data sub-blocks and calculate a cacheline count for each of the plurality of surface data sub-blocks to indicate a number of cachelines in corresponding surface data sub-block; wherein the cacheline count is generated based on a size of the corresponding surface data sub-block during generation of the plurality of surface data sub-blocks; a plurality of sub-copy engines to operate in parallel to process the plurality of surface data sub-blocks to perform the memory accesses, wherein each sub-copy engine comprises count hardware to maintain a pending cacheline count; and a scheduler to receive the plurality of surface data sub-blocks from the central copy engine and schedule the plurality of surface data sub-blocks for parallel processing at the plurality of sub-copy engines, wherein the scheduler selects a sub-copy engine having a lowest pending cacheline count to schedule a surface data sub-block.

13. The system of claim 12, wherein the central copy engine comprises: a sub-block generator to divide the surface data into the plurality of surface data sub-blocks; and a queue to queue the plurality of surface data sub-blocks for transmission to the plurality of sub-copy engines.

14. The system of claim 13, wherein the central copy engine further comprises a command processor to receive one or more access command packets, interpret commands included in the one or more access command packets and generate parameters to perform access operations.

15. The system of claim 12, wherein each of the plurality of sub-copy engines further comprises: a source sub-block walker to transmit surface data requests associated with plurality of surface data sub-blocks to a memory; and a destination sub-block walker to transmit surface data write requests associated with plurality of surface data sub-blocks to the memory.

16. The system of claim 15, wherein each of the plurality of sub-copy engines further comprises dependency handling logic to handle ordering of write requests dependent on out of order return of requested cacheline reads.
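The scheduling rule recited in claims 1, 9, and 12 (pick the sub-copy engine with the lowest pending cacheline count) can be sketched in software as follows. This is an illustrative model only, not the claimed hardware. The class `SubCopyEngine`, the function `schedule`, and the tie-breaking rule (lowest engine index wins) are assumptions that the claims do not specify.

```python
class SubCopyEngine:
    """Software stand-in for a sub-copy engine with a pending-cacheline counter."""

    def __init__(self, engine_id):
        self.engine_id = engine_id
        self.pending_cachelines = 0  # models the "count hardware" of the claims

    def accept(self, cachelines):
        # A newly scheduled sub-block adds its cachelines to the pending total.
        self.pending_cachelines += cachelines

    def complete(self, cachelines):
        # Finished memory accesses drain the pending total.
        self.pending_cachelines -= cachelines


def schedule(engines, cacheline_counts):
    """Assign each sub-block (given by its cacheline count) to the least-loaded engine.

    Returns the chosen engine id for each sub-block, in order.
    Ties go to the first engine in the list (an assumption).
    """
    assignments = []
    for count in cacheline_counts:
        target = min(engines, key=lambda e: e.pending_cachelines)
        target.accept(count)
        assignments.append(target.engine_id)
    return assignments
```

With three idle engines and sub-blocks of 128, 72, 72, 41, and 10 cachelines, this sketch assigns them to engines 0, 1, 2, 1, and 2, which keeps the larger first sub-block from attracting further work until the others catch up.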
Energy efficient computing, e.g. low power processors, power management or thermal management · CPC title
with dedicated cache, e.g. instruction or stack · CPC title
with multilevel cache hierarchies · CPC title
Command handling arrangements, e.g. command buffers, queues, command scheduling · CPC title
Improving or facilitating administration, e.g. storage management · CPC title