DMA engines configured to perform first portion data transfer commands with a first DMA engine and second portion data transfer commands with second DMA engine

US11995351B2 · US · B2

Patent metadata
Publication number: US-11995351-B2
Application number: US-202117515976-A
Country: US
Kind code: B2
Filing date: Nov 1, 2021
Priority date: Nov 1, 2021
Publication date: May 28, 2024
Grant date: May 28, 2024

Abstract

A method for hardware management of DMA transfer commands includes accessing, by a first DMA engine, a DMA transfer command and determining a first portion of a data transfer requested by the DMA transfer command. Transfer of a first portion of the data transfer by the first DMA engine is initiated based at least in part on the DMA transfer command. Similarly, a second portion of the data transfer by a second DMA engine is initiated based at least in part on the DMA transfer command. After transferring the first portion and the second portion of the data transfer, an indication is generated that signals completion of the data transfer requested by the DMA transfer command.
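
The method in the abstract can be modeled in software to make the control flow concrete. The sketch below is a minimal illustration under stated assumptions, not AMD's hardware design or any real driver API: the "DMA engines" are Python threads, and `DmaTransferCommand`, `run_split_transfer`, and the even contiguous split are hypothetical names and policies chosen for this example. Each simulated engine derives its own portion from the same command, and a single completion indication is raised only after every portion has been copied.

```python
import threading
from dataclasses import dataclass


@dataclass
class DmaTransferCommand:
    # Hypothetical command fields: source offset, destination offset, byte count.
    src: int
    dst: int
    size: int


def run_split_transfer(cmd: DmaTransferCommand, src_mem: bytearray,
                       dst_mem: bytearray, num_engines: int = 2) -> threading.Event:
    """Split one transfer command across `num_engines` simulated engines.

    Each engine derives its own contiguous portion from the same command; the
    returned Event (the completion indication) is set only after the last
    portion has been copied.
    """
    if cmd.src + cmd.size > len(src_mem) or cmd.dst + cmd.size > len(dst_mem):
        raise ValueError("transfer exceeds buffer bounds")

    done = threading.Event()
    lock = threading.Lock()
    remaining = num_engines

    def engine(engine_id: int) -> None:
        nonlocal remaining
        portion = -(-cmd.size // num_engines)  # ceiling division
        start = min(engine_id * portion, cmd.size)
        end = min(start + portion, cmd.size)
        # Disjoint, equal-length slices: engines never overlap or resize dst.
        dst_mem[cmd.dst + start:cmd.dst + end] = src_mem[cmd.src + start:cmd.src + end]
        with lock:
            remaining -= 1
            if remaining == 0:
                done.set()  # every portion finished: signal completion

    engines = [threading.Thread(target=engine, args=(i,)) for i in range(num_engines)]
    for t in engines:
        t.start()
    for t in engines:
        t.join()
    return done
```

The even contiguous split is only one possible policy; the claims also describe an interleaved split and a split performed by a primary DMA engine.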

Claims

What is claimed is:

1. A method, comprising: initiating, based at least in part on a DMA transfer command, transfer of a first portion of a data transfer by a first DMA engine; and initiating, based at least in part on the DMA transfer command, transfer of a second portion of the data transfer by a second DMA engine.

2. The method of claim 1, further comprising: receiving, by the first DMA engine, a DMA notification indicating that the DMA transfer command is stored at a DMA buffer in system memory; and fetching, by the first DMA engine, the DMA transfer command from the DMA buffer.

3. The method of claim 2, wherein initiating transfer of the first portion of the data transfer by the first DMA engine further comprises: transmitting, by the first DMA engine, a cache probe request to a cache memory; and transferring the first portion of the data transfer based on receiving a return response indicating a cache hit in the cache memory.

4. The method of claim 2, wherein initiating transfer of the second portion of the data transfer by the second DMA engine further comprises: transmitting, by the second DMA engine, a cache probe request to a cache memory; and transferring the second portion of the data transfer from an owner main memory based on receiving a return response indicating a cache miss in the cache memory.

5. The method of claim 1, wherein determining the first portion of the data transfer further includes interleaving a total DMA transfer size between the first DMA engine and the second DMA engine.

6. The method of claim 1, further comprising: receiving, at a primary DMA engine, the DMA transfer command and splitting the DMA transfer command into a plurality of smaller workloads.

7. The method of claim 6, further comprising: receiving, from the primary DMA engine, one of the plurality of smaller workloads.

8. A processor device, comprising: a base integrated circuit (IC) die including a plurality of processing stacked die chiplets 3D stacked on top of the base IC die, wherein the base IC die includes an inter-chip data fabric communicably coupling the processing stacked die chiplets together; and a plurality of DMA engines 3D stacked on top of the base IC die, wherein the plurality of DMA engines are each configured to perform a portion of a data transfer requested by a DMA transfer command.

9. The processor device of claim 8, wherein each of the plurality of DMA engines include a single command engine that drives multiple transfer engines.

10. The processor device of claim 8, wherein each of the plurality of DMA engines is configured to receive a DMA notification indicating that the DMA transfer command is stored at a DMA buffer in system memory.

11. The processor device of claim 8, wherein a first DMA engine of the plurality of DMA engines is configured to transmit a cache probe request to a cache memory communicably coupled to a first processing stacked die chiplet and transfer a first portion of the data transfer based on receiving a return response indicating a cache hit in the cache memory.

12. The processor device of claim 11, wherein a second DMA engine of the plurality of DMA engines is configured to transmit the cache probe request to a cache memory communicably coupled to a second processing stacked die chiplet and transfer a second portion of the data transfer from an owner main memory based on receiving a return response indicating a cache miss in the cache memory.

13. The processor device of claim 8, wherein each of the plurality of DMA engines are configured to independently determine the portion of the data transfer by interleaving a total DMA transfer size amongst the plurality of DMA engines.

14. The processor device of claim 8, further comprising: a primary DMA engine configured to receive the DMA transfer command and split the DMA transfer command into a plurality of smaller workloads.

15. The processor device of claim 14, wherein the primary DMA engine is further configured to submit a different workload of the plurality of smaller workloads to each of the plurality of DMA engines.

16. A system, comprising: a host processor communicably coupled to a parallel processor multi-chip module, wherein the parallel processor multi-chip module includes: a base integrated circuit (IC) die including a plurality of processing stacked die chiplets 3D stacked on top of the base IC die, wherein the base IC die includes an inter-chip data fabric communicably coupling the processing stacked die chiplets together; and a plurality of DMA engines 3D stacked on top of the base IC die, wherein the plurality of DMA engines are each configured to perform a portion of a data transfer requested by a DMA transfer command.

17. The system of claim 16, further comprising: a primary DMA engine configured to receive the DMA transfer command and split the DMA transfer command into a plurality of smaller workloads, wherein the primary DMA engine is further configured to submit a different workload of the plurality of smaller workloads to each of the plurality of DMA engines.

18. The system of claim 16, wherein each of the plurality of DMA engines are configured to independently determine the portion of the data transfer by interleaving a total DMA transfer size amongst the plurality of DMA engines.

19. The system of claim 16, wherein a first DMA engine of the plurality of DMA engines is configured to transmit a cache probe request to a cache memory communicably coupled to a first processing stacked die chiplet and transfer a first portion of the data transfer based on receiving a return response indicating a cache hit in the cache memory.

20. The system of claim 19, wherein a second DMA engine of the plurality of DMA engines is configured to transmit the cache probe request to a cache memory communicably coupled to a second processing stacked die chiplet and transfer a second portion of the data transfer from an owner main memory based on receiving a return response indicating a cache miss in the cache memory.
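
The dependent claims describe two ways to divide the work (claims 5, 13 and 18: each engine interleaves the total transfer size on its own; claims 6, 14-15 and 17: a primary DMA engine splits the command into smaller workloads) and a per-portion source decision driven by a cache probe (claims 3-4, 11-12 and 19-20). The sketch below models all three in plain Python as a hedged illustration, not the patented hardware. The engines run one after another for readability rather than in parallel, and the 64-byte `GRANULE`, the dict-based cache keyed by chunk offset, and every function name are hypothetical choices made for this example, not details taken from the patent.

```python
from typing import Dict, List, Tuple

GRANULE = 64  # hypothetical interleave granularity in bytes


def interleaved_ranges(engine_id: int, num_engines: int, total_size: int,
                       granule: int = GRANULE) -> List[Tuple[int, int]]:
    """Byte ranges one engine owns when the total transfer size is interleaved
    round-robin in `granule`-sized chunks. Each engine can compute its share
    from the command alone, with no coordinator."""
    ranges = []
    offset = engine_id * granule
    while offset < total_size:
        ranges.append((offset, min(offset + granule, total_size)))
        offset += num_engines * granule
    return ranges


def primary_split(total_size: int, num_engines: int) -> List[Tuple[int, int]]:
    """Alternative: a primary engine cuts the command into one contiguous,
    near-equal workload per engine and hands each engine a different one."""
    base, extra = divmod(total_size, num_engines)
    workloads, start = [], 0
    for i in range(num_engines):
        length = base + (1 if i < extra else 0)
        workloads.append((start, start + length))
        start += length
    return workloads


def probe_and_read(start: int, end: int, cache: Dict[int, bytes],
                   main_memory: bytes) -> Tuple[bytes, str]:
    """Cache probe for one chunk: a hit is served from the cached copy, a miss
    from the owner main memory."""
    line = cache.get(start)  # hypothetical cache keyed by chunk start offset
    if line is not None:
        data, outcome = line[:end - start], "hit"
    else:
        data, outcome = main_memory[start:end], "miss"
    if len(data) != end - start:
        raise ValueError("source shorter than requested chunk")
    return data, outcome


def interleaved_transfer(total_size: int, num_engines: int, cache: Dict[int, bytes],
                         main_memory: bytes, granule: int = GRANULE):
    """Run each engine's interleaved share (sequentially, for clarity) and
    return the destination buffer plus per-engine hit/miss counts."""
    dst = bytearray(total_size)
    stats = {}
    for engine_id in range(num_engines):
        counts = {"hit": 0, "miss": 0}
        for start, end in interleaved_ranges(engine_id, num_engines, total_size, granule):
            data, outcome = probe_and_read(start, end, cache, main_memory)
            dst[start:end] = data
            counts[outcome] += 1
        stats[engine_id] = counts
    return dst, stats
```

The two splitting functions mirror the two alternatives the claims recite: with interleaving, an engine needs only its own index and the command to find its share; with a primary engine, one engine does the splitting and distributes a different workload to each of the others.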

Assignees

Advanced Micro Devices Inc, ATI Technologies ULC

Classifications (CPC titles)

  • Performance improvement

  • Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

  • Improving I/O performance

  • Burst mode

  • for main memory peripheral accesses (e.g. I/O or DMA)

Frequently asked questions

What does patent US11995351B2 cover?
It covers splitting the data transfer requested by a single DMA transfer command across two or more DMA engines, with each engine transferring its own portion and a completion indication generated after all portions have been transferred. The claims also cover processor devices and systems in which the DMA engines are 3D stacked on a base IC die alongside processing chiplets connected by an inter-chip data fabric.
Who is the assignee on this patent?
Advanced Micro Devices Inc, ATI Technologies ULC
What technology area does this patent fall under?
Primary CPC classification G06F3/0659. Mapped technology areas include Physics.
When was this patent published?
Published May 28, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.