US11995351B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11995351-B2 |
| Application number | US-202117515976-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 1, 2021 |
| Priority date | Nov 1, 2021 |
| Publication date | May 28, 2024 |
| Grant date | May 28, 2024 |
Official abstract text for this publication.
A method for hardware management of DMA transfer commands includes accessing, by a first DMA engine, a DMA transfer command and determining a first portion of a data transfer requested by the DMA transfer command. Transfer of a first portion of the data transfer by the first DMA engine is initiated based at least in part on the DMA transfer command. Similarly, a second portion of the data transfer by a second DMA engine is initiated based at least in part on the DMA transfer command. After transferring the first portion and the second portion of the data transfer, an indication is generated that signals completion of the data transfer requested by the DMA transfer command.
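The flow in the abstract can be sketched in software as follows. This is a minimal illustration only, not the patented hardware: it assumes each engine takes one contiguous share of the transfer, and the names `TransferCommand`, `CompletionTracker`, `portion_for`, and `run_engine` are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass


@dataclass
class TransferCommand:
    """Hypothetical DMA transfer command: copy `size` bytes from src to dst."""
    src_offset: int
    dst_offset: int
    size: int


@dataclass
class CompletionTracker:
    """Counts outstanding portions and flags completion when none remain."""
    pending: int
    done: bool = False

    def portion_finished(self) -> None:
        self.pending -= 1
        if self.pending == 0:
            self.done = True  # stands in for the completion indication


def portion_for(engine_id: int, num_engines: int, size: int) -> tuple:
    """Return (offset, length) of the contiguous share owned by engine_id.

    Assumption: an even split, with any remainder bytes given to the
    lowest-numbered engines. The patent does not prescribe this rule.
    """
    base, extra = divmod(size, num_engines)
    offset = engine_id * base + min(engine_id, extra)
    length = base + (1 if engine_id < extra else 0)
    return offset, length


def run_engine(engine_id, num_engines, cmd, src, dst, tracker) -> None:
    """Each engine reads the same command and copies only its own portion."""
    offset, length = portion_for(engine_id, num_engines, cmd.size)
    s = cmd.src_offset + offset
    d = cmd.dst_offset + offset
    dst[d:d + length] = src[s:s + length]
    tracker.portion_finished()


src = bytearray(range(10))
dst = bytearray(10)
cmd = TransferCommand(src_offset=0, dst_offset=0, size=10)
tracker = CompletionTracker(pending=2)

run_engine(0, 2, cmd, src, dst, tracker)
done_after_first = tracker.done  # only one of two portions has finished
run_engine(1, 2, cmd, src, dst, tracker)
```

In this sketch the completion flag is raised only after both portions have been copied, which mirrors the abstract's requirement that the indication follows the transfer of both the first and the second portion.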
Opening claim text (preview).
What is claimed is:
1. A method, comprising: initiating, based at least in part on a DMA transfer command, transfer of a first portion of a data transfer by a first DMA engine; and initiating, based at least in part on the DMA transfer command, transfer of a second portion of the data transfer by a second DMA engine.
2. The method of claim 1, further comprising: receiving, by the first DMA engine, a DMA notification indicating that the DMA transfer command is stored at a DMA buffer in system memory; and fetching, by the first DMA engine, the DMA transfer command from the DMA buffer.
3. The method of claim 2, wherein initiating transfer of the first portion of the data transfer by the first DMA engine further comprises: transmitting, by the first DMA engine, a cache probe request to a cache memory; and transferring the first portion of the data transfer based on receiving a return response indicating a cache hit in the cache memory.
4. The method of claim 2, wherein initiating transfer of the second portion of the data transfer by the second DMA engine further comprises: transmitting, by the second DMA engine, a cache probe request to a cache memory; and transferring the second portion of the data transfer from an owner main memory based on receiving a return response indicating a cache miss in the cache memory.
5. The method of claim 1, wherein determining the first portion of the data transfer further includes interleaving a total DMA transfer size between the first DMA engine and the second DMA engine.
6. The method of claim 1, further comprising: receiving, at a primary DMA engine, the DMA transfer command and splitting the DMA transfer command into a plurality of smaller workloads.
7. The method of claim 6, further comprising: receiving, from the primary DMA engine, one of the plurality of smaller workloads.
8. A processor device, comprising: a base integrated circuit (IC) die including a plurality of processing stacked die chiplets 3D stacked on top of the base IC die, wherein the base IC die includes an inter-chip data fabric communicably coupling the processing stacked die chiplets together; and a plurality of DMA engines 3D stacked on top of the base IC die, wherein the plurality of DMA engines are each configured to perform a portion of a data transfer requested by a DMA transfer command.
9. The processor device of claim 8, wherein each of the plurality of DMA engines include a single command engine that drives multiple transfer engines.
10. The processor device of claim 8, wherein each of the plurality of DMA engines is configured to receive a DMA notification indicating that the DMA transfer command is stored at a DMA buffer in system memory.
11. The processor device of claim 8, wherein a first DMA engine of the plurality of DMA engines is configured to transmit a cache probe request to a cache memory communicably coupled to a first processing stacked die chiplet and transfer a first portion of the data transfer based on receiving a return response indicating a cache hit in the cache memory.
12. The processor device of claim 11, wherein a second DMA engine of the plurality of DMA engines is configured to transmit the cache probe request to a cache memory communicably coupled to a second processing stacked die chiplet and transfer a second portion of the data transfer from an owner main memory based on receiving a return response indicating a cache miss in the cache memory.
13. The processor device of claim 8, wherein each of the plurality of DMA engines are configured to independently determine the portion of the data transfer by interleaving a total DMA transfer size amongst the plurality of DMA engines.
14. The processor device of claim 8, further comprising: a primary DMA engine configured to receive the DMA transfer command and split the DMA transfer command into a plurality of smaller workloads.
15. The processor device of claim 14, wherein the primary DMA engine is further configured to submit a different workload of the plurality of smaller workloads to each of the plurality of DMA engines.
16. A system, comprising: a host processor communicably coupled to a parallel processor multi-chip module, wherein the parallel processor multi-chip module includes: a base integrated circuit (IC) die including a plurality of processing stacked die chiplets 3D stacked on top of the base IC die, wherein the base IC die includes an inter-chip data fabric communicably coupling the processing stacked die chiplets together; and a plurality of DMA engines 3D stacked on top of the base IC die, wherein the plurality of DMA engines are each configured to perform a portion of a data transfer requested by a DMA transfer command.
17. The system of claim 16, further comprising: a primary DMA engine configured to receive the DMA transfer command and split the DMA transfer command into a plurality of smaller workloads, wherein the primary DMA engine is further configured to submit a different workload of the plurality of smaller workloads to each of the plurality of DMA engines.
18. The system of claim 16, wherein each of the plurality of DMA engines are configured to independently determine the portion of the data transfer by interleaving a total DMA transfer size amongst the plurality of DMA engines.
19. The system of claim 16, wherein a first DMA engine of the plurality of DMA engines is configured to transmit a cache probe request to a cache memory communicably coupled to a first processing stacked die chiplet and transfer a first portion of the data transfer based on receiving a return response indicating a cache hit in the cache memory.
20. The system of claim 19, wherein a second DMA engine of the plurality of DMA engines is configured to transmit the cache probe request to a cache memory communicably coupled to a second processing stacked die chiplet and transfer a second portion of the data transfer from an owner main memory based on receiving a return response indicating a cache miss in the cache memory.
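The interleaving and cache-probe claims can be illustrated with a small software model. This is a sketch under stated assumptions, not the claimed hardware: it assumes round-robin interleaving at a fixed chunk size, models the cache as a dictionary keyed by chunk offset, and takes data from the cache on a hit (the claims only say the transfer is based on the hit). The names `chunks_for_engine`, `probe_cache`, and `transfer_portion` are hypothetical.

```python
def chunks_for_engine(engine_id, num_engines, total_size, chunk_size):
    """List the (offset, length) chunks owned by engine_id.

    Each engine computes this independently from the same command,
    so no coordinator is needed to hand out work.
    """
    chunks = []
    stride = num_engines * chunk_size
    offset = engine_id * chunk_size
    while offset < total_size:
        chunks.append((offset, min(chunk_size, total_size - offset)))
        offset += stride
    return chunks


def probe_cache(cache, offset):
    """Hypothetical cache probe: cached bytes on a hit, None on a miss."""
    return cache.get(offset)


def transfer_portion(engine_id, num_engines, total_size, chunk_size,
                     cache, main_memory, dst):
    """Copy this engine's chunks, probing the cache before each one."""
    hits = 0
    for offset, length in chunks_for_engine(
            engine_id, num_engines, total_size, chunk_size):
        data = probe_cache(cache, offset)
        if data is not None:
            hits += 1  # hit: use the cached copy
        else:
            data = main_memory[offset:offset + length]  # miss: owner memory
        dst[offset:offset + length] = data
    return hits


main_memory = bytes(range(10))
# Assumed scenario: the chunk at offset 4 has a newer copy in the cache.
cache = {4: bytes([40, 41, 42, 43])}
dst = bytearray(10)

hits_engine0 = transfer_portion(0, 2, 10, 4, cache, main_memory, dst)
hits_engine1 = transfer_portion(1, 2, 10, 4, cache, main_memory, dst)
```

With two engines and a chunk size of 4, engine 0 owns the chunks at offsets 0 and 8 and engine 1 owns the chunk at offset 4. The primary-engine variant in claims 6, 14, and 17 would instead compute the same chunk lists centrally and submit one list to each engine.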
Performance improvement · CPC title
Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP] · CPC title
Improving I/O performance · CPC title
Burst mode · CPC title
for main memory peripheral accesses (e.g. I/O or DMA) · CPC title