Anti-Congestion Flow Control for Reconfigurable Processors
US-2021373867-A1 · Dec 2, 2021 · US
US12169459B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12169459-B2 |
| Application number | US-202318099021-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 19, 2023 |
| Priority date | Jan 19, 2023 |
| Publication date | Dec 17, 2024 |
| Grant date | Dec 17, 2024 |
Abstract
A heterogeneous processing system and method including a host processor, a first processor coupled to a first memory, a second processor coupled to a second memory, and switch and bus circuitry that communicatively couples the host processor, the first processor, and the second processor. The host processor is programmed to map virtual addresses of the second memory to physical addresses of the switch and bus circuitry and to configure the first processor to directly access the second memory using the mapped physical addresses according to memory extension operation. The first processor may be a reconfigurable processor, a reconfigurable dataflow unit, or a compute engine. The first processor may directly read data from or directly write data to the second memory while executing an application. The method may include configuring the first processor to directly access the second memory while executing an application for reading or writing data.
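The abstract's core mechanism is simple to model. The host maps virtual addresses of the second memory onto physical addresses of the switch and bus circuitry, and the first processor then loads and stores to the second memory directly through that mapping. The following is a minimal toy sketch under stated assumptions. `SwitchAndBus`, `AddressGenerationUnit`, `host_map` and `FirstProcessor` are hypothetical names. The 4 KiB page size and the bus and virtual base addresses are invented for illustration. The patent does not specify a page-table format or a page size; claim 1 only places the mapping in the first processor's address generation unit.

```python
PAGE_SIZE = 4096  # assumed mapping granularity; the patent does not specify one


class SwitchAndBus:
    """Toy switch/bus fabric: routes bus physical addresses to attached memories."""

    def __init__(self):
        self.windows = []  # list of (bus_base, memory) pairs

    def attach(self, bus_base, memory):
        # Expose `memory` (a bytearray) at [bus_base, bus_base + len(memory)).
        self.windows.append((bus_base, memory))

    def _route(self, bus_addr, n):
        for base, mem in self.windows:
            if base <= bus_addr and bus_addr + n <= base + len(mem):
                return mem, bus_addr - base
        raise ValueError(f"no memory decodes bus address {bus_addr:#x}")

    def read(self, bus_addr, n):
        mem, off = self._route(bus_addr, n)
        return bytes(mem[off:off + n])

    def write(self, bus_addr, payload):
        mem, off = self._route(bus_addr, len(payload))
        mem[off:off + len(payload)] = payload


class AddressGenerationUnit:
    """Toy AGU of the first processor holding a host-programmed page table."""

    def __init__(self):
        self.page_table = {}  # virtual page number -> bus physical page base

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            raise KeyError(f"unmapped virtual address {vaddr:#x}")
        return self.page_table[vpn] + offset


def host_map(agu, virt_base, bus_base, size):
    """Host step: map [virt_base, virt_base + size) of the second memory's
    virtual range onto bus physical addresses starting at bus_base."""
    assert virt_base % PAGE_SIZE == 0 and bus_base % PAGE_SIZE == 0
    for off in range(0, size, PAGE_SIZE):
        agu.page_table[(virt_base + off) // PAGE_SIZE] = bus_base + off


class FirstProcessor:
    """Issues loads/stores against the second memory using virtual addresses."""

    def __init__(self, agu, bus):
        self.agu, self.bus = agu, bus

    def load(self, vaddr, n):
        # Only the start address is translated; this assumes the access does
        # not cross into a page that is mapped non-contiguously.
        return self.bus.read(self.agu.translate(vaddr), n)

    def store(self, vaddr, payload):
        self.bus.write(self.agu.translate(vaddr), payload)


# Usage: the second memory is exposed on the bus at an invented base address.
second_memory = bytearray(4 * PAGE_SIZE)
bus = SwitchAndBus()
bus.attach(0x8000_0000, second_memory)

agu = AddressGenerationUnit()
host_map(agu, virt_base=0x1000_0000, bus_base=0x8000_0000, size=len(second_memory))
first = FirstProcessor(agu, bus)

second_memory[PAGE_SIZE:PAGE_SIZE + 5] = b"hello"  # stands in for the second processor's output
print(first.load(0x1000_0000 + PAGE_SIZE, 5))       # b'hello'
first.store(0x1000_0000, b"ack")                    # direct write into the second memory
print(bytes(second_memory[:3]))                     # b'ack'
```

In this sketch, the host performs the mapping once, up front. After that, every access by the first processor is a local translation followed by a single bus transaction, with no copy staged through host memory.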
Claims (preview)
What is claimed is:

1. A heterogeneous processing system, comprising: a host processor; a first processor coupled to a first memory, wherein the first processor comprises a reconfigurable processor that includes: an array of coarse-grained reconfigurable units comprising, an address generation unit, a plurality of memory units, and a plurality of compute units interconnected by an array-level network; a top-level network coupled to the address generation unit of the array of coarse-grained reconfigurable units; and an interface coupled between the top-level network and an external port of the first processor; a second processor coupled to a second memory; and switch and bus circuitry that communicatively couples the host processor, the external port of the first processor, and the second processor; wherein the host processor is programmed to configure the address generation unit of the array of coarse-grained reconfigurable units in the first processor to map virtual addresses of the second memory to physical addresses of the switch and bus circuitry so that the first processor can directly access the second memory using the mapped physical addresses according to memory extension operation.

2. The heterogeneous processing system of claim 1, wherein the second processor is programmed to execute a first part of an application to generate and store first data into the second memory, and wherein the first processor is configured to directly access the first data from the second memory using the mapped physical addresses while executing a second part of the application using the first data.

3. The heterogeneous processing system of claim 2, wherein the first processor is further configured to store second data output from executing the second part of the application into the first memory.

4. The heterogeneous processing system of claim 3, further comprising: a host memory coupled to the host processor; and a data transfer resource coupled to the first processor and communicatively coupled by the switch and bus circuitry; wherein the host processor is configured to program the data transfer resource to transfer data between the second data and the host memory; and wherein the host processor is configured to prompt the data transfer resource to transfer the second data from the first memory to the host memory.

5. The heterogeneous processing system of claim 2, further comprising: a host memory coupled to the host processor; and wherein the first processor is further configured to directly access the host memory and to store second data output from executing the second part of the application directly into the host memory.

6. The heterogeneous processing system of claim 1, wherein the first processor is configured to execute a first part of an application to generate first data and to directly write the first data into the second memory using the mapped physical addresses.

7. The heterogeneous processing system of claim 6, wherein the second processor is programmed to execute a second part of the application using the first data to generate second data and to store the second data into the second memory.

8. The heterogeneous processing system of claim 1, further comprising: a host memory coupled to the host processor; and wherein the first processor is further configured to directly read first data from the host memory while executing an application using the first data to generate second data and to directly write the second data into the second memory while executing the application.

9. The heterogeneous processing system of claim 1, wherein: the first processor is programmed to execute at least a portion of a first node of a dataflow graph implementing a machine learning algorithm; and the second processor is programmed to execute at least a portion of a second node of the dataflow graph.

10. The heterogeneous processing system of claim 9, wherein: the second processor is further programmed to generate and store first data into the second memory; and the first processor is further programmed to directly access the first data from the second memory using mapped physical addresses.

11. The heterogeneous processing system of claim 9, further comprising a host memory coupled to the host processor, wherein the first processor is further programmed to: directly read first data from the host memory; use the first data to generate second data; and directly write the second data into the second memory using mapped physical addresses.

12. A method of accessing data in a heterogeneous system to implement a machine learning system using a dataflow graph having a plurality of nodes connected by edges, wherein the heterogeneous system includes a host processor, a first processor coupled to a first memory, a second processor coupled to a second memory, and switch and bus circuitry that communicatively couples the host processor, the first processor, and the second processor, the method comprising: executing at least a portion of a first node of the plurality of nodes of the dataflow graph using the first processor; executing at least a portion of a second node of the plurality of nodes of the dataflow graph using the second processor; mapping, by the host processor, virtual addresses of the second memory to physical addresses of the switch and bus circuitry; configuring, by the host processor, the first processor to directly access the second memory using the mapped physical addresses according to memory extension operation; and directly accessing, by the first processor, the second memory through the switch and bus circuitry.

13. The method of claim 12, wherein the configuring the first processor comprises configuring a reconfigurable dataflow unit.

14. The method of claim 12, wherein the configuring the first processor comprises configuring a compute engine.

15. The method of claim 12, wherein the first processor comprises a reconfigurable processor that includes: an array of coarse-grained reconfigurable units comprising, an address generation unit, a plurality of memory units, and a plurality of compute units interconnected by an array-level network; a top-level network coupled to the address generation unit of the array of coarse-grained reconfigurable units; and an interface coupled between the top-level network and the switch and bus circuitry; and the configuring the first processor comprises configuring the address generation unit of the array of coarse-grained reconfigurable units in the reconfigurable processor to map virtual addresses of the second memory to physical addresses of the switch and bus circuitry.

16. The method of claim 12, further comprising: generating and storing, by the second processor, while executing the portion of the second node, first data into the second memory; and directly accessing, by the first processor, the first data from the second memory using mapped physical addresses while executing the portion of the first node.

17. The method of claim 16, further comprising configuring the first processor to write second data generated by the portion of the first node.

18. The method of claim 16, the heterogeneous system including a host memory coupled to the host processor, further comprising configuring the first processor to directly access the host memory and to write second data output from executing the portion of the second node directly into the host memory.

19. The method of claim 12, further comprising generating, by the first processor while executing
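Claims 9, 12 and 16 split a machine-learning dataflow graph across the two processors. The second processor produces first data into the second memory, and the first processor consumes that data directly over the switch and bus circuitry. Claim 3 describes the system counterpart: the first processor keeps its own output in the first memory. The following is a toy executor written under explicit assumptions. `Processor`, `Node`, `SwitchAndBus` and `run` are hypothetical names. The scale and ReLU operations are arbitrary placeholders. The patent does not describe a scheduler, and the address mapping itself is abstracted away here. The executor only tracks which memory holds each tensor and logs every cross-processor read.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Processor:
    name: str
    memory: dict = field(default_factory=dict)  # tensor name -> list of floats


@dataclass
class Node:
    name: str
    fn: Callable[..., list]
    inputs: list      # names of tensors this node consumes
    output: str       # name of the tensor this node produces
    proc: Processor   # processor assigned to execute this node


@dataclass
class SwitchAndBus:
    remote_reads: list = field(default_factory=list)  # (reader, owner, tensor) log

    def read(self, reader, owner, tensor):
        # Memory-extension path: the reader loads straight from the owner's
        # memory; nothing is staged through host memory first.
        self.remote_reads.append((reader.name, owner.name, tensor))
        return owner.memory[tensor]


def run(nodes, bus, graph_inputs, home):
    """Execute nodes in the given (already topologically sorted) order.
    Graph inputs are assumed to start out in `home`'s memory."""
    location = {}  # tensor name -> processor whose memory holds it
    for name, value in graph_inputs.items():
        home.memory[name] = value
        location[name] = home
    for node in nodes:
        args = []
        for tensor in node.inputs:
            owner = location[tensor]
            if owner is node.proc:
                args.append(owner.memory[tensor])                # local read
            else:
                args.append(bus.read(node.proc, owner, tensor))  # direct remote read
        node.proc.memory[node.output] = node.fn(*args)  # output stays in local memory
        location[node.output] = node.proc
    return location


first = Processor("first processor")
second = Processor("second processor")
bus = SwitchAndBus()
graph = [
    # "second node": runs on the second processor, writes first_data to the second memory
    Node("scale", lambda x: [2.0 * v for v in x], ["input"], "first_data", second),
    # "first node": runs on the first processor, reads first_data directly
    Node("relu", lambda x: [max(v, 0.0) for v in x], ["first_data"], "second_data", first),
]
location = run(graph, bus, {"input": [-1.0, 0.5, 3.0]}, home=second)
print(first.memory["second_data"])  # [0.0, 1.0, 6.0]
print(bus.remote_reads)             # [('first processor', 'second processor', 'first_data')]
```

The only cross-processor traffic in this run is the single direct read of `first_data`. In the patent's terms, that read is the access the host enables by mapping the second memory into the first processor's address space.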
CPC classification titles:
- Address translation
- Performance improvement
- Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
- Address space sharing
- Virtual address space management