Device with data processing engine array
US-2019303311-A1 · Oct 3, 2019 · US
US11853235B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11853235-B2 |
| Application number | US-202217826068-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 26, 2022 |
| Priority date | Apr 3, 2018 |
| Publication date | Dec 26, 2023 |
| Grant date | Dec 26, 2023 |
Official abstract text for this publication.
Examples herein describe techniques for transferring data between data processing engines in an array using shared memory. In one embodiment, certain engines in the array have connections to the memory in neighboring engines. For example, each engine may have its own assigned memory module which can be accessed directly (e.g., without using a streaming or memory mapped interconnect). In addition, the surrounding engines (referred to herein as the neighboring engines) may also include direct connections to the memory module. Using these direct connections, the cores can load and/or store data in the neighboring memory modules.
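The shared-memory arrangement described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the class names `Engine`, `EngineArray`, and `NoDirectConnection` are hypothetical, a Python dictionary stands in for each memory module, and "neighboring" is assumed to mean the four adjacent positions (up, down, left, right) in the array, which the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

Coord = Tuple[int, int]  # (row, column) position of an engine in the array


class NoDirectConnection(Exception):
    """Raised when a core has no direct connection to the requested memory."""


@dataclass
class Engine:
    """One data processing engine: a core plus its assigned memory module."""
    row: int
    col: int
    # Hypothetical stand-in for the engine's memory module.
    memory: Dict[str, Any] = field(default_factory=dict)


class EngineArray:
    """A 2D array of engines whose cores can reach neighboring memory modules."""

    def __init__(self, rows: int, cols: int) -> None:
        self.engines: Dict[Coord, Engine] = {
            (r, c): Engine(r, c) for r in range(rows) for c in range(cols)
        }

    def has_direct_connection(self, core: Coord, mem: Coord) -> bool:
        # Assumption: a core reaches its own module and the four adjacent ones.
        return abs(core[0] - mem[0]) + abs(core[1] - mem[1]) <= 1

    def _check(self, core: Coord, mem: Coord) -> None:
        if core not in self.engines or mem not in self.engines:
            raise KeyError("engine position is outside the array")
        if not self.has_direct_connection(core, mem):
            # In the patent, non-neighbors would use an interconnect instead.
            raise NoDirectConnection(f"core {core} cannot reach memory {mem}")

    def store(self, core: Coord, mem: Coord, key: str, value: Any) -> None:
        """The core at `core` stores a value in the memory module at `mem`."""
        self._check(core, mem)
        self.engines[mem].memory[key] = value

    def load(self, core: Coord, mem: Coord, key: str) -> Any:
        """The core at `core` loads a value from the memory module at `mem`."""
        self._check(core, mem)
        return self.engines[mem].memory[key]


if __name__ == "__main__":
    array = EngineArray(3, 3)
    first, second, third, far = (1, 1), (1, 2), (0, 1), (2, 2)

    # The first engine stores its result in its own memory module.
    array.store(first, first, "result", 42)

    # Two neighbors in different rows read it over direct connections.
    print(array.load(second, first, "result"))
    print(array.load(third, first, "result"))

    # A non-neighboring engine has no direct connection in this model.
    try:
        array.load(far, first, "result")
    except NoDirectConnection as err:
        print("blocked:", err)
```

In this model the second and third engines sit in different rows yet both read from the first engine's memory module, which mirrors the arrangement recited in claim 1. The non-neighbor case simply raises an error here; the claims describe using the engines' interconnects for that path instead.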
Opening claim text (preview).
What is claimed is:
1. A method, comprising: processing data in a first data processing engine in a 2D array of data processing engines disposed in a same integrated circuit, wherein each data processing engine comprises a respective processing core and a local memory module; storing the processed data in a first local memory module in the first data processing engine; retrieving at least a portion of the processed data from the first local memory module using a first direct neighbor connection that directly couples the first local memory module in the first data processing engine to a processing core in a second data processing engine in the 2D array; and retrieving at least a portion of the processed data from the first local memory module using a second direct neighbor connection that directly couples the first local memory module in the first data processing engine to a processing core in a third data processing engine in the 2D array, wherein the second data processing engine is in a different row than the third data processing engine in the 2D array.
2. The method of claim 1, further comprising: storing the data processed by the second data processing engine in a second local memory module in the second data processing engine; and retrieving the data in the second local memory module using a third direct neighbor connection that directly couples the second local memory module to a fourth data processing engine in the 2D array.
3. The method of claim 2, wherein the first local memory module is shared between the first and second data processing engines and the second local memory module is shared between the second and third data processing engines but the first local memory module is not shared with the fourth data processing engine.
4. The method of claim 3, wherein the first data processing engine neighbors the second data processing engine in the 2D array, the second data processing engine neighbors the fourth data processing engine in the 2D array, and the first data processing engine does not neighbor the fourth data processing engine in the 2D array.
5. The method of claim 1, further comprising: storing the processed data in a second local memory module in a fourth data processing engine in the 2D array, wherein the first data processing engine is directly coupled to the second local memory module via a third direct neighbor connection.
6. The method of claim 1, wherein each data processing engine in the 2D array includes a respective interconnect, wherein each respective processing core is directly connected to the respective local memory module in a neighboring data processing engine, wherein the interconnects in the data processing engines are communicatively coupled to provide connectivity between the data processing engines in the 2D array.
7. The method of claim 6, further comprising: transmitting data from the first data processing engine to a fourth data processing engine in the 2D array using the respective interconnects in the first and fourth data processing engines upon determining the fourth data processing engine does not neighbor the first data processing engine in the 2D array.
8. The method of claim 6, wherein retrieving the processed data from the first local memory module using the first direct neighbor connection avoids using the respective interconnects in the first and second data processing engines and incurs less latency relative to transmitting the processed data using the respective interconnects in the first and second data processing engines.
9. The method of claim 6, wherein the respective core in the second data processing engine is directly coupled to the first local memory module using the first direct neighbor connection.
10. A system on a chip (SoC), comprising: a first data processing engine in an 2D array of data processing engines, wherein each data processing engine comprises a respective processing core and a local memory module, wherein the first data processing engine is configured to store processed data in a first local memory module; and a second data processing engine in the 2D array, the second data processing engine configured to: retrieve at least a portion of the processed data from the first local memory module using a first direct neighbor connection that directly couples the first local memory module in the first data processing engine to a processing core in the second data processing engine, and process the retrieved data; and a third data processing engine in the 2D array, the third data processing engine configured to: retrieve at least a portion of the processed data from the first local memory module using a second direct neighbor connection that directly couples the first local memory module in the first data processing engine to a processing core in the third data processing engine, wherein the second data processing engine is in a different row than the third data processing engine in the 2D array, and process the retrieved data.
11. The SoC of claim 10, wherein the second data processing engine is configured to store first data in a second local memory module in the second data processing engine, the SoC further comprising: a fourth data processing engine configured to retrieve the first data from the second local memory module using a third direct neighbor connection that directly couples the second local memory module to the fourth data processing engine.
12. The SoC of claim 11, wherein the first local memory module is shared between the first and second data processing engines and the second local memory module is shared between the second and fourth data processing engines but the first local memory module is not shared with the fourth data processing engine.
13. The SoC of claim 12, wherein the first data processing engine neighbors the second data processing engine in the 2D array, the second data processing engine neighbors the fourth data processing engine in the 2D array, and the first data processing engine does not neighbor the fourth data processing engine in the 2D array.
14. The SoC of claim 10, further comprising: a fourth data processing engine in the 2D array comprising a second local memory module, wherein the first data processing engine is configured to store the processed data in the second local memory module, wherein the first data processing engine is directly coupled to the second local memory module via a third direct neighbor connection.
15. The SoC of claim 10, wherein each data processing engine in the 2D array includes a respective interconnect, wherein each respective processing core is directly connected to the respective local memory module in a neighboring data processing engine, wherein the interconnects in the data processing engines are communicatively coupled to provide connectivity between the data processing engines in the 2D array.
16. The SoC of claim 15, further comprising: a fourth data processing engine in the 2D array, wherein the first data processing engine is configured to transmit data to the fourth data processing engine using the respective interconnects in the first and fourth data processing engines upon determining the fourth data processing engine does not neighbor the first data processing engine in the 2D array.
17. The SoC of claim 15, wherein retrieving the processed data from the first local memory module using the first direct neighbor connection avoids using the respective interconnects in the first and second data processing engines and incurs less latency relative to transmitting the processed data using the
Access to shared memory · CPC title
Buffers; Shared memory; Pipes · CPC title
with a shared cache · CPC title
using a common memory, e.g. mailbox · CPC title
Multiple access memory array, e.g. addressing one storage element via at least two independent addressing line groups · CPC title