Communicating between data processing engines using shared memory

US11853235B2 · US · B2

Patent metadata
Publication number: US-11853235-B2
Application number: US-202217826068-A
Country: US
Kind code: B2
Filing date: May 26, 2022
Priority date: Apr 3, 2018
Publication date: Dec 26, 2023
Grant date: Dec 26, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Examples herein describe techniques for transferring data between data processing engines in an array using shared memory. In one embodiment, certain engines in the array have connections to the memory in neighboring engines. For example, each engine may have its own assigned memory module which can be accessed directly (e.g., without using a streaming or memory mapped interconnect). In addition, the surrounding engines (referred to herein as the neighboring engines) may also include direct connections to the memory module. Using these direct connections, the cores can load and/or store data in the neighboring memory modules.
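
As an illustrative sketch of the shared-memory idea in the abstract, the toy model below lets a core load from or store to its own memory module and the memory modules of adjacent engines, and rejects any other access. The names `Engine` and `EngineArray`, the dictionary standing in for a memory module, and the four-neighbor (north, south, east, west) adjacency rule are assumptions made for illustration; the abstract does not say which engines count as neighbors.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    """One data processing engine: a core plus its own local memory module."""
    row: int
    col: int
    memory: dict = field(default_factory=dict)  # hypothetical stand-in for the memory module


class EngineArray:
    """Toy 2D array in which a core may access a neighboring engine's memory directly."""

    def __init__(self, rows, cols):
        self.engines = {(r, c): Engine(r, c) for r in range(rows) for c in range(cols)}

    @staticmethod
    def has_direct_connection(core, mem):
        # Assumption: a core reaches its own memory and the memories of the
        # engines one step away in the same row or column.
        return abs(core[0] - mem[0]) + abs(core[1] - mem[1]) <= 1

    def store(self, core, mem, key, value):
        """Core at `core` stores `value` in the memory module of engine `mem`."""
        if not self.has_direct_connection(core, mem):
            raise PermissionError(f"core {core} has no direct connection to memory {mem}")
        self.engines[mem].memory[key] = value

    def load(self, core, mem, key):
        """Core at `core` loads `key` from the memory module of engine `mem`."""
        if not self.has_direct_connection(core, mem):
            raise PermissionError(f"core {core} has no direct connection to memory {mem}")
        return self.engines[mem].memory[key]
```

In this model, engine (1, 1) can store a result in its own memory with `store((1, 1), (1, 1), "tile", data)`, and the cores at (1, 2) and (0, 1) can then read it with `load(..., (1, 1), "tile")`, while a core at (1, 3) is refused because it is not adjacent.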

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: processing data in a first data processing engine in a 2D array of data processing engines disposed in a same integrated circuit, wherein each data processing engine comprises a respective processing core and a local memory module; storing the processed data in a first local memory module in the first data processing engine; retrieving at least a portion of the processed data from the first local memory module using a first direct neighbor connection that directly couples the first local memory module in the first data processing engine to a processing core in a second data processing engine in the 2D array; and retrieving at least a portion of the processed data from the first local memory module using a second direct neighbor connection that directly couples the first local memory module in the first data processing engine to a processing core in a third data processing engine in the 2D array, wherein the second data processing engine is in a different row than the third data processing engine in the 2D array.

2. The method of claim 1, further comprising: storing the data processed by the second data processing engine in a second local memory module in the second data processing engine; and retrieving the data in the second local memory module using a third direct neighbor connection that directly couples the second local memory module to a fourth data processing engine in the 2D array.

3. The method of claim 2, wherein the first local memory module is shared between the first and second data processing engines and the second local memory module is shared between the second and third data processing engines but the first local memory module is not shared with the fourth data processing engine.

4. The method of claim 3, wherein the first data processing engine neighbors the second data processing engine in the 2D array, the second data processing engine neighbors the fourth data processing engine in the 2D array, and the first data processing engine does not neighbor the fourth data processing engine in the 2D array.

5. The method of claim 1, further comprising: storing the processed data in a second local memory module in a fourth data processing engine in the 2D array, wherein the first data processing engine is directly coupled to the second local memory module via a third direct neighbor connection.

6. The method of claim 1, wherein each data processing engine in the 2D array includes a respective interconnect, wherein each respective processing core is directly connected to the respective local memory module in a neighboring data processing engine, wherein the interconnects in the data processing engines are communicatively coupled to provide connectivity between the data processing engines in the 2D array.

7. The method of claim 6, further comprising: transmitting data from the first data processing engine to a fourth data processing engine in the 2D array using the respective interconnects in the first and fourth data processing engines upon determining the fourth data processing engine does not neighbor the first data processing engine in the 2D array.

8. The method of claim 6, wherein retrieving the processed data from the first local memory module using the first direct neighbor connection avoids using the respective interconnects in the first and second data processing engines and incurs less latency relative to transmitting the processed data using the respective interconnects in the first and second data processing engines.

9. The method of claim 6, wherein the respective core in the second data processing engine is directly coupled to the first local memory module using the first direct neighbor connection.

10. A system on a chip (SoC), comprising: a first data processing engine in an 2D array of data processing engines, wherein each data processing engine comprises a respective processing core and a local memory module, wherein the first data processing engine is configured to store processed data in a first local memory module; and a second data processing engine in the 2D array, the second data processing engine configured to: retrieve at least a portion of the processed data from the first local memory module using a first direct neighbor connection that directly couples the first local memory module in the first data processing engine to a processing core in the second data processing engine, and process the retrieved data; and a third data processing engine in the 2D array, the third data processing engine configured to: retrieve at least a portion of the processed data from the first local memory module using a second direct neighbor connection that directly couples the first local memory module in the first data processing engine to a processing core in the third data processing engine, wherein the second data processing engine is in a different row than the third data processing engine in the 2D array, and process the retrieved data.

11. The SoC of claim 10, wherein the second data processing engine is configured to store first data in a second local memory module in the second data processing engine, the SoC further comprising: a fourth data processing engine configured to retrieve the first data from the second local memory module using a third direct neighbor connection that directly couples the second local memory module to the fourth data processing engine.

12. The SoC of claim 11, wherein the first local memory module is shared between the first and second data processing engines and the second local memory module is shared between the second and fourth data processing engines but the first local memory module is not shared with the fourth data processing engine.

13. The SoC of claim 12, wherein the first data processing engine neighbors the second data processing engine in the 2D array, the second data processing engine neighbors the fourth data processing engine in the 2D array, and the first data processing engine does not neighbor the fourth data processing engine in the 2D array.

14. The SoC of claim 10, further comprising: a fourth data processing engine in the 2D array comprising a second local memory module, wherein the first data processing engine is configured to store the processed data in the second local memory module, wherein the first data processing engine is directly coupled to the second local memory module via a third direct neighbor connection.

15. The SoC of claim 10, wherein each data processing engine in the 2D array includes a respective interconnect, wherein each respective processing core is directly connected to the respective local memory module in a neighboring data processing engine, wherein the interconnects in the data processing engines are communicatively coupled to provide connectivity between the data processing engines in the 2D array.

16. The SoC of claim 15, further comprising: a fourth data processing engine in the 2D array, wherein the first data processing engine is configured to transmit data to the fourth data processing engine using the respective interconnects in the first and fourth data processing engines upon determining the fourth data processing engine does not neighbor the first data processing engine in the 2D array.

17. The SoC of claim 15, wherein retrieving the processed data from the first local memory module using the first direct neighbor connection avoids using the respective interconnects in the first and second data processing engines and incurs less latency relative to transmitting the processed data using the
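
The path selection described in claims 6 to 8 and 15 to 17 (direct neighbor connection for neighboring engines, interconnect otherwise) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Path` enum, the function names, the `(row, column)` coordinates, and the rule that neighbors are engines one step apart in the same row or column are all hypothetical choices made for the example.

```python
from enum import Enum


class Path(Enum):
    LOCAL = "own local memory module"
    DIRECT_NEIGHBOR = "direct neighbor connection"
    INTERCONNECT = "interconnect"


def are_neighbors(a, b):
    """Assumed adjacency: engines one step apart in the same row or column."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


def choose_path(src, dst):
    """Pick how the engine at `src` reaches data held by the engine at `dst`."""
    if src == dst:
        return Path.LOCAL
    if are_neighbors(src, dst):
        # Neighboring engines share memory directly and bypass the interconnect.
        return Path.DIRECT_NEIGHBOR
    # Non-neighboring engines fall back to the engines' interconnects.
    return Path.INTERCONNECT


# Hypothetical placement matching the claim language:
first, second, third, fourth = (1, 1), (1, 2), (0, 1), (1, 3)
paths = {
    "second<-first": choose_path(second, first),
    "third<-first": choose_path(third, first),
    "fourth<-second": choose_path(fourth, second),
    "fourth<-first": choose_path(fourth, first),
}
```

With this placement the second and third engines sit in different rows yet both reach the first engine's memory directly, while the fourth engine neighbors the second but not the first, so a transfer between the first and fourth engines uses the interconnect.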

Assignees

Xilinx Inc

Inventors

Classifications

  • Access to shared memory · CPC title

  • Buffers; Shared memory; Pipes · CPC title

  • with a shared cache · CPC title

  • using a common memory, e.g. mailbox · CPC title

  • Multiple access memory array, e.g. addressing one storage element via at least two independent addressing line groups · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11853235B2 cover?
Examples herein describe techniques for transferring data between data processing engines in an array using shared memory. In one embodiment, certain engines in the array have connections to the memory in neighboring engines. For example, each engine may have its own assigned memory module which can be accessed directly (e.g., without using a streaming or memory mapped interconnect). In additio…
Who is the assignee on this patent?
Xilinx Inc
What technology area does this patent fall under?
Primary CPC classification G06F13/1663. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 26, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).