Device with data processing engine array

US10747690B2 · US · B2

Patent metadata
Publication number: US-10747690-B2
Application number: US-201815944307-A
Country: US
Kind code: B2
Filing date: Apr 3, 2018
Priority date: Apr 3, 2018
Publication date: Aug 18, 2020
Grant date: Aug 18, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device may include a plurality of data processing engines. Each data processing engine may include a core and a memory module. Each core may be configured to access the memory module in the same data processing engine and a memory module within at least one other data processing engine of the plurality of data processing engines.
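The access pattern in the abstract can be illustrated with a small model. This is a minimal sketch, not the patented implementation: the names `MemoryModule`, `Core`, and `build_array` are hypothetical, and the rule that each core reaches its own memory module plus one adjacent module in a single row is an assumption chosen only to satisfy "at least one other data processing engine".

```python
from dataclasses import dataclass, field


@dataclass
class MemoryModule:
    engine: int                      # index of the engine that contains this module
    cells: dict = field(default_factory=dict)


@dataclass
class Core:
    engine: int                      # index of the engine that contains this core
    reachable: dict                  # engine index -> MemoryModule this core can access

    def _module(self, engine):
        # A core may only touch memory modules it is connected to.
        if engine not in self.reachable:
            raise PermissionError(
                f"core {self.engine} has no connection to memory module {engine}"
            )
        return self.reachable[engine]

    def write(self, engine, addr, value):
        self._module(engine).cells[addr] = value

    def read(self, engine, addr):
        return self._module(engine).cells[addr]


def build_array(n):
    """Build a row of n engines (assumed topology, for illustration only)."""
    if n < 2:
        raise ValueError("need at least two engines to share memory")
    modules = [MemoryModule(i) for i in range(n)]
    cores = []
    for i in range(n):
        # Assumption: the 'other' module is the next engine's, or the
        # previous engine's for the last core in the row.
        other = i + 1 if i + 1 < n else i - 1
        cores.append(Core(i, {i: modules[i], other: modules[other]}))
    return cores, modules


cores, modules = build_array(4)
cores[0].write(1, 0x10, 42)          # core 0 writes into engine 1's memory module
print(cores[1].read(1, 0x10))        # core 1 reads the same cell from its own module
```

In this model two cores exchange data through a memory module that both can reach, with no copy between engines; a core that tries to reach an unconnected module gets a `PermissionError`.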

First claim

Opening claim text (preview).

What is claimed is:

1. A device, comprising: a plurality of data processing engines; wherein each data processing engine includes a core and a memory module; wherein each memory module includes two or more memory interfaces and each memory interface of the two or more memory interfaces is directly connected to a different core of the plurality of data processing engines; wherein each core is directly connected to a subset of the memory modules of the plurality of data processing engines via a selected memory interface within each respective memory module of the subset of the memory modules; wherein the subset of the memory modules for each core includes the memory module in the same data processing engine as the core and at least one other memory module from another data processing engine of the plurality of data processing engines; wherein the cores of the plurality of data processing engines are further serially connected by cascade connections, wherein each cascade connection connects a pair of the cores by directly connecting an accumulation register of a first core of the pair with a second core of the pair; and wherein a first core of a selected pair of the cores connected by a selected cascade connection is configured to write a first portion of data to a memory module of the subset of memory modules and to write a second portion of the data to the second core via the selected cascade connection.

2. The device of claim 1, wherein each core is configured to read and write to the subset of the memory modules for the core as a single, contiguous memory.

3. The device of claim 1, wherein each cascade connection is implemented using a cascade interface of the first core of the pair and a cascade interface of the second core of the pair, and wherein each cascade interface is independently enabled based on configuration data loaded into configuration registers corresponding to the respective core.

4. The device of claim 1, wherein each of the plurality of data processing engines comprises: interconnect circuitry having a first network configured to exchange application data with other ones of the data processing engines and a second network configured to convey configuration data; wherein the application data is data stored in the memory modules of the plurality of data processing engines or operated on by the cores of the plurality of data processing engines; and wherein the configuration data is data loaded into configuration registers of the plurality of data processing engines to specify connectivity among the plurality of data processing engines.

5. The device of claim 1, wherein each data processing engine comprises: interconnect circuitry including a stream switch configured to communicate with one or more data processing engines selected from the plurality of data processing engines.

6. The device of claim 5, wherein the stream switch is programmable to communicate with the one or more selected data processing engines.

7. The device of claim 5, further comprising: a subsystem; and a System-on-Chip (SoC) interface block configured to couple the plurality of data processing engines to the subsystem of the device.

8. The device of claim 7, wherein the subsystem includes programmable logic.

9. The device of claim 7, wherein the subsystem includes a processor configured to execute program code.

10. The device of claim 7, wherein the subsystem includes at least one of an application-specific integrated circuit or analog/mixed signal circuitry.

11. The device of claim 7, wherein the stream switch is coupled to the SoC interface block and configured to communicate with the subsystem of the device.

12. The device of claim 5, wherein the interconnect circuitry of each data processing engine further comprises: a memory mapped switch configured to communicate configuration data for programming the data processing engine.

13. The device of claim 12, wherein the memory mapped switch is further configured to communicate at least one of control data or debugging data.

14. The device of claim 4, wherein the plurality of data processing engines are interconnected by an event broadcast network independent of the first network and the second network.

15. The device of claim 14, further comprising: a subsystem; a System-on-Chip (SoC) interface block configured to couple the plurality of data processing engines to the subsystem of the device; and wherein the SoC interface block is configured to exchange events between the subsystem and the event broadcast network of the plurality of data processing engines.

16. A method, comprising: a first core of a first data processing engine of a plurality of data processing engines generating application data; the first core writing a first portion of the application data to a first memory module within the first data processing engine; a second core of a second data processing engine of the plurality of data processing engines reading the first portion of the application data from the first memory module; wherein the first core and the second core each is directly connected to a subset of memory modules of the plurality of data processing engines via a memory interface within each respective memory module of the subset of memory modules; wherein the subset of memory modules for each of the first core and the second core includes the first memory module; wherein the first memory module includes two or more memory interfaces with a first of the two or more memory interfaces directly connected to the first core and a second of the two or more memory interfaces directly connected to the second core; and the first core sending a second portion of the application data from an accumulation register of the first core to the second core using a cascade connection, wherein the first core and the second core are further directly and serially connected by the cascade connection.

17. The method of claim 16, wherein the first data processing engine and the second data processing engine are neighboring data processing engines.

18. The method of claim 16, wherein the cascade connection is implemented using a cascade interface of the first core and a cascade interface of the second core, and wherein each cascade interface is independently enabled based on configuration data loaded into configuration registers corresponding to the respective core.

19. The method of claim 16, further comprising: the first core providing further application data to a third data processing engine of the plurality of data processing engines through a first network, wherein the first network is configured to convey application data among different ones of the plurality of data processing engines.

20. The method of claim 19, further comprising: programming the first data processing engine to communicate with selected data processing engines of the plurality of data processing engines including the second data processing engine using a second network configured to convey configuration data.
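The data flow recited in the method of claim 16, together with the per-core cascade enable of claims 3 and 18, can be walked through in a toy model. This is a sketch under stated assumptions, not the claimed hardware: the class and attribute names are hypothetical, the boolean enable flags stand in for configuration registers, and summing the second portion into the accumulator is an arbitrary choice made only so there is a value to send.

```python
from collections import deque


class MemoryModule:
    """Memory module with a fixed number of interfaces, one core per interface."""

    def __init__(self, num_interfaces=2):
        self.cells = {}
        self.interfaces = [None] * num_interfaces

    def attach(self, core):
        # Each interface connects to a different core; raises ValueError when full.
        slot = self.interfaces.index(None)
        self.interfaces[slot] = core

    def _check(self, core):
        if core not in self.interfaces:
            raise PermissionError(f"{core.name} is not attached to this module")

    def write(self, core, addr, value):
        self._check(core)
        self.cells[addr] = value

    def read(self, core, addr):
        self._check(core)
        return self.cells[addr]


class Core:
    def __init__(self, name):
        self.name = name
        self.accumulator = 0              # stands in for the accumulation register
        self.cascade_out_enabled = False  # hypothetical configuration bits
        self.cascade_in_enabled = False
        self.cascade_in = deque()         # values received over the cascade
        self.cascade_next = None          # next core in the serial chain

    def send_cascade(self):
        nxt = self.cascade_next
        if nxt is None or not self.cascade_out_enabled or not nxt.cascade_in_enabled:
            raise RuntimeError("cascade connection is not enabled")
        nxt.cascade_in.append(self.accumulator)


def run_example():
    first_core, second_core = Core("A"), Core("B")
    first_memory = MemoryModule()
    first_memory.attach(first_core)       # first memory interface
    first_memory.attach(second_core)      # second memory interface

    first_core.cascade_next = second_core
    first_core.cascade_out_enabled = True
    second_core.cascade_in_enabled = True

    data = [1, 2, 3, 4]                   # application data generated by the first core
    first_part, second_part = data[:2], data[2:]

    # First portion goes through the shared memory module.
    for addr, value in enumerate(first_part):
        first_memory.write(first_core, addr, value)

    # Second portion goes from the accumulator over the cascade connection.
    first_core.accumulator = sum(second_part)
    first_core.send_cascade()

    from_memory = [first_memory.read(second_core, a) for a in range(len(first_part))]
    from_cascade = second_core.cascade_in.popleft()
    return from_memory, from_cascade


print(run_example())
```

The second core ends up with the first portion read from the shared module and the second portion received over the cascade. Clearing either enable flag makes `send_cascade` raise, mirroring the independently enabled cascade interfaces.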

Assignees

Xilinx Inc

Inventors

Classifications

  • with reconfigurable architecture

  • System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package

  • Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]

  • using switching circuits, e.g. switching matrix, connection or expansion network (G06F13/4009 takes precedence)

  • Details of memory controller

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10747690B2 cover?
A device may include a plurality of data processing engines. Each data processing engine may include a core and a memory module. Each core may be configured to access the memory module in the same data processing engine and a memory module within at least one other data processing engine of the plurality of data processing engines.
Who is the assignee on this patent?
Xilinx Inc
What technology area does this patent fall under?
Primary CPC classification G06F15/7867. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 18, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).