Data transfer between accessible memories of multiple processors incorporated in coarse-grained reconfigurable (CGR) architecture within heterogeneous processing system using one memory to memory transfer operation

US12210468B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12210468-B2
Application number: US-202318099014-A
Country: US
Kind code: B2
Filing date: Jan 19, 2023
Priority date: Jan 19, 2023
Publication date: Jan 28, 2025
Grant date: Jan 28, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A heterogeneous processing system including a host processor, a first processor with a first memory and a first data transfer resource, a second processor with a second memory, and switch and bus circuitry that communicatively couples the processors and the data transfer resource. The host processor is programmed to map virtual addresses of the second memory to physical addresses of the switch and bus circuitry and to configure the first processor to perform one memory to memory transfer operation between the first and second memories using the data transfer resource. The first processor may be configured to program the first data transfer resource. A method including mapping virtual addresses of the second memory to physical addresses of the switch and bus circuitry, and configuring the first processor to perform one memory to memory transfer operation between the first and second memories using the first data transfer resource.
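The mechanism in the abstract can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the patented implementation: the class names (`Memory`, `Bus`, `Host`, `DmaEngine`), the bus addresses, and the descriptor layout are all hypothetical and chosen for illustration. It shows a host mapping a virtual range onto the bus addresses behind which the second memory sits, then a transfer resource in the first processor moving data with one memory-to-memory operation.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """A device memory modeled as a flat byte array (hypothetical model)."""
    size: int
    data: bytearray = field(init=False)

    def __post_init__(self) -> None:
        self.data = bytearray(self.size)


class Bus:
    """Stand-in for the switch and bus circuitry: routes bus addresses to memories."""

    def __init__(self) -> None:
        self.windows = []  # (base, limit, memory)

    def attach(self, base: int, memory: Memory) -> None:
        self.windows.append((base, base + memory.size, memory))

    def resolve(self, phys_addr: int, length: int):
        for base, limit, memory in self.windows:
            if base <= phys_addr and phys_addr + length <= limit:
                return memory, phys_addr - base
        raise ValueError(f"no memory at bus address {phys_addr:#x}")


class Host:
    """Stand-in for the host processor's virtual-to-physical mapping."""

    def __init__(self) -> None:
        self.mappings = []  # (virt_base, phys_base, length)

    def map_region(self, virt_base: int, phys_base: int, length: int) -> None:
        self.mappings.append((virt_base, phys_base, length))

    def translate(self, virt_addr: int) -> int:
        for virt_base, phys_base, length in self.mappings:
            if virt_base <= virt_addr < virt_base + length:
                return phys_base + (virt_addr - virt_base)
        raise KeyError(f"unmapped virtual address {virt_addr:#x}")


class DmaEngine:
    """Stand-in for the first data transfer resource inside the first processor."""

    def __init__(self, bus: Bus, local_memory: Memory) -> None:
        self.bus = bus
        self.local = local_memory
        self.descriptor = None
        self.operations = 0

    def configure(self, src_offset: int, dst_phys: int, length: int) -> None:
        self.descriptor = (src_offset, dst_phys, length)

    def start(self) -> None:
        src_offset, dst_phys, length = self.descriptor
        dst_mem, dst_offset = self.bus.resolve(dst_phys, length)
        # One copy from the first memory straight into the second memory.
        dst_mem.data[dst_offset:dst_offset + length] = (
            self.local.data[src_offset:src_offset + length]
        )
        self.operations += 1


def demo():
    bus = Bus()
    first_memory, second_memory = Memory(64), Memory(64)
    bus.attach(0x1000, first_memory)   # hypothetical bus addresses
    bus.attach(0x2000, second_memory)

    host = Host()
    host.map_region(virt_base=0x9000, phys_base=0x2000, length=64)

    first_memory.data[0:4] = b"\x01\x02\x03\x04"
    dma = DmaEngine(bus, first_memory)
    dma.configure(src_offset=0, dst_phys=host.translate(0x9010), length=4)
    dma.start()
    return bytes(second_memory.data[0x10:0x14]), dma.operations
```

In this model the data never passes through a host buffer: the host only sets up the mapping and the descriptor, and the copy itself is counted as a single operation.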

First claim

Opening claim text (preview).

What is claimed is:

1. A heterogeneous processing system programmed to execute a computation graph for machine learning and/or inference, the system comprising: a host processor; a first processor, having a coarse-grained reconfigurable architecture, coupled to a first memory and configured to execute a first node of the computation graph to generate and store first data into the first memory; a second processor coupled to a second memory and configured to execute a second node of the computation graph using the first data and store second data into the second memory; a first data transfer resource, incorporated into the first processor, and a second data transfer resource; and switch and bus circuitry that communicatively couples the host processor, the first processor, the second processor, the first data transfer resource, and the second data transfer resource; wherein the host processor is configured to map virtual addresses of the second memory to physical addresses of the switch and bus circuitry, to configure the first data transfer resource to transfer the first data from the first memory into the second memory using the mapped physical addresses, and to configure the second data transfer resource to transfer the second data from the second memory into host memory of the host processor.

2. The heterogeneous processing system of claim 1, wherein the first processor with the coarse-grained reconfigurable architecture comprises a reconfigurable dataflow unit including: an array of configurable units, including a plurality of configurable compute units, a plurality of configurable memory units, and a plurality of address generation units, coupled together by an array-level network; a top-level network coupled to the plurality of address generation units of the array of configurable units; and the first data transfer resource.

3. The heterogeneous processing system of claim 1, wherein the second processor comprises a compute engine incorporating the second data transfer resource.

4. The heterogeneous processing system of claim 1, wherein the first data transfer resource comprises a direct memory access (DMA) engine.

5. The heterogeneous processing system of claim 1, wherein the first data transfer resource is configured by the host processor to directly transfer the first data from the first memory into the second memory using the mapped physical addresses.

6. A method of transferring data in a heterogeneous system programmed to execute a computation graph for machine learning and/or inference, wherein the heterogeneous system includes a host processor coupled to a host memory, a first processor having a coarse-grained reconfigurable architecture coupled to a first memory, a second processor coupled to a second memory, a first data transfer resource incorporated into the first processor, and switch and bus circuitry that communicatively couples the host processor, the first processor, the second processor, and the first data transfer resource, the method comprising: mapping, by the host processor, virtual addresses of the second memory to physical addresses of the switch and bus circuitry; executing, by the first processor, a first node of the computation graph to generate and store first data into the first memory; configuring, by the host processor, the first data transfer resource to transfer the first data from the first memory into the second memory using the mapped physical addresses; prompting the first data transfer resource to transfer the first data from the first memory into the second memory; and executing, by the second processor, a second node of the computation graph using the first data to generate and store second data into the second memory.

7. The method of claim 6, wherein the first processor comprises a reconfigurable dataflow unit including: an array of configurable units, including a plurality of configurable compute units, a plurality of configurable memory units, and a plurality of address generation units, coupled together by an array-level network; a top-level network coupled to the plurality of address generation units of the array of configurable units; and the first data transfer resource.

8. The method of claim 6, wherein the configuring comprises configuring the first data transfer resource to directly transfer the first data from the first memory into the second memory using the mapped physical addresses.

9. The method of claim 6, wherein the heterogeneous system further includes a second data transfer resource coupled to the switch and bus circuitry, the method further comprising: configuring, by the host processor, the second data transfer resource to transfer the second data from the second memory into host memory using the mapped physical addresses; and prompting the second data transfer resource to transfer the second data from the second memory into the host memory.

10. A method of transferring data in a heterogeneous system programmed to execute a computation graph for machine learning and/or inference, wherein the heterogeneous system includes a host processor coupled to a host memory, a first processor having a coarse-grained reconfigurable architecture coupled to a first memory, a second processor coupled to a second memory, a first data transfer resource incorporated into the first processor, and switch and bus circuitry that communicatively couples the host processor, the first processor, the second processor, and the first data transfer resource, the method comprising: mapping, by the host processor, virtual addresses of the second memory to physical addresses of the switch and bus circuitry; executing, by the second processor, a first node of the computation graph to generate and store first data into the second memory; configuring, by the host processor, the first data transfer resource to transfer the first data from the second memory into the first memory using the mapped physical addresses; prompting the first data transfer resource to transfer the first data from the second memory into the first memory; and executing, by the first processor, a second node of the computation graph using the first data to generate and store second data into the first memory.

11. The method of claim 10, wherein the first processor comprises a reconfigurable dataflow unit including: an array of configurable units, including a plurality of configurable compute units, a plurality of configurable memory units, and a plurality of address generation units, coupled together by an array-level network; a top-level network coupled to the plurality of address generation units of the array of configurable units; and the first data transfer resource.

12. The method of claim 10, the method further comprising: configuring, by the host processor, the first data transfer resource to transfer the second data from the first memory into host memory; and prompting the first data transfer resource to transfer the second data from the first memory into the host memory.

13. The method of claim 10, wherein the configuring comprises configuring the first data transfer resource to directly transfer the first data from the second memory into the first memory using the mapped physical addresses.
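The ordering of steps in the method of claim 6, followed by the extra transfer in claim 9, can be traced with a short sketch. This is an illustrative model only: the function name `run_claim6_flow`, the buffer keys, the bus address `0x8000_0000`, and the two node computations (doubling and summing) are hypothetical stand-ins, since the claims do not specify what the graph nodes compute.

```python
def run_claim6_flow(inputs):
    """Trace the claimed step order with plain dictionaries as memories."""
    first_memory, second_memory, host_memory = {}, {}, {}
    log = []

    # Host maps a virtual buffer of the second memory to a bus physical address.
    address_map = {"second_mem_buf": 0x8000_0000}  # hypothetical address
    log.append("map")

    # First processor executes the first graph node and stores first data.
    first_memory["first_data"] = [x * 2 for x in inputs]  # placeholder node
    log.append("exec_node_1")

    # Host configures the first data transfer resource with the mapped address.
    descriptor = {"src": "first_data", "dst_phys": address_map["second_mem_buf"]}
    log.append("configure_transfer_1")

    # The transfer resource is prompted: first memory -> second memory.
    second_memory[descriptor["dst_phys"]] = list(first_memory[descriptor["src"]])
    log.append("transfer_first_to_second")

    # Second processor executes the second node using the first data.
    second_memory["second_data"] = sum(second_memory[descriptor["dst_phys"]])
    log.append("exec_node_2")

    # Claim 9: a second transfer resource moves second data into host memory.
    host_memory["result"] = second_memory["second_data"]
    log.append("transfer_second_to_host")

    return host_memory["result"], log
```

Claim 10 describes the mirrored direction: the second processor produces the first data, and the same first data transfer resource pulls it from the second memory into the first memory before the first processor runs the second node.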

Assignees

Sambanova Systems Inc

Inventors

Classifications

  • with request queuing

  • Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory

  • Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]

  • Latency reduction

  • Correctness of operation, e.g. memory ordering

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12210468B2 cover?
A heterogeneous processing system including a host processor, a first processor with a first memory and a first data transfer resource, a second processor with a second memory, and switch and bus circuitry that communicatively couples the processors and the data transfer resource. The host processor is programmed to map virtual addresses of the second memory to physical addresses of the switch …
Who is the assignee on this patent?
Sambanova Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F13/28. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 28, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).