Apparatuses, methods, and systems for memory interface circuit allocation in a configurable spatial accelerator

US2020310994A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2020310994-A1
Application number: US-201916370928-A
Country: US
Kind code: A1
Filing date: Mar 30, 2019
Priority date: Mar 30, 2019
Publication date: Oct 1, 2020
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and apparatuses relating to memory interface circuit allocation in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for an improved memory sub-system design via the improvements to allocation discussed herein.
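
The decoupling of memory accesses into separate request and response phases, as the abstract describes it, can be illustrated with a toy cycle-level model. This is a minimal sketch under stated assumptions, not the patented design: the class `DecoupledMemory`, the function `pipelined_loads`, the fixed `latency`, and the one-request-per-cycle issue rate are all hypothetical names and parameters chosen for illustration.

```python
from collections import deque


class DecoupledMemory:
    """Toy memory with separate request and response phases (hypothetical model)."""

    def __init__(self, data, latency=3):
        self.data = data
        self.latency = latency      # assumed fixed access latency, in cycles
        self.cycle = 0
        self.in_flight = deque()    # entries: (ready_cycle, tag, address)

    def request(self, tag, address):
        # Request phase: record the access and return without waiting.
        self.in_flight.append((self.cycle + self.latency, tag, address))

    def tick(self):
        # Advance one cycle and return every response that is now ready.
        self.cycle += 1
        done = []
        while self.in_flight and self.in_flight[0][0] <= self.cycle:
            _, tag, address = self.in_flight.popleft()
            done.append((tag, self.data[address]))
        return done


def pipelined_loads(memory, addresses):
    """Issue one load per cycle and collect responses as they arrive."""
    results = {}
    pending = deque(enumerate(addresses))
    cycles = 0
    while len(results) < len(addresses):
        if pending:
            tag, address = pending.popleft()
            memory.request(tag, address)
        # Response phase: runs independently of the request phase.
        for tag, value in memory.tick():
            results[tag] = value
        cycles += 1
    values = [results[tag] for tag in range(len(addresses))]
    return values, cycles
```

In this model, four loads with a latency of three cycles finish in six cycles rather than the twelve a blocking load-then-wait loop would need, because later requests are issued while earlier ones are still in flight.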

First claim

Opening claim text (preview).

1. An apparatus comprising: a spatial array of processing elements; a cache; a first memory interface circuit comprising a first port into the cache, a first plurality of input queues to store data for memory requests from the spatial array of processing elements, and a first memory operation register; a second memory interface circuit comprising a second port into the cache, a second plurality of input queues to store data for memory requests from the spatial array of processing elements, and a second memory operation register; and an allocator circuit to: set respective first values into the first memory operation register and the second memory operation register according to a first allocation mode to couple the first port to a first input queue of the first plurality of input queues that stores data for memory requests from a first processing element of the spatial array of processing elements, couple the second port to a first input queue of the second plurality of input queues that stores data for memory requests from a second processing element of the spatial array of processing elements, and couple the first port to a second input queue of the first plurality of input queues that stores data for memory requests from a third processing element of the spatial array of processing elements, and set respective second values into the first memory operation register and the second memory operation register according to a second allocation mode to couple the first port to the first input queue of the first plurality of input queues that stores data for memory requests from the first processing element of the spatial array of processing elements, couple the second port to the first input queue of the second plurality of input queues that stores data for memory requests from the second processing element of the spatial array of processing elements, and couple the second port to a second input queue of the second plurality of input queues that stores data for memory requests from the third processing element of the spatial array of processing elements.

2. The apparatus of claim 1, wherein the respective first values set in the first memory operation register and the second memory operation register causes a first completion buffer of the first memory interface circuit to receive a completion indication from the cache for memory requests from the first processing element, a first completion buffer of the second memory interface circuit to receive a completion indication from the cache for memory requests from the second processing element, and a second completion buffer of the first memory interface circuit to receive a completion indication from the cache for memory requests from the third processing element.

3. The apparatus of claim 2, wherein the first completion buffer of the first memory interface circuit is a first proper subset of slots of a unified completion buffer of the first memory interface circuit, the second completion buffer of the first memory interface circuit is a second proper subset of slots of the unified completion buffer of the first memory interface circuit, and the allocator circuit assigns a largest number of buffer slots of the unified completion buffer to the one of the first processing element or the third processing element that issues a largest number of memory requests for a dataflow graph.

4. The apparatus of claim 2, wherein the first completion buffer of the first memory interface circuit is a first proper subset of slots of a unified completion buffer of the first memory interface circuit, the second completion buffer of the first memory interface circuit is a second proper subset of slots of the unified completion buffer of the first memory interface circuit, and the allocator circuit assigns a largest number of buffer slots of the unified completion buffer to the one of the first processing element or the third processing element that has a longest latency for memory requests for a dataflow graph.

5. The apparatus of claim 1, wherein the second allocation mode allocates input queues based on issuance by the first processing element of a largest number of memory requests for a dataflow graph, the second processing element of a next largest number of memory requests for the dataflow graph, and the third processing element of a smaller number of memory requests for the dataflow graph than the next largest number of memory requests.

6. The apparatus of claim 1, wherein the allocator circuit allocates a next input queue of the first memory interface circuit or the second memory interface circuit in program order to the one of the first memory interface circuit or the second memory interface circuit with a fewest number of memory requests assigned to its input queues for a dataflow graph.

7. The apparatus of claim 1, wherein the allocator circuit switches from the first allocation mode to the second allocation mode in runtime for a dataflow graph.

8. The apparatus of claim 1, wherein the first memory interface circuit, when in the first allocation mode, sends a first backpressure value to stall the first processing element from issuing an additional memory request when the first input queue of the first memory interface circuit is not available for data for the additional memory request, the second memory interface circuit, when in the first allocation mode, sends a second backpressure value to stall the second processing element from issuing an additional memory request when the first input queue of the second memory interface circuit is not available for data for the additional memory request, and the first memory interface circuit, when in the first allocation mode, sends a third backpressure value to stall the third processing element from issuing an additional memory request when the second input queue of the first memory interface circuit is not available for data for the additional memory request.

9. A method comprising: coupling a spatial array of processing elements to a first memory interface circuit comprising a first port into a cache, a first plurality of input queues to store data for memory requests from the spatial array of processing elements, and a first memory operation register, and to a second memory interface circuit comprising a second port into the cache, a second plurality of input queues to store data for memory requests from the spatial array of processing elements, and a second memory operation register; setting respective first values into the first memory operation register and the second memory operation register according to a first allocation mode to couple the first port to a first input queue of the first plurality of input queues that stores data for memory requests from a first processing element of the spatial array of processing elements, couple the second port to a first input queue of the second plurality of input queues that stores data for memory requests from a second processing element of the spatial array of processing elements, and couple the first port to a second input queue of the first plurality of input queues that stores data for memory requests from a third processing element of the spatial array of processing elements; and setting respective second values into the first memory operation register and the second memory operation register according to a second allocation mode to couple the first port to the first input queue of the first plurality of input queues that stores data for memory requests from the first processing element of the spatial array of processing elements, couple the second port to the first input queue of the second plurality of input queues that stores data for memory requests from the second processing element of the spatial ar
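
The two placements in claim 1 differ only in where the third processing element's input queue lands: on the first memory interface circuit in the first mode, on the second in the second mode. One way to read this, offered here as an assumption rather than as the patent's stated algorithm, is that the first mode behaves like round-robin assignment in program order, while the second follows the rule of claim 6 (give the next queue to the interface with the fewest requests assigned so far). The sketch below also includes a simple completion-buffer split in the spirit of claim 3. All function names and the request counts used in the example are hypothetical.

```python
def allocate_round_robin(request_counts, num_interfaces=2):
    """Assumed stand-in for the first mode: cycle through interfaces in program order."""
    placement = {}
    next_queue = [0] * num_interfaces
    for index, pe in enumerate(request_counts):
        iface = index % num_interfaces
        placement[pe] = (iface, next_queue[iface])   # (interface, input queue)
        next_queue[iface] += 1
    return placement


def allocate_least_loaded(request_counts, num_interfaces=2):
    """Claim-6-style rule: next queue goes to the interface with the fewest requests."""
    placement = {}
    next_queue = [0] * num_interfaces
    load = [0] * num_interfaces
    for pe, count in request_counts.items():        # dict order = program order
        # Ties resolve to the lowest-numbered interface.
        iface = min(range(num_interfaces), key=lambda i: load[i])
        placement[pe] = (iface, next_queue[iface])
        next_queue[iface] += 1
        load[iface] += count
    return placement


def partition_completion_buffer(total_slots, request_counts):
    """Split a unified completion buffer so the busiest requester gets the most slots."""
    if total_slots < len(request_counts):
        raise ValueError("need at least one slot per processing element")
    total_requests = sum(request_counts.values()) or 1
    spare = total_slots - len(request_counts)
    # One guaranteed slot each, plus a proportional share of the spare slots.
    slots = {pe: 1 + spare * count // total_requests
             for pe, count in request_counts.items()}
    # Slots lost to integer rounding go to the busiest requester.
    busiest = max(request_counts, key=request_counts.get)
    slots[busiest] += total_slots - sum(slots.values())
    return slots
```

With made-up counts of 100, 50, and 10 requests for the first, second, and third processing elements, `allocate_round_robin` places the third element on interface 0 (queue 1) and `allocate_least_loaded` places it on interface 1 (queue 1), which matches the two couplings recited in claim 1.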

Assignees

Intel Corp

Inventors

Classifications

  • Energy efficient computing, e.g. low power processors, power management or thermal management · CPC title

  • G06F13/102 · Primary

    where the program performs an interfacing function, e.g. device driver (G06F13/105 takes precedence; contention policies within device drivers G06F9/4881; scheduling within device drivers G06F9/52) · CPC title

  • for access to input/output bus · CPC title

  • single instruction multiple data [SIMD] multiprocessors · CPC title

  • Latency reduction · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020310994A1 cover?
Systems, methods, and apparatuses relating to memory interface circuit allocation in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circ…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F13/102. Mapped technology areas include Physics.
When was this patent published?
Published Oct 1, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).