Apparatuses, methods, and systems for memory interface circuit arbitration in a configurable spatial accelerator

US11037050B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11037050-B2
Application number  US-201916458020-A
Country             US
Kind code           B2
Filing date         Jun 29, 2019
Priority date       Jun 29, 2019
Publication date    Jun 15, 2021
Grant date          Jun 15, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and apparatuses relating to arbitration among a plurality of memory interface circuits in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for improved memory sub-system design via arbitration and the improvements to arbitration discussed herein.
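
The benefit of splitting a memory access into separate request and response phases can be illustrated with a toy cycle count. The sketch below is not taken from the patent: the function names, the one-request-per-cycle issue rate, and the fixed latency are all assumptions chosen only to show why decoupling allows pipelining through memory.

```python
from collections import deque


def pipelined_cycles(num_requests, latency):
    """Cycles needed when requests issue back to back (assumed: one per cycle)
    and each response returns `latency` cycles after its request."""
    in_flight = deque()  # completion cycle of each outstanding request
    cycle = issued = completed = 0
    while completed < num_requests:
        cycle += 1
        # Request phase: issue a new request without waiting for earlier responses.
        if issued < num_requests:
            in_flight.append(cycle + latency - 1)
            issued += 1
        # Response phase: retire every request whose data has arrived.
        while in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
            completed += 1
    return cycle


def blocking_cycles(num_requests, latency):
    """Cycles needed when each request must complete before the next one issues."""
    return num_requests * latency


if __name__ == "__main__":
    print(pipelined_cycles(4, 3), blocking_cycles(4, 3))
```

Under these assumptions, the pipelined count grows as the number of requests plus the latency minus one, while the blocking count grows as their product.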

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a spatial array of processing elements; a plurality of cache banks each having a plurality of input queues coupled to an input to cache storage; a first plurality of memory interface circuits and a second plurality of memory interface circuits each having an input queue to store data for memory requests from the spatial array of processing elements; a first arbitrator circuit coupled to the input queues of the first plurality of memory interface circuits and to a first input queue of the plurality of input queues of each of the plurality of cache banks, wherein the first arbitrator circuit is to compare a cache bank identification value for a memory request from each of the input queues of the first plurality of memory interface circuits, and issue only one memory request for a plurality of the cache bank identification values that match; and a second arbitrator circuit coupled to the input queues of the second plurality of memory interface circuits and to a second input queue of the plurality of input queues of each of the plurality of cache banks, wherein the second arbitrator circuit is to compare a cache bank identification value for a memory request from each of the input queues of the second plurality of memory interface circuits, and issue only one memory request for a plurality of the cache bank identification values that match.

2. The apparatus of claim 1, wherein the first arbitrator circuit is to issue only one memory request for a first plurality of the cache bank identification values that match, and concurrently issue only one memory request for a second, different plurality of the cache bank identification values that match.

3. The apparatus of claim 2, wherein the first arbitrator circuit issues the only one memory request for the first plurality of the cache bank identification values that match according to a first arbitration policy, and concurrently issues the only one memory request for the second, different plurality of the cache bank identification values that match according to a second, different arbitration policy.

4. The apparatus of claim 3, wherein the first arbitration policy is a round robin arbitration policy, and the second, different arbitration policy is a find first arbitration policy.

5. The apparatus of claim 1, wherein the first arbitrator circuit comprises a plurality of comparator circuits to compare the cache bank identification value in parallel.

6. The apparatus of claim 1, wherein the first arbitrator circuit issuing the one memory request for the plurality of the cache bank identification values that match causes a dependency token to be output for that one memory request.

7. The apparatus of claim 1, wherein the plurality of cache banks comprises an age tracker to ensure memory requests are serviced in order arriving at the first arbitrator circuit and the second arbitrator circuit.

8. The apparatus of claim 1, further comprising a tile manager circuit coupled to the first plurality of memory interface circuits and the second plurality of memory interface circuits, and a third arbitrator circuit to arbitrate tile manager communications between the tile manager circuit and the first plurality of memory interface circuits and the second plurality of memory interface circuits.

9. A method comprising: sending data for memory requests from a spatial array of processing elements to input queues of a first plurality of memory interface circuits and a second plurality of memory interface circuits; comparing, by a first arbitrator circuit coupled to a plurality of cache banks, a cache bank identification value for a memory request from each of the input queues of the first plurality of memory interface circuits; issuing, by the first arbitrator circuit, only one memory request to a cache bank for a plurality of the cache bank identification values that match; comparing, by a second arbitrator circuit coupled to the plurality of cache banks, a cache bank identification value for a memory request from each of the input queues of the second plurality of memory interface circuits; and issuing, by the second arbitrator circuit, only one memory request to a cache bank for a plurality of the cache bank identification values that match.

10. The method of claim 9, wherein the issuing, by the first arbitrator circuit, comprises issuing only one memory request for a first plurality of the cache bank identification values that match, and concurrently issuing only one memory request for a second, different plurality of the cache bank identification values that match.

11. The method of claim 10, wherein the issuing, by the first arbitrator circuit, comprises issuing the only one memory request for the first plurality of the cache bank identification values that match according to a first arbitration policy, and concurrently issuing the only one memory request for the second, different plurality of the cache bank identification values that match according to a second, different arbitration policy.

12. The method of claim 11, wherein the first arbitration policy is a round robin arbitration policy, and the second, different arbitration policy is a find first arbitration policy.

13. The method of claim 9, wherein the comparing, by the first arbitrator circuit, comprises performing a plurality of comparisons in parallel with a plurality of comparator circuits of the first arbitrator circuit.

14. The method of claim 9, further comprising outputting a dependency token for that one memory request when the first arbitrator circuit issues the one memory request for the plurality of the cache bank identification values that match.

15. The method of claim 9, further comprising ensuring, by an age tracker of the plurality of cache banks, that memory requests are serviced in order arriving at the first arbitrator circuit and the second arbitrator circuit.

16. The method of claim 9, further comprising coupling a tile manager circuit to the first plurality of memory interface circuits and the second plurality of memory interface circuits; and arbitrating tile manager communications between the tile manager circuit and the first plurality of memory interface circuits and the second plurality of memory interface circuits with a third arbitrator circuit.

17. A non-transitory machine readable medium that stores code that when executed by a machine causes the machine to perform a method comprising: sending data for memory requests from a spatial array of processing elements to input queues of a first plurality of memory interface circuits and a second plurality of memory interface circuits; comparing, by a first arbitrator circuit coupled to a plurality of cache banks, a cache bank identification value for a memory request from each of the input queues of the first plurality of memory interface circuits; issuing, by the first arbitrator circuit, only one memory request to a cache bank for a plurality of the cache bank identification values that match; comparing, by a second arbitrator circuit coupled to the plurality of cache banks, a cache bank identification value for a memory request from each of the input queues of the second plurality of memory interface circuits; and issuing, by the second arbitrator circuit, only one memory request to a cache bank for a plurality of the cache bank identification values that match.

18. The non-transitory machine readable medium of claim 17, wherein the issuing, by the first arbitrator circuit, comprises issuing only one memory request for a first pl
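
The arbitration behavior described in claims 1 through 4 can be sketched in software as follows. This is a minimal behavioral model, not the patented circuit: the `Arbitrator` class, the `step` method, the per-bank policy table, and the representation of each input queue as a deque of cache bank identification values are all hypothetical names and simplifications introduced for illustration.

```python
from collections import deque


def find_first(candidates, _last):
    """Find-first policy: grant the lowest-numbered requesting circuit."""
    return min(candidates)


def round_robin(candidates, last):
    """Round-robin policy: grant the next requester after the last one granted."""
    later = [c for c in candidates if c > last]
    return min(later) if later else min(candidates)


class Arbitrator:
    """Hypothetical model of one arbitrator serving a group of memory interface circuits."""

    def __init__(self, queues, policy_for_bank):
        self.queues = queues                    # one deque of bank ids per circuit
        self.policy_for_bank = policy_for_bank  # bank id -> policy function
        self.last_grant = {}                    # bank id -> circuit granted last

    def step(self):
        """Run one arbitration round and return {bank id: granted circuit index}."""
        # Compare the bank id at the head of every non-empty input queue.
        requesters = {}
        for idx, queue in enumerate(self.queues):
            if queue:
                requesters.setdefault(queue[0], []).append(idx)
        grants = {}
        # Issue exactly one request per group of matching bank ids.
        for bank, candidates in requesters.items():
            policy = self.policy_for_bank.get(bank, find_first)
            winner = policy(candidates, self.last_grant.get(bank, -1))
            self.queues[winner].popleft()  # only the winner's request issues
            self.last_grant[bank] = winner
            grants[bank] = winner
        return grants


if __name__ == "__main__":
    queues = [deque([0, 0]), deque([0, 1]), deque([1]), deque([1])]
    arb = Arbitrator(queues, {0: round_robin, 1: find_first})
    print(arb.step())
    print(arb.step())
```

In this model, requests to different banks issue in the same round (claim 2), and each bank may use its own policy (claims 3 and 4). A second `Arbitrator` instance over a second group of queues would stand in for the second arbitrator circuit, which in the claims feeds a separate input queue of each cache bank.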

Assignees

Intel Corp

Inventors

Classifications

  • G06N3/045 (Primary)

    Combinations of networks · CPC title

  • with multidimensional access, e.g. row/column, matrix · CPC title

  • Buffering arrangements · CPC title

  • of parts of caches, e.g. directory or tag array · CPC title

  • with dedicated cache, e.g. instruction or stack · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11037050B2 cover?
Systems, methods, and apparatuses relating to arbitration among a plurality of memory interface circuits in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing el…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 15, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).