Execute at commit state update instructions, apparatus, methods, and systems
US-9052890-B2 · Jun 9, 2015 · US
US11037050B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11037050-B2 |
| Application number | US-201916458020-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 29, 2019 |
| Priority date | Jun 29, 2019 |
| Publication date | Jun 15, 2021 |
| Grant date | Jun 15, 2021 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Systems, methods, and apparatuses relating to arbitration among a plurality of memory interface circuits in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for improved memory sub-system design via arbitration and the improvements to arbitration discussed herein.
Opening claim text (preview).
What is claimed is: 1. An apparatus comprising: a spatial array of processing elements; a plurality of cache banks each having a plurality of input queues coupled to an input to cache storage; a first plurality of memory interface circuits and a second plurality of memory interface circuits each having an input queue to store data for memory requests from the spatial array of processing elements; a first arbitrator circuit coupled to the input queues of the first plurality of memory interface circuits and to a first input queue of the plurality of input queues of each of the plurality of cache banks, wherein the first arbitrator circuit is to compare a cache bank identification value for a memory request from each of the input queues of the first plurality of memory interface circuits, and issue only one memory request for a plurality of the cache bank identification values that match; and a second arbitrator circuit coupled to the input queues of the second plurality of memory interface circuits and to a second input queue of the plurality of input queues of each of the plurality of cache banks, wherein the second arbitrator circuit is to compare a cache bank identification value for a memory request from each of the input queues of the second plurality of memory interface circuits, and issue only one memory request for a plurality of the cache bank identification values that match. 2. The apparatus of claim 1 , wherein the first arbitrator circuit is to issue only one memory request for a first plurality of the cache bank identification values that match, and concurrently issue only one memory request for a second, different plurality of the cache bank identification values that match. 3. The apparatus of claim 2 , wherein the first arbitrator circuit issues the only one memory request for the first plurality of the cache bank identification values that match according to a first arbitration policy, and concurrently issues the only one memory request for the second, different plurality of the cache bank identification values that match according to a second, different arbitration policy. 4. The apparatus of claim 3 , wherein the first arbitration policy is a round robin arbitration policy, and the second, different arbitration policy is a find first arbitration policy. 5. The apparatus of claim 1 , wherein the first arbitrator circuit comprises a plurality of comparator circuits to compare the cache bank identification value in parallel. 6. The apparatus of claim 1 , wherein the first arbitrator circuit issuing the one memory request for the plurality of the cache bank identification values that match causes a dependency token to be output for that one memory request. 7. The apparatus of claim 1 , wherein the plurality of cache banks comprises an age tracker to ensure memory requests are serviced in order arriving at the first arbitrator circuit and the second arbitrator circuit. 8. The apparatus of claim 1 , further comprising a tile manager circuit coupled to the first plurality of memory interface circuits and the second plurality of memory interface circuits, and a third arbitrator circuit to arbitrate tile manager communications between the tile manager circuit and the first plurality of memory interface circuits and the second plurality of memory interface circuits. 9. A method comprising: sending data for memory requests from a spatial array of processing elements to input queues of a first plurality of memory interface circuits and a second plurality of memory interface circuits; comparing, by a first arbitrator circuit coupled to a plurality of cache banks, a cache bank identification value for a memory request from each of the input queues of the first plurality of memory interface circuits; issuing, by the first arbitrator circuit, only one memory request to a cache bank for a plurality of the cache bank identification values that match; comparing, by a second arbitrator circuit coupled to the plurality of cache banks, a cache bank identification value for a memory request from each of the input queues of the second plurality of memory interface circuits; and issuing, by the second arbitrator circuit, only one memory request to a cache bank for a plurality of the cache bank identification values that match. 10. The method of claim 9 , wherein the issuing, by the first arbitrator circuit, comprises issuing only one memory request for a first plurality of the cache bank identification values that match, and concurrently issuing only one memory request for a second, different plurality of the cache bank identification values that match. 11. The method of claim 10 , wherein the issuing, by the first arbitrator circuit, comprises issuing the only one memory request for the first plurality of the cache bank identification values that match according to a first arbitration policy, and concurrently issuing the only one memory request for the second, different plurality of the cache bank identification values that match according to a second, different arbitration policy. 12. The method of claim 11 , wherein the first arbitration policy is a round robin arbitration policy, and the second, different arbitration policy is a find first arbitration policy. 13. The method of claim 9 , wherein the comparing, by the first arbitrator circuit, comprises performing a plurality of comparisons in parallel with a plurality of comparator circuits of the first arbitrator circuit. 14. The method of claim 9 , further comprising outputting a dependency token for that one memory request when the first arbitrator circuit issues the one memory request for the plurality of the cache bank identification values that match. 15. The method of claim 9 , further comprising ensuring, by an age tracker of the plurality of cache banks, that memory requests are serviced in order arriving at the first arbitrator circuit and the second arbitrator circuit. 16. The method of claim 9 , further comprising coupling a tile manager circuit to the first plurality of memory interface circuits and the second plurality of memory interface circuits; and arbitrating tile manager communications between the tile manager circuit and the first plurality of memory interface circuits and the second plurality of memory interface circuits with a third arbitrator circuit. 17. A non-transitory machine readable medium that stores code that when executed by a machine causes the machine to perform a method comprising: sending data for memory requests from a spatial array of processing elements to input queues of a first plurality of memory interface circuits and a second plurality of memory interface circuits; comparing, by a first arbitrator circuit coupled to a plurality of cache banks, a cache bank identification value for a memory request from each of the input queues of the first plurality of memory interface circuits; issuing, by the first arbitrator circuit, only one memory request to a cache bank for a plurality of the cache bank identification values that match; comparing, by a second arbitrator circuit coupled to the plurality of cache banks, a cache bank identification value for a memory request from each of the input queues of the second plurality of memory interface circuits; and issuing, by the second arbitrator circuit, only one memory request to a cache bank for a plurality of the cache bank identification values that match. 18. The non-transitory machine readable medium of claim 17 , wherein the issuing, by the first arbitrator circuit, comprises issuing only one memory request for a first pl
Combinations of networks · CPC title
with multidimensional access, e.g. row/column, matrix · CPC title
Buffering arrangements · CPC title
of parts of caches, e.g. directory or tag array · CPC title
with dedicated cache, e.g. instruction or stack · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.