Apparatus, methods, and systems for memory consistency in a configurable spatial accelerator

US10417175B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10417175-B2
Application number  US-201715859466-A
Country             US
Kind code           B2
Filing date         Dec 30, 2017
Priority date       Dec 30, 2017
Publication date    Sep 17, 2019
Grant date          Sep 17, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatuses relating to consistency in an accelerator are described. In one embodiment, request address file (RAF) circuits are coupled to a spatial array by a first network, a memory is coupled to the RAF circuits by a second network, a RAF circuit is to not issue, into the second network, a request to the memory marked with a program order dependency on a previous request until receiving a first token generated by completion of the previous request to the memory by another RAF circuit, and a second RAF circuit is to not issue, into the second network, a second request to the memory marked with a program order dependency on a first request until receiving a second token sent by a first RAF circuit when a predetermined time period has lapsed since the first request was issued by the first RAF circuit into the second network.
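The two token policies in the abstract can be illustrated with a toy cycle-level model. This is a minimal sketch, not the patented circuit: the `Raf` class, the `simulate` function, the request names, and the latency and timeout values are all hypothetical and chosen only for illustration. It shows a dependent request held in one RAF circuit until a token arrives, either on completion of the earlier request or after a fixed time period since that request was issued.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Request:
    name: str
    depends_on: Optional[str] = None  # earlier request in program order, if any


class Raf:
    """Toy model of one request address file (RAF) circuit (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pending: List[Request] = []  # requests not yet issued to memory
        self.tokens: Set[str] = set()     # dependency tokens received so far

    def receive_token(self, request_name: str) -> None:
        self.tokens.add(request_name)

    def try_issue(self) -> List[str]:
        """Issue every pending request whose dependency token has arrived."""
        issued, waiting = [], []
        for req in self.pending:
            if req.depends_on is None or req.depends_on in self.tokens:
                issued.append(req.name)
            else:
                waiting.append(req)
        self.pending = waiting
        return issued


def simulate(policy: str, memory_latency: int = 5, timeout: int = 3) -> Optional[int]:
    """Return the cycle at which the dependent request issues.

    Assumption: another RAF issued "store A" at cycle 0. Under "completion"
    the token is sent when the store completes; under "timeout" it is sent
    once a predetermined number of cycles has elapsed since issue.
    """
    consumer = Raf("raf1")
    consumer.pending.append(Request("load B", depends_on="store A"))
    token_cycle = memory_latency if policy == "completion" else timeout
    for cycle in range(100):
        if cycle == token_cycle:
            consumer.receive_token("store A")
        if consumer.try_issue():
            return cycle
    return None


print(simulate("completion"))  # dependent request waits for completion
print(simulate("timeout"))     # dependent request waits for the time period
```

In this sketch the timeout policy releases the dependent request earlier only because the assumed timeout is shorter than the assumed memory latency; the abstract itself does not state either value.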

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a spatial array of processing elements coupled by a first communications network; a plurality of request address file circuits coupled to the spatial array of processing elements by the first communications network; and a memory coupled to the plurality of request address file circuits by a second communications network, wherein: a request address file circuit of the plurality of request address file circuits is to not issue, into the second communications network, an access request to the memory marked with a program order dependency on a previous, in program order, access request until receiving a first memory dependency token generated by completion of access of the previous, in program order, access request to the memory by another of the plurality of request address file circuits, and a single, request address file circuit of the plurality of request address file circuits is to, for a first access request to the memory received by the single, request address file circuit and a second access request to the memory marked with a program order dependency on the first access request received by the single, request address file circuit, provide a second memory dependency token for the program order dependency on issuance of the first access request into the second communications network, and then issue the second access request into the second communications network based on reading the second memory dependency token.

2. The apparatus of claim 1, wherein the single, request address file circuit is to generate and store the second memory dependency token in a same cycle of the single, request address file circuit, and issue the second access request into the second communications network on a next cycle.

3. The apparatus of claim 1, wherein the first access request and the second access request are to a same location in the memory.

4. The apparatus of claim 1, wherein a second request address file circuit of the plurality of request address file circuits is to not issue to the memory a fourth access request marked with a program order dependency on a third access request until receiving, from the second communications network, a third memory dependency token generated by completion of access of the third access request to the memory by a first request address file circuit of the plurality of request address file circuits.

5. The apparatus of claim 1, wherein a second request address file circuit of the plurality of request address file circuits is to not issue, into the second communications network, a second access request to the memory marked with a program order dependency on a first access request until receiving a third memory dependency token sent by a first request address file circuit of the plurality of request address file circuits when a predetermined time period has lapsed since the first access request to the memory was issued by the first request address file circuit into the second communications network.

6. The apparatus of claim 5, further comprising a memory management unit coupled to the memory to maintain the memory according to a coherency protocol, wherein a fourth request address file circuit of the plurality of request address file circuits is to not issue to the memory a fourth access request marked with a program order dependency on a third access request until receiving, from the second communications network, a fourth memory dependency token generated by: completion of access of the third access request to the memory from a third request address file circuit of the plurality of request address file circuits, and the access of the third access request being made globally visible by the memory management unit according to the coherency protocol.

7. A non-transitory machine readable medium that stores code that when executed by a machine causes the machine to perform a method comprising: coupling a plurality of request address file circuits to a spatial array of processing elements by a first communications network, and a memory to the plurality of request address file circuits by a second communications network; receiving, by a single, request address file circuit of the plurality of request address file circuits from the spatial array of processing elements, a first access request to the memory and a second access request to the memory marked with a program order dependency on the first access request; stalling issuance of the second access request into the second communications network by the single, request address file circuit; providing a first memory dependency token for the program order dependency on issuance of the first access request into the second communications network by the single, request address file circuit; and issuing the second access request into the second communications network based on reading the first memory dependency token by the single, request address file circuit.

8. The non-transitory machine readable medium of claim 7, wherein the providing and the issuing comprise the single, request address file generating and storing the first memory dependency token in a same cycle of the single, request address file circuit, and issuing the second access request into the second communications network on a next cycle.

9. The non-transitory machine readable medium of claim 7, wherein the first access request and the second access request are to a same location in the memory.

10. The non-transitory machine readable medium of claim 7, wherein the method further comprises a second request address file circuit of the plurality of request address file circuits not issuing to the memory a fourth access request marked with a program order dependency on a third access request until receiving, from the second communications network, a second memory dependency token generated by completion of access of the third access request to the memory by a first request address file circuit of the plurality of request address file circuits.

11. The non-transitory machine readable medium of claim 7, wherein the method further comprises a second request address file circuit of the plurality of request address file circuits not issuing, into the second communications network, a second access request to the memory marked with a program order dependency on a first access request until receiving a second memory dependency token sent by a first request address file circuit of the plurality of request address file circuits when a predetermined time period has lapsed since the first access request to the memory was issued by the first request address file circuit into the second communications network.

12. The non-transitory machine readable medium of claim 11, wherein the method further comprises coupling a memory management unit to the memory to maintain the memory according to a coherency protocol, wherein a fourth request address file circuit of the plurality of request address file circuits is not issuing to the memory a fourth access request marked with a program order dependency on a third access request until receiving, from the second communications network, a third memory dependency token generated by: completion of access of the third access request to the memory from a third request address file circuit of the plurality of request address file circuits, and the access of the third access request being made globally visible by the memory management unit according to the coherency protocol.

13. An apparatus comprising: a spatial array of processing elements coupled by a first communications network; a plurality of request address file circuits coupled to the spatial array o
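The single-circuit case in claims 1, 2, 7, and 8 can be sketched the same way. The following is a hypothetical model, assuming one issue slot per cycle and a token written for every issued request (a simplification; the claims only require a token for the marked dependency). The `SingleRaf` class and the request names are illustrative and do not come from the patent.

```python
from typing import List, Optional, Set, Tuple


class SingleRaf:
    """Toy model of a single RAF circuit ordering two of its own requests."""

    def __init__(self) -> None:
        self.queue: List[Tuple[str, Optional[str]]] = []  # (request, dependency)
        self.token_store: Set[str] = set()                # locally stored tokens
        self.log: List[Tuple[int, str]] = []              # (cycle, issued request)

    def enqueue(self, name: str, depends_on: Optional[str] = None) -> None:
        self.queue.append((name, depends_on))

    def step(self, cycle: int) -> Optional[str]:
        """Issue at most one ready request in this cycle."""
        for i, (name, dep) in enumerate(self.queue):
            if dep is None or dep in self.token_store:
                self.queue.pop(i)
                if dep is not None:
                    self.token_store.discard(dep)  # token is read and consumed
                self.log.append((cycle, name))
                # Token is generated and stored in the same cycle as issuance,
                # without waiting for the memory access to complete.
                self.token_store.add(name)
                return name
        return None  # nothing ready: dependent requests stay stalled


raf = SingleRaf()
raf.enqueue("store X")
raf.enqueue("load X", depends_on="store X")
for cycle in range(3):
    raf.step(cycle)
print(raf.log)
```

The design point this sketch tries to capture is that the dependent request is released on issuance of the earlier request rather than on its completion, so in this model it follows one cycle later.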

Assignees

Intel Corp

Inventors

Classifications

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • Prefetch instructions; cache control instructions · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • G06F15/173 · Primary

    using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake · CPC title

  • Distributed shared memory [DSM], e.g. remote direct memory access [RDMA] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10417175B2 cover?
Methods and apparatuses relating to consistency in an accelerator are described. In one embodiment, request address file (RAF) circuits are coupled to a spatial array by a first network, a memory is coupled to the RAF circuits by a second network, a RAF circuit is to not issue, into the second network, a request to the memory marked with a program order dependency on a previous request until re…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F15/173. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 17, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).