Apparatus, methods, and systems with a configurable spatial accelerator

US10445250B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10445250-B2
Application number  US-201715859454-A
Country             US
Kind code           B2
Filing date         Dec 30, 2017
Priority date       Dec 30, 2017
Publication date    Oct 15, 2019
Grant date          Oct 15, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a second operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements.
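The firing rule in the abstract (a dataflow operator performs its operation once its incoming operand set has arrived) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class `DataflowOperator`, the function `run`, and the example graph `(a + b) * c` are hypothetical names chosen for illustration.

```python
from collections import deque


class DataflowOperator:
    """Hypothetical model of one dataflow-graph node mapped onto a processing element."""

    def __init__(self, name, fn, num_inputs):
        self.name = name
        self.fn = fn
        self.inputs = [deque() for _ in range(num_inputs)]  # one queue per input port
        self.consumers = []  # (operator, port) pairs fed by this operator's result

    def connect(self, consumer, port):
        self.consumers.append((consumer, port))

    def push(self, port, value):
        self.inputs[port].append(value)

    def ready(self):
        # Ready only when every input port holds at least one operand.
        return all(self.inputs)

    def fire(self):
        # Consume one operand per port, compute, and forward the result downstream.
        operands = [queue.popleft() for queue in self.inputs]
        result = self.fn(*operands)
        for consumer, port in self.consumers:
            consumer.push(port, result)
        return result


def run(operators):
    """Fire ready operators until no complete operand set remains."""
    fired = []
    progress = True
    while progress:
        progress = False
        for op in operators:
            if op.ready():
                fired.append((op.name, op.fire()))
                progress = True
    return fired


# Example graph (assumed for illustration): (a + b) * c
add = DataflowOperator("add", lambda x, y: x + y, 2)
mul = DataflowOperator("mul", lambda x, y: x * y, 2)
add.connect(mul, 0)

mul.push(1, 4)  # c arrives first; mul cannot fire with only one operand
add.push(0, 2)  # a
add.push(1, 3)  # b
trace = run([mul, add])
print(trace)
```

Execution order here is set by operand arrival rather than by the order in which the operators are listed: `mul` is listed first but fires only after `add` has produced its result.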

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a spatial array of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the spatial array of processing elements with each node represented as a dataflow operator in the spatial array of processing elements, and the spatial array of processing elements is to perform an operation by a respective, incoming operand set arriving at each of the dataflow operators; a plurality of request address file circuits coupled to the spatial array of processing elements and a cache memory, each request address file circuit of the plurality of request address file circuits to access data in the cache memory in response to a request for data access from the spatial array of processing elements; a plurality of translation lookaside buffers comprising a translation lookaside buffer in each of the plurality of request address file circuits to provide an output of a physical address for an input of a virtual address; and a translation lookaside buffer manager circuit comprising a higher-level translation lookaside buffer than the plurality of translation lookaside buffers, the translation lookaside buffer manager circuit to perform a first page walk in the cache memory for a miss of an input of a virtual address into a first translation lookaside buffer and into the higher-level translation lookaside buffer to determine a physical address mapped to the virtual address, and store a mapping of the virtual address to the physical address from the first page walk in the higher-level translation lookaside buffer to cause the higher-level translation lookaside buffer to send the physical address to the first translation lookaside buffer in a first request address file circuit.

2. The apparatus of claim 1, wherein the translation lookaside buffer manager circuit is to: concurrently, with the first page walk, perform a second page walk in the cache memory, wherein the second page walk is for a miss of an input of a virtual address into a second translation lookaside buffer and into the higher-level translation lookaside buffer to determine a physical address mapped to the virtual address, and store a mapping of the virtual address to the physical address from the second page walk in the higher-level translation lookaside buffer to cause the higher-level translation lookaside buffer to send the physical address to the second translation lookaside buffer in a second request address file circuit.

3. The apparatus of claim 1, wherein receipt of the physical address in the first translation lookaside buffer is to cause the first request address file circuit to perform a data access for the request for data access from the spatial array of processing elements on the physical address in the cache memory.

4. The apparatus of claim 1, wherein the translation lookaside buffer manager circuit is to insert an indicator in the higher-level translation lookaside buffer for the miss of the input of the virtual address in the first translation lookaside buffer and the higher-level translation lookaside buffer to prevent an additional page walk for the input of the virtual address during the first page walk.

5. The apparatus of claim 1, wherein the translation lookaside buffer manager circuit is to receive a shootdown message from a requesting entity for a mapping of a physical address to a virtual address, invalidate the mapping in the higher-level translation lookaside buffer, and send shootdown messages to only those of the plurality of request address file circuits that include a copy of the mapping in a respective translation lookaside buffer, wherein each of those of the plurality of request address file circuits are to send an acknowledgement message to the translation lookaside buffer manager circuit, and the translation lookaside buffer manager circuit is to send a shootdown completion acknowledgment message to the requesting entity when all acknowledgement messages are received.

6. The apparatus of claim 1, wherein the translation lookaside buffer manager circuit is to receive a shootdown message from a requesting entity for a mapping of a physical address to a virtual address, invalidate the mapping in the higher-level translation lookaside buffer, and send shootdown messages to all of the plurality of request address file circuits, wherein each of the plurality of request address file circuits are to send an acknowledgement message to the translation lookaside buffer manager circuit, and the translation lookaside buffer manager circuit is to send a shootdown completion acknowledgment message to the requesting entity when all acknowledgement messages are received.

7. A method comprising: overlaying an input of a dataflow graph comprising a plurality of nodes into a spatial array of processing elements with each node represented as a dataflow operator in the spatial array of processing elements; coupling a plurality of request address file circuits to the spatial array of processing elements and a cache memory with each request address file circuit of the plurality of request address file circuits accessing data in the cache memory in response to a request for data access from the spatial array of processing elements; providing an output of a physical address for an input of a virtual address into a translation lookaside buffer of a plurality of translation lookaside buffers comprising a translation lookaside buffer in each of the plurality of request address file circuits; coupling a translation lookaside buffer manager circuit comprising a higher-level translation lookaside buffer than the plurality of translation lookaside buffers to the plurality of request address file circuits and the cache memory; and performing a first page walk in the cache memory for a miss of an input of a virtual address into a first translation lookaside buffer and into the higher-level translation lookaside buffer with the translation lookaside buffer manager circuit to determine a physical address mapped to the virtual address, and store a mapping of the virtual address to the physical address from the first page walk in the higher-level translation lookaside buffer to cause the higher-level translation lookaside buffer to send the physical address to the first translation lookaside buffer in a first request address file circuit.

8. The method of claim 7, further comprising: concurrently, with the first page walk, performing a second page walk in the cache memory with the translation lookaside buffer manager circuit, wherein the second page walk is for a miss of an input of a virtual address into a second translation lookaside buffer and into the higher-level translation lookaside buffer to determine a physical address mapped to the virtual address, and storing a mapping of the virtual address to the physical address from the second page walk in the higher-level translation lookaside buffer to cause the higher-level translation lookaside buffer to send the physical address to the second translation lookaside buffer in a second request address file circuit.

9. The method of claim 7, further comprising causing the first request address file circuit to perform a data access for the request for data access from the spatial array of processing elements on the physical address in the cache memory in response to receipt of the physical address in the first translation lookaside buffer.

10. The method of claim 7, further comprising inserting, with the translation lookaside buffer manager circuit, an indicator in the higher-level translation lookaside buffer for the miss of the input of the virtual address in the first translation lookaside
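The two-level translation scheme in the claims (a translation lookaside buffer in each request address file circuit, backed by a manager circuit with a higher-level buffer that performs page walks, marks walks in flight, and handles shootdowns) can be sketched as a small software model. Everything here is an illustrative assumption rather than the claimed circuitry: the names `Raf`, `TlbManager`, `PENDING`, `handle_miss`, `complete_walk`, and `shootdown` are hypothetical, a dictionary stands in for the page tables walked in cache memory, and acknowledgements are reduced to a count. The shootdown shown follows the variant of claim 5, which notifies only the circuits holding a copy.

```python
PENDING = object()  # hypothetical indicator for a page walk in flight (compare claim 4)


class Raf:
    """Hypothetical request address file circuit with its own first-level TLB."""

    def __init__(self, name, manager):
        self.name = name
        self.tlb = {}  # virtual page -> physical page
        self.manager = manager

    def translate(self, vpage):
        """Return the physical page on a hit, or None while a miss is being resolved."""
        if vpage in self.tlb:
            return self.tlb[vpage]
        self.manager.handle_miss(self, vpage)
        return self.tlb.get(vpage)


class TlbManager:
    """Hypothetical manager holding the higher-level TLB."""

    def __init__(self, page_table):
        self.page_table = page_table  # stands in for page tables in cache memory
        self.l2 = {}        # higher-level TLB
        self.waiters = {}   # virtual page -> RAFs waiting on an in-flight walk
        self.sharers = {}   # virtual page -> RAFs holding a copy of the mapping
        self.page_walks = 0

    def handle_miss(self, raf, vpage):
        entry = self.l2.get(vpage)
        if entry is PENDING:
            self.waiters[vpage].append(raf)  # walk already running: do not start another
        elif entry is not None:
            self._fill(raf, vpage, entry)    # hit in the higher-level TLB
        else:
            self.l2[vpage] = PENDING         # miss in both levels: start a page walk
            self.waiters[vpage] = [raf]
            self.page_walks += 1

    def complete_walk(self, vpage):
        # Store the mapping in the higher-level TLB, then send it to each waiting RAF.
        ppage = self.page_table[vpage]
        self.l2[vpage] = ppage
        for raf in self.waiters.pop(vpage):
            self._fill(raf, vpage, ppage)

    def _fill(self, raf, vpage, ppage):
        raf.tlb[vpage] = ppage
        self.sharers.setdefault(vpage, set()).add(raf)

    def shootdown(self, vpage):
        """Invalidate the mapping here and in sharing RAFs; return the number of acks."""
        self.l2.pop(vpage, None)
        acks = 0
        for raf in self.sharers.pop(vpage, set()):
            raf.tlb.pop(vpage, None)
            acks += 1
        return acks


mgr = TlbManager(page_table={0x10: 0x80})
raf0, raf1, raf2 = (Raf(n, mgr) for n in ("raf0", "raf1", "raf2"))
raf0.translate(0x10)         # misses both levels, so one page walk starts
raf1.translate(0x10)         # same page while the walk is pending: no second walk
mgr.complete_walk(0x10)      # mapping stored in L2 and sent to raf0 and raf1
hit = raf0.translate(0x10)   # now a first-level hit
acks = mgr.shootdown(0x10)   # only raf0 and raf1 hold a copy; raf2 is not contacted
print(mgr.page_walks, hex(hit), acks)
```

The model is sequential, so concurrent page walks (claims 2 and 8) and a shootdown that arrives during a walk are not represented; a fuller model would need explicit message queues between the manager and each request address file circuit.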

Assignees

Intel Corp

Inventors

Classifications

  • Details relating to cache mapping

  • Power efficiency

  • the data cache being concurrently physically addressed

  • Mapping of cache memory to specific storage devices or parts thereof

  • Non-volatile memory

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10445250B2 cover?
Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F12/1054. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 15, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).