Apparatuses, methods, and systems for a configurable accelerator having dataflow execution circuits

US12086080B2 · US · B2

Patent metadata
Publication number: US-12086080-B2
Application number: US-202017033728-A
Country: US
Kind code: B2
Filing date: Sep 26, 2020
Priority date: Sep 26, 2020
Publication date: Sep 10, 2024
Grant date: Sep 10, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and apparatuses relating to a configurable accelerator having dataflow execution circuits are described. In one embodiment, a hardware accelerator includes a plurality of dataflow execution circuits that each comprise a register file, a plurality of execution circuits, and a graph station circuit comprising a plurality of dataflow operation entries that each include a respective ready field that indicates when an input operand for a dataflow operation is available in the register file, and the graph station circuit is to select for execution a first dataflow operation entry when its input operands are available, and clear ready fields of the input operands in the first dataflow operation entry when a result of the execution is stored in the register file; a cross dependence network coupled between the plurality of dataflow execution circuits to send data between the plurality of dataflow execution circuits according to a second dataflow operation entry; and a memory execution interface coupled between the plurality of dataflow execution circuits and a cache bank to send data between the plurality of dataflow execution circuits and the cache bank according to a third dataflow operation entry.
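
The ready-field scheduling described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the names `DataflowEntry`, `GraphStation`, `write_reg`, and `step` are hypothetical, one entry is selected per step in list order, and the point at which consumers are woken is a modeling choice.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DataflowEntry:
    """One dataflow operation entry; all field names here are hypothetical."""
    op: Callable[..., int]   # operation to run on an execution circuit
    src_regs: List[int]      # register-file indices of the input operands
    dst_reg: int             # register-file index for the result
    ready: List[bool] = field(default_factory=list)  # one ready field per input

    def __post_init__(self) -> None:
        if not self.ready:
            self.ready = [False] * len(self.src_regs)


class GraphStation:
    """Software stand-in for a graph station plus its register file."""

    def __init__(self, num_regs: int) -> None:
        self.regs: List[int] = [0] * num_regs
        self.entries: List[DataflowEntry] = []

    def _mark_ready(self, reg: int) -> None:
        # Set the ready field of every entry input that reads this register.
        for entry in self.entries:
            for i, src in enumerate(entry.src_regs):
                if src == reg:
                    entry.ready[i] = True

    def write_reg(self, reg: int, value: int) -> None:
        # An operand arrives in the register file from outside the graph.
        self.regs[reg] = value
        self._mark_ready(reg)

    def step(self) -> Optional[DataflowEntry]:
        # Select the first entry whose input operands are all available.
        for entry in self.entries:
            if entry.ready and all(entry.ready):
                result = entry.op(*(self.regs[r] for r in entry.src_regs))
                self.regs[entry.dst_reg] = result          # result is stored
                entry.ready = [False] * len(entry.ready)   # clear its ready fields
                self._mark_ready(entry.dst_reg)            # wake dependent entries
                return entry
        return None


station = GraphStation(num_regs=5)
add = DataflowEntry(op=lambda a, b: a + b, src_regs=[0, 1], dst_reg=2)
mul = DataflowEntry(op=lambda a, b: a * b, src_regs=[2, 3], dst_reg=4)
station.entries += [mul, add]  # listed out of program order on purpose
for reg, value in [(0, 2), (1, 3), (3, 4)]:
    station.write_reg(reg, value)
while station.step() is not None:
    pass
print(station.regs[4])  # (2 + 3) * 4, prints 20
```

In this model the multiply entry is listed first but cannot fire until the add entry has stored its result, which shows that execution order follows operand availability rather than entry order.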

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a memory; a hardware processor core to execute one or more instructions to offload dataflow operations, the hardware processor core coupled to the memory; and a dataflow driven accelerator, to perform the dataflow operations, coupled to the hardware processor core, wherein the dataflow driven accelerator comprises: at least one dataflow execution circuit that each comprises: a register file, a plurality of execution circuits, and a graph station circuit comprising a plurality of dataflow operation entries that each include a respective ready field that indicates when an input operand for a dataflow operation is available in the register file, and the graph station circuit is to select for execution a first dataflow operation entry when its input operands are available, and clear ready fields of the input operands in the first dataflow operation entry when a result of the execution is stored in the register file, and a memory execution interface coupled between the at least one dataflow execution circuit and the memory to send data between the at least one dataflow execution circuit and the memory according to a second dataflow operation entry.

2. The apparatus of claim 1, wherein the at least one dataflow execution circuit comprises a plurality of dataflow execution circuits, and the graph station circuit for a producer dataflow execution circuit of the plurality of dataflow execution circuits is to execute a plurality of iterations for the first dataflow operation entry ahead of consumption by a consumer dataflow execution circuit of the plurality of dataflow execution circuits and store resultants for the plurality of iterations in the register file of the producer dataflow execution circuit.

3. The apparatus of claim 2, wherein the graph station circuit of the producer dataflow execution circuit is to maintain a linked-list control structure for the register file that chains a secondly produced resultant for the first dataflow operation entry to a previously produced resultant for the first dataflow operation entry in the register file.

4. The apparatus of claim 3, wherein the graph station circuit of the consumer dataflow execution circuit is to update its read pointer into the linked-list control structure of the producer dataflow execution circuit from pointing to the previously produced resultant in the register file of the producer dataflow execution circuit to pointing to the secondly produced resultant in the register file of the producer dataflow execution circuit in response to a read of the previously produced resultant in the register file of the producer dataflow execution circuit by the consumer dataflow execution circuit, and a graph station circuit of a second consumer dataflow execution circuit of the plurality of dataflow execution circuits is to update its read pointer into the linked-list control structure of the producer dataflow execution circuit from pointing to the previously produced resultant in the register file of the producer dataflow execution circuit to pointing to the secondly produced resultant in the register file of the producer dataflow execution circuit in response to a read of the previously produced resultant in the register file of the producer dataflow execution circuit by the second consumer dataflow execution circuit.

5. The apparatus of claim 1, wherein the at least one dataflow execution circuit comprises a plurality of dataflow execution circuits, and further comprising a cross dependence network coupled between the plurality of dataflow execution circuits to send data between the plurality of dataflow execution circuits according to a third dataflow operation entry.

6. The apparatus of claim 1, wherein the plurality of execution circuits of the at least one dataflow execution circuit comprises at least one finite state machine execution circuit that generates multiple results for each execution, and a graph station circuit of the at least one dataflow execution circuit is to select for execution the first dataflow operation entry on the at least one finite state machine execution circuit when its input operands are available.

7. The apparatus of claim 1, wherein the first dataflow operation entry comprises a predicate field to identify a predicate that controls execution.

8. The apparatus of claim 1, wherein the at least one dataflow execution circuit comprises a plurality of dataflow execution circuits, and execution for the first dataflow operation entry by a dataflow execution circuit of the plurality of dataflow execution circuits causes the result of the execution to be stored in a register file of the dataflow execution circuit and a register file of another dataflow execution circuit of the plurality of dataflow execution circuits by a cross dependence network coupled between the plurality of dataflow execution circuits.

9. A method comprising: loading dataflow operation entries for a dataflow graph into a dataflow driven accelerator, wherein the dataflow driven accelerator comprises: at least one dataflow execution circuit that each comprises: a register file, a plurality of execution circuits, and a graph station circuit comprising a plurality of dataflow operation entries that each include a respective ready field that indicates when an input operand for a dataflow operation is available in the register file; executing a first dataflow operation entry for the at least one dataflow execution circuit when its input operands are available to produce a result; clearing ready fields of the input operands in the first dataflow operation entry when the result is stored in a register file of the dataflow execution circuit; and sending data between the at least one dataflow execution circuit and a memory of the dataflow driven accelerator on a memory execution interface coupled between the at least one dataflow execution circuit and the memory according to a second dataflow operation entry.

10. The method of claim 9, wherein the at least one dataflow execution circuit comprises a plurality of dataflow execution circuits, and further comprising: executing a plurality of iterations for the first dataflow operation entry by a producer dataflow execution circuit of the plurality of dataflow execution circuits is ahead of consumption by a consumer dataflow execution circuit of the plurality of dataflow execution circuits is; and storing resultants for the plurality of iterations in the register file of the producer dataflow execution circuit.

11. The method of claim 10, further comprising maintaining a linked-list control structure by the producer dataflow execution circuit for the register file that chains a secondly produced resultant for the first dataflow operation entry to a previously produced resultant for the first dataflow operation entry in the register file.

12. The method of claim 11, further comprising: updating a read pointer of the consumer dataflow execution circuit into the linked-list control structure of the producer dataflow execution circuit from pointing to the previously produced resultant in the register file of the producer dataflow execution circuit to pointing to the secondly produced resultant in the register file of the producer dataflow execution circuit in response to a read of the previously produced resultant in the register file of the producer dataflow execution circuit by the consumer dataflow execution circuit; and updating a read pointer of a second consumer dataflow execution circuit of the plurality of dataflow execution circuits into the linked-list control structure of the producer dataflow execution circuit from pointing
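
The producer run-ahead and per-consumer read pointers recited in claims 2 to 4 and 10 to 12 can be sketched in software as follows. This is a hedged illustration only: the class `ProducerRegisterFile` and its methods are hypothetical names, and the slot-reclamation rule (a slot is freed once every consumer has read it) is an assumption of this sketch that the claim text above does not state.

```python
from typing import Dict, List, Optional


class ProducerRegisterFile:
    """Register file whose produced resultants are chained in a linked list."""

    def __init__(self, num_slots: int, consumers: List[str]) -> None:
        self.values: List[Optional[int]] = [None] * num_slots
        self.next_slot: List[Optional[int]] = [None] * num_slots  # chain links
        self.remaining: List[int] = [0] * num_slots  # consumers yet to read a slot
        self.free: List[int] = list(range(num_slots))
        self.tail: Optional[int] = None              # most recently produced slot
        # One read pointer per consumer; None means "nothing to read yet".
        self.read_ptr: Dict[str, Optional[int]] = {c: None for c in consumers}

    def produce(self, value: int) -> Optional[int]:
        """Store one iteration's resultant; return None if the producer must stall."""
        if not self.free:
            return None
        slot = self.free.pop(0)
        self.values[slot] = value
        self.next_slot[slot] = None
        self.remaining[slot] = len(self.read_ptr)
        if self.tail is not None:
            self.next_slot[self.tail] = slot  # chain new resultant to the previous one
        self.tail = slot
        for consumer, ptr in self.read_ptr.items():
            if ptr is None:                   # consumer had caught up
                self.read_ptr[consumer] = slot
        return slot

    def read(self, consumer: str) -> Optional[int]:
        """Read the next resultant for this consumer and advance its read pointer."""
        slot = self.read_ptr[consumer]
        if slot is None:
            return None
        value = self.values[slot]
        self.read_ptr[consumer] = self.next_slot[slot]  # move to the next resultant
        self.remaining[slot] -= 1
        if self.remaining[slot] == 0:         # assumption: reclaim after all reads
            self.free.append(slot)
            if self.tail == slot:
                self.tail = None
        return value


rf = ProducerRegisterFile(num_slots=4, consumers=["c1", "c2"])
for v in (10, 20, 30):        # producer runs three iterations ahead
    rf.produce(v)
print(rf.read("c1"), rf.read("c1"), rf.read("c2"))  # prints 10 20 10
```

Each consumer advances independently, so a fast consumer can move on to later resultants while a slower one still reads earlier entries from the same chain.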

Assignees

Intel Corp

Inventors

Classifications

  • using bus bridges (G06F13/4022 takes precedence)

  • Energy efficient computing, e.g. low power processors, power management or thermal management

  • Details of memory controller

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12086080B2 cover?
Systems, methods, and apparatuses relating to a configurable accelerator having dataflow execution circuits are described. In one embodiment, a hardware accelerator includes a plurality of dataflow execution circuits that each comprise a register file, a plurality of execution circuits, and a graph station circuit comprising a plurality of dataflow operation entries that each include a respecti…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F13/1668. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 10, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).