US12086080B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12086080-B2 |
| Application number | US-202017033728-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 26, 2020 |
| Priority date | Sep 26, 2020 |
| Publication date | Sep 10, 2024 |
| Grant date | Sep 10, 2024 |
Official abstract text for this publication.
Systems, methods, and apparatuses relating to a configurable accelerator having dataflow execution circuits are described. In one embodiment, a hardware accelerator includes a plurality of dataflow execution circuits that each comprise a register file, a plurality of execution circuits, and a graph station circuit comprising a plurality of dataflow operation entries that each include a respective ready field that indicates when an input operand for a dataflow operation is available in the register file, and the graph station circuit is to select for execution a first dataflow operation entry when its input operands are available, and clear ready fields of the input operands in the first dataflow operation entry when a result of the execution is stored in the register file; a cross dependence network coupled between the plurality of dataflow execution circuits to send data between the plurality of dataflow execution circuits according to a second dataflow operation entry; and a memory execution interface coupled between the plurality of dataflow execution circuits and a cache bank to send data between the plurality of dataflow execution circuits and the cache bank according to a third dataflow operation entry.
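The ready-field behavior in the abstract can be illustrated with a small software model. This is only a minimal sketch under stated assumptions, not the patented hardware: the names `Entry`, `GraphStation`, `write`, and `step` are hypothetical, one entry fires per step, and every entry is assumed to have at least one input operand.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Entry:
    """Hypothetical dataflow operation entry with one ready bit per input operand."""
    name: str
    op: Callable[..., int]
    srcs: List[str]   # register names of the input operands (assumed non-empty)
    dst: str          # register name that receives the result
    ready: Dict[str, bool] = field(init=False)

    def __post_init__(self) -> None:
        self.ready = {s: False for s in self.srcs}


class GraphStation:
    """Toy model: selects an entry once all of its input operands are ready."""

    def __init__(self, entries: List[Entry]) -> None:
        self.entries = entries
        self.regs: Dict[str, int] = {}   # stands in for the register file

    def write(self, reg: str, value: int) -> None:
        # Store a value and set the matching ready field in every consuming entry.
        self.regs[reg] = value
        for e in self.entries:
            if reg in e.ready:
                e.ready[reg] = True

    def step(self) -> Optional[str]:
        # Fire the first entry whose input operands are all available.
        for e in self.entries:
            if all(e.ready.values()):
                result = e.op(*(self.regs[s] for s in e.srcs))
                # Clear this entry's ready fields as its result is stored.
                for s in e.srcs:
                    e.ready[s] = False
                self.write(e.dst, result)
                return e.name
        return None


# Two-node graph: t = a + b, then out = t * c.
gs = GraphStation([
    Entry("add", lambda x, y: x + y, ["a", "b"], "t"),
    Entry("mul", lambda x, y: x * y, ["t", "c"], "out"),
])
for reg, val in (("a", 2), ("b", 3), ("c", 4)):
    gs.write(reg, val)

fired = []
while (name := gs.step()) is not None:
    fired.append(name)
```

In this model `mul` cannot fire until `add` has written `t`, and neither entry fires a second time because its ready fields were cleared when its result was stored.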
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: a memory; a hardware processor core to execute one or more instructions to offload dataflow operations, the hardware processor core coupled to the memory; and a dataflow driven accelerator, to perform the dataflow operations, coupled to the hardware processor core, wherein the dataflow driven accelerator comprises: at least one dataflow execution circuit that each comprises: a register file, a plurality of execution circuits, and a graph station circuit comprising a plurality of dataflow operation entries that each include a respective ready field that indicates when an input operand for a dataflow operation is available in the register file, and the graph station circuit is to select for execution a first dataflow operation entry when its input operands are available, and clear ready fields of the input operands in the first dataflow operation entry when a result of the execution is stored in the register file, and a memory execution interface coupled between the at least one dataflow execution circuit and the memory to send data between the at least one dataflow execution circuit and the memory according to a second dataflow operation entry.
2. The apparatus of claim 1, wherein the at least one dataflow execution circuit comprises a plurality of dataflow execution circuits, and the graph station circuit for a producer dataflow execution circuit of the plurality of dataflow execution circuits is to execute a plurality of iterations for the first dataflow operation entry ahead of consumption by a consumer dataflow execution circuit of the plurality of dataflow execution circuits and store resultants for the plurality of iterations in the register file of the producer dataflow execution circuit.
3. The apparatus of claim 2, wherein the graph station circuit of the producer dataflow execution circuit is to maintain a linked-list control structure for the register file that chains a secondly produced resultant for the first dataflow operation entry to a previously produced resultant for the first dataflow operation entry in the register file.
4. The apparatus of claim 3, wherein the graph station circuit of the consumer dataflow execution circuit is to update its read pointer into the linked-list control structure of the producer dataflow execution circuit from pointing to the previously produced resultant in the register file of the producer dataflow execution circuit to pointing to the secondly produced resultant in the register file of the producer dataflow execution circuit in response to a read of the previously produced resultant in the register file of the producer dataflow execution circuit by the consumer dataflow execution circuit, and a graph station circuit of a second consumer dataflow execution circuit of the plurality of dataflow execution circuits is to update its read pointer into the linked-list control structure of the producer dataflow execution circuit from pointing to the previously produced resultant in the register file of the producer dataflow execution circuit to pointing to the secondly produced resultant in the register file of the producer dataflow execution circuit in response to a read of the previously produced resultant in the register file of the producer dataflow execution circuit by the second consumer dataflow execution circuit.
5. The apparatus of claim 1, wherein the at least one dataflow execution circuit comprises a plurality of dataflow execution circuits, and further comprising a cross dependence network coupled between the plurality of dataflow execution circuits to send data between the plurality of dataflow execution circuits according to a third dataflow operation entry.
6. The apparatus of claim 1, wherein the plurality of execution circuits of the at least one dataflow execution circuit comprises at least one finite state machine execution circuit that generates multiple results for each execution, and a graph station circuit of the at least one dataflow execution circuit is to select for execution the first dataflow operation entry on the at least one finite state machine execution circuit when its input operands are available.
7. The apparatus of claim 1, wherein the first dataflow operation entry comprises a predicate field to identify a predicate that controls execution.
8. The apparatus of claim 1, wherein the at least one dataflow execution circuit comprises a plurality of dataflow execution circuits, and execution for the first dataflow operation entry by a dataflow execution circuit of the plurality of dataflow execution circuits causes the result of the execution to be stored in a register file of the dataflow execution circuit and a register file of another dataflow execution circuit of the plurality of dataflow execution circuits by a cross dependence network coupled between the plurality of dataflow execution circuits.
9. A method comprising: loading dataflow operation entries for a dataflow graph into a dataflow driven accelerator, wherein the dataflow driven accelerator comprises: at least one dataflow execution circuit that each comprises: a register file, a plurality of execution circuits, and a graph station circuit comprising a plurality of dataflow operation entries that each include a respective ready field that indicates when an input operand for a dataflow operation is available in the register file; executing a first dataflow operation entry for the at least one dataflow execution circuit when its input operands are available to produce a result; clearing ready fields of the input operands in the first dataflow operation entry when the result is stored in a register file of the dataflow execution circuit; and sending data between the at least one dataflow execution circuit and a memory of the dataflow driven accelerator on a memory execution interface coupled between the at least one dataflow execution circuit and the memory according to a second dataflow operation entry.
10. The method of claim 9, wherein the at least one dataflow execution circuit comprises a plurality of dataflow execution circuits, and further comprising: executing a plurality of iterations for the first dataflow operation entry by a producer dataflow execution circuit of the plurality of dataflow execution circuits is ahead of consumption by a consumer dataflow execution circuit of the plurality of dataflow execution circuits is; and storing resultants for the plurality of iterations in the register file of the producer dataflow execution circuit.
11. The method of claim 10, further comprising maintaining a linked-list control structure by the producer dataflow execution circuit for the register file that chains a secondly produced resultant for the first dataflow operation entry to a previously produced resultant for the first dataflow operation entry in the register file.
12. The method of claim 11, further comprising: updating a read pointer of the consumer dataflow execution circuit into the linked-list control structure of the producer dataflow execution circuit from pointing to the previously produced resultant in the register file of the producer dataflow execution circuit to pointing to the secondly produced resultant in the register file of the producer dataflow execution circuit in response to a read of the previously produced resultant in the register file of the producer dataflow execution circuit by the consumer dataflow execution circuit; and updating a read pointer of a second consumer dataflow execution circuit of the plurality of dataflow execution circuits into the linked-list control structure of the producer dataflow execution circuit from pointing
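The producer run-ahead and per-consumer read pointers recited in claims 2 to 4 (and mirrored in claims 10 to 12) can be pictured with a short software model. This is a hedged sketch only: `Slot`, `ProducerRegisterFile`, `add_consumer`, `produce`, and `read` are hypothetical names, consumers are assumed to be registered before production starts, and slot reclamation is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Slot:
    """One resultant in the producer's register file plus its link to the next one."""
    value: int
    next: Optional[int] = None   # index of the next-produced resultant, if any


class ProducerRegisterFile:
    """Toy model of a linked-list control structure with one read pointer per consumer."""

    def __init__(self) -> None:
        self.slots: List[Slot] = []
        self.tail: Optional[int] = None
        self.read_ptr: Dict[str, Optional[int]] = {}

    def add_consumer(self, name: str) -> None:
        # None means "waiting for the next resultant to be produced".
        self.read_ptr[name] = None

    def produce(self, value: int) -> int:
        # Store a new resultant and chain it to the previously produced one.
        idx = len(self.slots)
        self.slots.append(Slot(value))
        if self.tail is not None:
            self.slots[self.tail].next = idx
        self.tail = idx
        # Consumers that had caught up now point at the new resultant.
        for name, ptr in self.read_ptr.items():
            if ptr is None:
                self.read_ptr[name] = idx
        return idx

    def read(self, name: str) -> Optional[int]:
        # Return the resultant under this consumer's pointer, then advance
        # only this consumer's pointer along the chain.
        ptr = self.read_ptr[name]
        if ptr is None:
            return None
        slot = self.slots[ptr]
        self.read_ptr[name] = slot.next
        return slot.value


rf = ProducerRegisterFile()
rf.add_consumer("c0")
rf.add_consumer("c1")
for v in (10, 20, 30):           # producer runs three iterations ahead
    rf.produce(v)

c0_reads = [rf.read("c0"), rf.read("c0")]
c1_reads = [rf.read("c1")]
```

Each consumer advances independently here: after the reads above, `c0` points at the third resultant while `c1` still points at the second, which is the property the separate read pointers are meant to provide.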
using bus bridges (G06F13/4022 takes precedence) · CPC title
Energy efficient computing, e.g. low power processors, power management or thermal management · CPC title
Details of memory controller · CPC title