Runtime virtualization of reconfigurable data flow resources
US-11809908-B2 · Nov 7, 2023 · US
US12461889B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12461889-B2 |
| Application number | US-202318243994-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 8, 2023 |
| Priority date | Apr 10, 2023 |
| Publication date | Nov 4, 2025 |
| Grant date | Nov 4, 2025 |
Official abstract text for this publication.
A data processing system including an array of reconfigurable units and a compiler configured to generate a dataflow graph of a user application for execution is disclosed. The dataflow graph includes a sequence of temporal partitions, each temporal partition including a sequence of graph control operations. Also disclosed is an intelligent graph orchestration and execution engine (IGOEE) configured to receive an optimization objective from the compiler. The optimization objective can be for minimizing execution time of the reconfigurable processor or maximizing computing resource utilization of the reconfigurable processor. The IGOEE can reorganize the sequence of temporal partitions and the sequence of graph control operations within each temporal partition to satisfy the optimization objective, and execute the reorganized dataflow graph on the reconfigurable processor. A corresponding method is also disclosed.
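The abstract names two possible optimization objectives but not how one is applied. As a minimal sketch, assuming the engine can estimate an execution time and a utilization figure for each candidate reorganization, the choice could look like the following. The names `Objective`, `CandidateSchedule`, and `choose_schedule`, and all numbers, are hypothetical illustrations and do not come from the patent.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class Objective(Enum):
    """The two objectives named in the abstract."""
    MIN_EXECUTION_TIME = auto()
    MAX_RESOURCE_UTILIZATION = auto()


@dataclass(frozen=True)
class CandidateSchedule:
    """One possible reorganization of the dataflow graph (hypothetical model)."""
    label: str
    est_execution_time: float  # estimated end-to-end time, arbitrary units
    est_utilization: float     # estimated fraction of reconfigurable units in use


def choose_schedule(candidates: List[CandidateSchedule],
                    objective: Objective) -> CandidateSchedule:
    """Pick the candidate that best satisfies the given objective."""
    if not candidates:
        raise ValueError("at least one candidate schedule is required")
    if objective is Objective.MIN_EXECUTION_TIME:
        return min(candidates, key=lambda c: c.est_execution_time)
    return max(candidates, key=lambda c: c.est_utilization)


# Illustrative estimates only; assumed for this sketch.
candidates = [
    CandidateSchedule("as_compiled", est_execution_time=37.0, est_utilization=0.55),
    CandidateSchedule("ops_combined", est_execution_time=33.0, est_utilization=0.60),
    CandidateSchedule("pipelined", est_execution_time=30.0, est_utilization=0.58),
]

fastest = choose_schedule(candidates, Objective.MIN_EXECUTION_TIME)
busiest = choose_schedule(candidates, Objective.MAX_RESOURCE_UTILIZATION)
print(fastest.label, busiest.label)
```

The two objectives can select different candidates, which is why this sketch treats the objective as an explicit input rather than a fixed policy.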
Opening claim text (preview).
The invention claimed is:
1. A system comprising: a processor comprising an array of reconfigurable units, configured to execute a dataflow graph of a user application from a compiler, wherein the dataflow graph includes a sequence of temporal partitions, and wherein each temporal partition includes a sequence of graph control operations; an intelligent graph orchestration and execution engine (IGOEE) configured to receive at least one optimization objective from the compiler, wherein the at least one optimization objective specifies at least one of: minimizing an execution time of the reconfigurable processor and maximizing a computing resource utilization of the reconfigurable processor; reorganize the sequence of temporal partitions and the sequence of graph control operations within each temporal partition to satisfy the at least one optimization objective; generate by a finite state machine (FSM), a plurality of hardware states, wherein each hardware state is coupled to unroll a single graph control operation or a plurality of graph control operations to a runtime; and execute the reorganized dataflow graph on the reconfigurable processor.
2. The system of claim 1, wherein the sequence of graph control operations includes two or more of the following: loading a configuration file; loading an argument file; loading an address translation file; and executing the configuration file.
3. The system of claim 1, wherein the IGOEE is configured to reorganize the sequence of graph control operations by combining a subset of graph control operations in the sequence of graph control operations into a single operation.
4. The system of claim 1, wherein the IGOEE is configured to reorganize the sequence of temporal partitions by pipelining consecutive temporal partitions.
5. The system of claim 1, wherein the IGOEE is configured to execute the reorganized dataflow graph on the reconfigurable processor by allocating a subset of reconfigurable processing units within the reconfigurable processor to the reorganized dataflow graph; and loading the reorganized dataflow graph into the allocated subset of reconfigurable processing units.
6. The system of claim 1, wherein each graph control operation includes a software (SW) operation having a SW setup latency equal to a time required for iterating & updating through the array of reconfigurable units to start a HW operation.
7. The system of claim 6, wherein minimizing for execution time of the reconfigurable processor includes reorganizing the sequence of graph control operations to have a minimum possible SW setup latency.
8. The system of claim 1, wherein each graph control operation includes a HW operation having a HW execution latency equal to an execution time including a time required to push operation-related data to or pull operation-related data from a memory and a total time required by the processor to start and complete the HW operation.
9. The system of claim 8, wherein minimizing for execution time of the reconfigurable processor includes reorganizing the sequence of graph control operations to have a minimum possible HW execution latency.
10. A method of managing executing a dataflow graph of a user application on a reconfigurable processor comprising an array of reconfigurable units, the method comprising: receiving a dataflow graph of a user application from a compiler, wherein the dataflow graph includes a sequence of temporal partitions, and wherein each temporal partition includes a sequence of graph control operations; receiving at least one optimization objective from the compiler, wherein the at least one optimization objective specifies at least one of: minimizing an execution time of the reconfigurable processor and maximizing a computing resource utilization of the reconfigurable processor; reorganizing the sequence of temporal partitions and the sequence of graph control operations within each temporal partition to satisfy the at least one optimization objective; generating by a finite state machine (FSM), a plurality of hardware states and unrolling by each hardware state, a single graph control operation or a plurality of graph control operations to a runtime; and executing the reorganized dataflow graph on the reconfigurable processor.
11. The method of claim 10, wherein the sequence of graph control operations includes two or more of the following: loading a configuration file; loading an argument file; loading an address translation file; and executing the configuration file.
12. The method of claim 10, wherein reorganizing the sequence of graph control operations includes combining a subset of graph control operations in the sequence of graph control operations into a single operation.
13. The method of claim 10, wherein reorganizing the sequence of temporal partitions includes pipelining consecutive temporal partitions.
14. The method of claim 10, wherein executing the reorganized dataflow graph on the reconfigurable processor further comprises: allocating a subset of reconfigurable processing units within the reconfigurable processor to the reorganized dataflow graph; and loading the reorganized dataflow graph into the allocated subset of reconfigurable processing units.
15. The method of claim 10, wherein each graph control operation includes a software (SW) operation having a SW setup latency equal to a time required for iterating & updating through the array of reconfigurable units to start a HW operation.
16. The method of claim 15, wherein minimizing for execution time of the reconfigurable processor includes reorganizing the sequence of graph control operations to have a minimum possible SW setup latency.
17. The method of claim 10, wherein each graph control operation includes a HW operation having a HW execution latency equal to an execution time including a time required to push operation-related data to or pull operation-related data from a memory and a total time required by the processor to start and complete the HW operation.
18. The method of claim 17, wherein minimizing for execution time of the reconfigurable processor includes reorganizing the sequence of graph control operations to have a minimum possible HW execution latency.
19. A non-transitory computer readable medium having instructions encoded thereon for a data processing system comprising a coarse-grained reconfigurable (CGR) processor including an array of CGR unit reconfigurable units, the instructions configured to cause the processor to conduct a method comprising: receiving a dataflow graph of a user application from a compiler, wherein the dataflow graph includes a sequence of temporal partitions, and wherein each temporal partition includes a sequence of graph control operations; receiving at least one optimization objective from the compiler, wherein the at least one optimization objective specifies at least one of: minimizing an execution time of the reconfigurable processor and maximizing a computing resource utilization of the reconfigurable processor; reorganizing the sequence of temporal partitions and the sequence of graph control operations within each temporal partition to satisfy the at least one optimization objective; generating by a finite state machine (FSM), a plurality of hardware states and unrolling by each hardware state, a single graph control operation or a plurality of graph control operations to a runtime; and executing the reorganized dataflow graph on the reco
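The claims describe two reorganizations: combining a subset of graph control operations into a single operation (claims 3 and 12) and pipelining consecutive temporal partitions (claims 4 and 13), each judged against SW setup latency and HW execution latency (claims 6 to 9). The sketch below is one possible toy cost model, not the patented method. It assumes that a combined load pays only the largest SW setup of its members, and that the SW setup of one partition can overlap the HW execution of the previous one. All class names, function names, and latency values are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass(frozen=True)
class GraphControlOp:
    name: str      # e.g. "load_config", "load_args", "execute"
    sw_setup: int  # SW setup latency, arbitrary units
    hw_exec: int   # HW execution latency, arbitrary units


@dataclass
class TemporalPartition:
    ops: List[GraphControlOp]


def serial_latency(partitions: List[TemporalPartition]) -> int:
    """Every op pays its SW setup and HW execution, strictly one after another."""
    return sum(op.sw_setup + op.hw_exec for p in partitions for op in p.ops)


def combine_loads(partition: TemporalPartition) -> TemporalPartition:
    """Merge each run of consecutive load_* ops into a single op.

    Assumption: the merged op pays the largest SW setup once,
    while the HW execution latencies still add up.
    """
    merged: List[GraphControlOp] = []
    for is_load, group in groupby(partition.ops,
                                  key=lambda op: op.name.startswith("load_")):
        run = list(group)
        if is_load and len(run) > 1:
            merged.append(GraphControlOp(
                name="+".join(op.name for op in run),
                sw_setup=max(op.sw_setup for op in run),
                hw_exec=sum(op.hw_exec for op in run),
            ))
        else:
            merged.extend(run)
    return TemporalPartition(merged)


def pipelined_latency(partitions: List[TemporalPartition]) -> int:
    """Two-stage pipeline: SW setup of a partition may overlap the
    HW execution of the partition before it (assumed model)."""
    sw_done = 0  # time at which the host finishes setup for the current partition
    hw_done = 0  # time at which the hardware finishes the current partition
    for p in partitions:
        sw_done += sum(op.sw_setup for op in p.ops)
        hw_done = max(hw_done, sw_done) + sum(op.hw_exec for op in p.ops)
    return hw_done


# Illustrative graph with two temporal partitions; all latencies are made up.
p0 = TemporalPartition([
    GraphControlOp("load_config", 3, 5),
    GraphControlOp("load_args", 2, 1),
    GraphControlOp("load_addr_translation", 2, 1),
    GraphControlOp("execute", 1, 10),
])
p1 = TemporalPartition([
    GraphControlOp("load_args", 2, 1),
    GraphControlOp("execute", 1, 8),
])

original = [p0, p1]
combined = [combine_loads(p) for p in original]

print(serial_latency(original))     # 37
print(serial_latency(combined))     # 33
print(pipelined_latency(combined))  # 30
```

In this toy model, combining the three loads in the first partition removes two SW setups, and pipelining hides the second partition's setup behind the first partition's HW execution. A real engine would need measured latencies and would have to respect data dependencies between operations, which this sketch ignores.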
by tracing the execution of the program · CPC title
Dataflow computers · CPC title
Prevention of errors by analysis, debugging or testing of software · CPC title
comprising an array of processing units with common control, e.g. single instruction multiple data processors (G06F15/82 takes precedence {; for correlation function computation G06F17/15}) · CPC title
Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS · CPC title