Systems and methods for improved mapping of computational loops on reconfigurable architectures

US11928468B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11928468-B2
Application numberUS-202117533663-A
CountryUS
Kind codeB2
Filing dateNov 23, 2021
Priority dateNov 24, 2020
Publication dateMar 12, 2024
Grant dateMar 12, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various embodiments of a system and associated method for generating a valid mapping for a computational loop on a CGRA are disclosed herein. In particular, the method includes generating randomized schedules within particular constraints to explore greater mapping spaces than previous approaches. Further, the system and related method employs a feasibility test to test validity of each schedule such that mappings are only generated from valid schedules.

First claim

Opening claim text (preview).

What is claimed is: 1. A system, comprising: a processor in communication with a memory and a coarse-grained reconfigurable architecture (CGRA) unit, the memory including instructions, which when executed, cause the processor to: generate a data flow graph expressive of a computational loop configured for execution on the CGRA unit, wherein the data flow graph includes a plurality of nodes; determine an upper bound modulo timeslot for each node in the data flow graph, the upper bound modulo timeslot being representative of an upper scheduling bound of a CGRA schedule for each node of the plurality of nodes; and iteratively generate a random schedule for population within the CGRA schedule that schedules each node of the data flow graph to a random timeslot between a lower bound modulo timeslot and the upper bound modulo timeslot of the CGRA schedule with respect to an initiation interval value. 2. The system of claim 1 , wherein the memory further includes instructions, which, when executed, cause the processor to: map the CGRA schedule onto the CGRA unit, the CGRA schedule including the random schedule. 3. The system of claim 1 , wherein the step of generating a random schedule that schedules each node of the data flow graph to a random modulo timeslot further comprises: schedule a node indicative of a current schedule operation of the plurality of nodes of the data flow graph at a randomly selected modulo timeslot between the lower bound modulo timeslot and the upper bound modulo timeslot; consecutively displace one or more nodes that have resource conflicts with the node; update the lower bound modulo timeslot and the upper bound modulo timeslot for the node based on the current schedule operation; displace one or more previously-scheduled nodes having dependence conflicts with the node; and add one or more displaced nodes to a queue of unscheduled nodes. 4. The system of claim 1 , wherein the memory includes instructions, which when executed, further cause the processor to: evaluate a feasibility of the random schedule with respect to resource usage of the random schedule as populated within the CGRA schedule; and generate a new random schedule with the same initiation interval if the random schedule is evaluated to be infeasible. 5. The system of claim 4 , wherein the step of evaluating feasibility of the random schedule includes: estimate resource usage considering path sharing for each of a plurality of routing resources of the CGRA unit; confirm that resource overuse does not occur in each modulo timeslot of the CGRA schedule according to a Modulo Resource Table; and confirm that a total number of unique nodes including the routing nodes scheduled within a modulo timeslot of the plurality of modulo timeslots of the CGRA schedule is less than or equal to a number of processing elements of the CGRA unit scheduled within the modulo timeslot. 6. The system of claim 4 , wherein the memory includes instructions, which when executed, further cause the processor to: increase the initiation interval after λ random schedules have been generated for a current value of the initiation interval. 7. The system of claim 1 , wherein the upper bound modulo timeslot for each node of the plurality of nodes of the data flow graph is determined by identifying a Resource-Constrained As Late As Possible timeslot through a bottom-up depth-first search approach starting from nodes of the plurality of nodes that do not have any outgoing edges. 8. The system of claim 1 , wherein the memory includes instructions, which when executed, further cause the processor to: determine the lower bound modulo timeslot for each node of the plurality of nodes of the data flow graph by identifying a Resource-Constrained As Soon As Possible timeslot through a top-down, depth-first search approach starting from nodes of the plurality of nodes that do not have any incoming edges. 9. A method, comprising: generating, by a processor, a data flow graph expressive of a computational loop configured for execution on the CGRA unit, wherein the data flow graph includes a plurality of nodes; determining an upper bound modulo timeslot for each node in the data flow graph, the upper bound modulo timeslot being representative of an upper scheduling bound of a CGRA schedule for each node of the plurality of nodes; and iteratively generating a random schedule for population within the CGRA schedule that schedules each node of the data flow graph to a random timeslot between a lower bound modulo timeslot and the upper bound modulo timeslot of the CGRA schedule with respect to an initiation interval value. 10. The method of claim 9 , further comprising: mapping the CGRA schedule onto the CGRA unit, the CGRA schedule including the random schedule. 11. The method of claim 9 , wherein the step of generating a random schedule that schedules each node of the data flow graph to a random modulo timeslot further comprises: scheduling a node of the plurality of nodes of the data flow graph at a randomly selected modulo timeslot between the lower bound modulo timeslot and the upper bound modulo timeslot; consecutively displacing one or more nodes that have resource conflicts with a current schedule operation; updating the lower bound modulo timeslot and the upper bound modulo timeslot for the node based on the current schedule operation; displacing one or more previously-scheduled nodes having dependence conflicts with the current schedule operation; and adding one or more displaced nodes to a queue of unscheduled nodes. 12. The method of claim 9 , further comprising: evaluating a feasibility of the random schedule with respect to resource usage of the random schedule as populated within the CGRA schedule; and generating a new random schedule with the same initiation interval if the random schedule is evaluated to be infeasible. 13. The method of claim 12 , wherein the step of evaluating feasibility of the random schedule comprises: confirming that a total number of unique nodes including one or more routing nodes scheduled within a modulo timeslot of a plurality of modulo timeslots of the CGRA schedule is less than or equal to a number of processing elements of the CGRA unit scheduled within the modulo timeslot. 14. The method of claim 12 , further comprising: increasing the initiation interval after A random schedules have been generated for a current value of the initiation interval. 15. The method of claim 9 , wherein the upper bound modulo timeslot for each node of the plurality of nodes of the data flow graph is determined by identifying a Resource-Constrained As Late As Possible timeslot through a bottom-up depth-first search approach starting from nodes of the plurality of nodes that do not have any outgoing edges. 16. The method of claim 9 , further comprising: determining the lower bound modulo timeslot for each node of the plurality of nodes of the data flow graph by identifying a Resource-Constrained As Soon As Possible timeslot through a top-down, depth-first search approach starting from nodes of the plurality of nodes that do not have any incoming edges. 17. A method, comprising: generating a random schedule for mapping onto a coarse-grained reconfigurable architecture (CGRA) unit that schedules a node of a plurality of nodes of a data flow graph expressive of an operation of a computational loop to a random timeslot between a lower bound modulo timeslot and an upper bound modulo timeslot with respect to an initiation interval value; evaluating a feasibility of the random sche

Assignees

Inventors

Classifications

  • Loop control instructions; iterative instructions, e.g. LOOP, REPEAT · CPC title

  • Arithmetic instructions · CPC title

  • Configuring for program initiating, e.g. using registry, configuration files · CPC title

  • G06F9/4881Primary

    Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11928468B2 cover?
Various embodiments of a system and associated method for generating a valid mapping for a computational loop on a CGRA are disclosed herein. In particular, the method includes generating randomized schedules within particular constraints to explore greater mapping spaces than previous approaches. Further, the system and related method employs a feasibility test to test validity of each schedul…
Who is the assignee on this patent?
Balasubramanian Mahesh, Shrivastava Aviral, Univ Arizona State
What technology area does this patent fall under?
Primary CPC classification G06F9/30065. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Mar 12 2024 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).