Partitioning sparse matrices based on sparse matrix representations for crossbar-based architectures
US-2020159810-A1 · May 21, 2020 · US
US11720332B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11720332-B2 |
| Application number | US-201916527410-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 31, 2019 |
| Priority date | Apr 2, 2019 |
| Publication date | Aug 8, 2023 |
| Grant date | Aug 8, 2023 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A method for generating an executable program to run on one or more processor modules. The method comprises: receiving a graph comprising a plurality of data nodes, compute vertices and edges; and compiling the graph into an executable program including one or more types of multi-access instruction each of which performs at least two memory access (load and/or store) operations in a single instruction. The memory on each processor module comprises multiple memory banks whereby the same bank cannot be accessed by different load or store operations in the same instruction. The compilation comprises assigning instances of the multi-access instructions to implement at least some of the graph edges, and allocating the data to memory addresses within different ones of the banks. The allocating is performed subject to one or more constraints, including at least that different load/store operations should not access the same memory bank in the same instruction.
Opening claim text (preview).
The invention claimed is: 1. A computer-implemented method for generating an executable program to run on a system of one or more processor chips each comprising one or more processor modules, each processor module comprising an execution unit and memory; the method comprising: receiving a graph comprising a plurality of data nodes, a plurality of compute vertices and a plurality of directional edges, each data node representing a data element, each edge representing an input to a compute vertex from a data node or an output from a compute vertex input to a data node or another compute vertex, and each compute vertex representing one or more computations to perform on its input or inputs in order to produce the output or outputs from that compute vertex; compiling the graph into said executable program, the executable program comprising a plurality of machine code instructions, including one or more types of multi-access instruction each of which performs at least two load operations, at least two store operations, or at least one load and one store operation in a single instruction; wherein the memory on each of the processor modules comprises a respective plurality of memory banks; and the compilation comprises assigning instances of said multi-access instructions to implement at least some of said edges, analyzing the graph, and in dependence upon a result of analyzing the graph, allocating the data elements to memory addresses within different ones of the banks, wherein the allocating applies one or more constraints including at least a first constraint that no two edges can access the same memory bank at the same time; wherein allocating the data elements comprises: grouping the data elements into equivalence classes, wherein a first equivalence class includes a first set of the data elements that interfere with a second set of other ones of the data elements of a second equivalence class; generating ordered equivalence classes at least in part by ordering the equivalence classes according to a metric; and stepping through the data elements within the ordered equivalence classes to allocate each of the data elements in turn, wherein the metric comprises a first metric, the method further comprising: ordering the data elements within each equivalence class according to a second metric that is different from the first metric; and for each equivalence class, stepping through the ordered data elements within a respective equivalence class to allocate each of the data elements in turn. 2. The method of claim 1 , wherein the constraints further comprise an additional constraint on each of one or more of the edges outputting from a compute vertex, specifying that the data output by the edge should be stored with a specified alignment with respect to the memory addresses. 3. The method of claim 1 , wherein the constraints further comprise an additional constraint on each of one or more of the edges outputting from a compute vertex, specifying that the data output by the edge should be stored in a specified subset of the memory banks. 4. The method of claim 3 , wherein the specified subset comprises a region of interleaved memory. 5. The method of claim 1 , wherein at least one of the vertices comprises a loop, and the constraints further comprise an additional constraint on at least one of the edges outputting from the loop, specifying that an overspill region is left beyond an end of the memory addresses in which the data output by the edge is to be stored. 6. The method of claim 1 , wherein the multi-access instructions include at least a load-store instruction which performs a load operation and a store operation in the same instruction. 7. The method of claim 1 , wherein the multi-access instructions include at least a double-load instruction which performs two load operations in the same instruction. 8. The method of claim 1 , wherein the multi-access instructions include at least a load-load-store instruction which performs two load operations and a store operation in a single instruction. 9. The method of claim 1 , wherein the metric comprises one or more items selected from a list consisting of: a total data size of each of the equivalence classes, a number of data elements in each of the equivalence classes, a time for which each of the equivalence classes will be live, and a total number of lines of machine code for which each of the equivalence classes will be live. 10. The method of claim 1 , wherein analyzing the graph comprises determining which of the data elements will conflict with one another when the executable program is run; and wherein satisfying the first constraint includes allocating the data elements to memory such that any of the data elements that are determined by analyzing the graph to conflict with one another are allocated to different ones of the memory banks. 11. The method of claim 1 , wherein analyzing the graph comprises determining from the graph which of the data elements will be live at the same time when the executable program is run, wherein each data element is live between being written to memory and being used by the program; and wherein satisfying the first constraint includes allocating the data elements to memory such that any of the data elements that are determined by the analysis of the graph to be live at the same time when the executable program is run are allocated to different ones of the memory banks. 12. The method of claim 1 , wherein the first metric comprises variable size, and wherein the second metric comprises liveness time. 13. The method of claim 1 , wherein the first metric comprises liveness time, and wherein the second metric comprises variable size. 14. A non-transitory machine readable medium having stored thereon instructions for performing a method comprising machine executable code which when executed by at least one computer, causes the computer to: compile a graph into an executable program having machine code instructions, wherein the graph comprises a plurality of data nodes representing data elements, a plurality of compute vertices and a plurality of directional edges, further wherein the executable program has a multi-access instruction performing at least one item selected from the list consisting of: a plurality of load operations, a plurality of store operations, and a load and a store operation; wherein the machine executable code for compiling the graph causes the computer to: assign instances of the multi-access instructions to implement the edges, and analyze the graph, and in dependence upon a result of analyzing the graph, allocate the data elements to memory addresses within different memory banks, wherein allocating the data elements applies a first constraint that no two edges can access a same memory bank at a same time; wherein allocating the data elements comprises: grouping the data elements into equivalence classes, wherein a first equivalence class includes a first set of the data elements that interfere with a second set of other ones of the data elements of a second equivalence class; generating ordered equivalence classes at least in part by ordering the equivalence classes according to a metric; and stepping through the data elements within the ordered equivalence classes to allocate each of the data elements in turn, wherein the metric comprises a first metric, and wherein the machine executable code further causes the computer to: order the data elements within each equivalence class according to a second metric that is different from the first metric; and for each equivalence class, step through the ordered data elements wi
Compilation · CPC title
Data distribution · CPC title
Code distribution (considering CPU load at run-time G06F9/505; load rebalancing G06F9/5083) · CPC title
Dependency analysis; Data or control flow analysis · CPC title
Reducing the execution time required by the program code · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.