Reconfigurable neural network processing based on subgraph recognition

US11782706B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-11782706-B1
Application numberUS-202117361992-A
CountryUS
Kind codeB1
Filing dateJun 29, 2021
Priority dateJun 29, 2021
Publication dateOct 10, 2023
Grant dateOct 10, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one example, a method comprises: receiving input codes, wherein the input codes represent a computational dataflow graph; traversing the computational dataflow graph to identify single-entry-single-exit (SESE) subgraphs of the computational dataflow graph, wherein each SESE subgraph has a sequence of nodes comprising a root node and a child node and representing a sequence of element-wise operators, wherein the root node receives a single input tensor, and wherein the child node outputs a single output tensor; determining a merged operator for each SESE subgraph; and generating executable instructions for the computational dataflow graph to be executed by a hardware accelerator having a first execution unit and a second execution unit, wherein the executable instructions comprise first executable instructions for the merged operators targeted at the first execution unit, and second executable instructions for other operators of the computational dataflow graph targeted at the second execution unit.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: receiving input codes of a neural network, the neural network comprising neural network operators; compiling the input codes to generate an input data set representing a first dataflow graph of the neural network, wherein the first dataflow graph includes nodes connected by edges, each node comprising a neural network operator of the neural network operators, each edge between two nodes indicating a data dependency between two neural network operators represented by the two nodes; traversing the first dataflow graph to identify single-entry-single-exit (SESE) subgraphs of the first dataflow graph, wherein each SESE subgraph has a sequence of nodes comprising a root node and a child node and representing a sequence of element-wise neural network operators, wherein the root node receives a single input tensor, and wherein the child node outputs a single output tensor having a same number of data elements as the single input tensor; determining merged operators for the SESE subgraphs; generating a second dataflow graph based on replacing the SESE subgraphs in the first dataflow graph with the merged operators, wherein each merged operator has the single input tensor of the replaced SESE subgraph as an input and the single output tensor of the replaced SESE subgraph as an output; generating executable instructions of the second dataflow graph to be executed by a neural network hardware accelerator, wherein the executable instructions comprise first executable instructions for each of the merged operators targeted at a first execution unit, and second executable instructions for each of the neural network operators targeted at a second execution unit of the neural network accelerator; generating a schedule of execution of the executable instructions by the neural network hardware accelerator based on data dependencies among the neural network operators and the merged operators; and generating a program based on the executable instructions and the schedule of execution. 2. The method of claim 1 , wherein the merged operator is determined from a SESE-subgraph-to-merged-operator mapping table that maps merged operators to topologies of SESE subgraphs; and wherein each topology lists the element-wise neural network operators included in the SESE subgraph and edge connectivity among the nodes representing the element-wise neural network operators. 3. The method of claim 1 , wherein one of the first executable instructions includes: a first opcode that refers to a first mapping table of a first piecewise polynomial function of merged operator mapping tables of the neural network hardware accelerator, wherein the first piecewise polynomial function approximates a first relationship between an input and an output of a first merged operator; and a first operand as the input to the first merged operator. 4. The method of claim 3 , wherein one of the second executable instructions includes: a second opcode that refers to a set of arithmetic operations to be performed by a hardware computation engine of the neural network hardware accelerator; and a second operand as input to the second execution unit. 5. The method of claim 3 , further comprising: providing the program to the neural network hardware accelerator for execution; at a first time: storing a first mapping between the first opcode and a first SESE-subgraph-to-merged-operator mapping table of the neural network hardware accelerator; and executing the one of the first executable instructions based on the first mapping; and at a second time: storing a second mapping between the first opcode and a second SESE-subgraph-to-merged-operator mapping table of the neural network hardware accelerator; and executing the one of the first executable instructions based on the second mapping. 6. The method of claim 5 , wherein the first execution unit includes mappings representing piecewise polynomial functions; and wherein each piecewise polynomial function of a piecewise polynomial approximates a merged operator of the merged operators. 7. The method of claim 6 , wherein the first execution unit includes merged operator mapping tables, each merged operator mapping table comprising programmable registers that store candidate output values and a multiplexor to select one of the candidate output values based on an input value to implement a mapping representing a respective piecewise polynomial function; and wherein each of the first executable instructions includes a first opcode that identifies the first executable instruction, and a parameter that references one of the mapping tables. 8. The method of claim 7 , wherein the hardware accelerator further comprises a programmable opcode mapping memory that stores mapping between opcodes and the mapping tables. 9. The method of claim 8 , wherein the programmable opcode mapping memory maps the first opcode to a first mapping table representing a first merged operator, and a second opcode to a second mapping table representing a second merged operator; wherein the first merged operator represents a first number of operators; and wherein the second merged operator represents a second number of operators. 10. The method of claim 9 , further comprising: at a first time: storing a first mapping between the first opcode and the first mapping table in the opcode mapping memory; and causing the first execution unit to execute one of the first executable instructions including the first opcode based on the first mapping; and at a second time: storing a second mapping between the first opcode and the second mapping table in the opcode mapping memory; and causing the first execution unit to execute the one of the first executable instructions including the first opcode based on the second mapping. 11. The method of claim 8 , further comprising: generating an opcode for a SESE subgraph; generating candidate output values representing a merged operator; storing the opcode at the programmable opcode mapping memory; and storing the candidate output values at the first execution unit. 12. The method of claim 5 , wherein the second execution unit includes arithmetic circuits to perform arithmetic operations of the other neural network operators; and wherein each of the second executable instructions includes a second opcode that indicates a sequence of arithmetic operations to be performed by the arithmetic circuits. 13. A method comprising: receiving input codes, wherein the input codes represent a computational dataflow graph, wherein the computational dataflow graph includes nodes connected by edges, each node comprising an operator of neural network operators, each edge between two nodes indicating a data dependency between two neural network operators represented by the two nodes; traversing the computational dataflow graph to identify single-entry-single-exit (SESE) subgraphs of the computational dataflow graph, wherein each SESE subgraph has a sequence of nodes comprising a root node and a child node and representing a sequence of element-wise operators, wherein the root node receives a single input tensor, and wherein the child node outputs a single output tensor; determining a merged operator for each SESE subgraph; and generating executable instructions for the computational dataflow graph to be executed by a hardware accelerator having a first execution unit and a second execution unit, wherein the executable instructions comprise first executable instructions for the merged operators targeted at the first execution unit, and second executable instructions for other op

Assignees

Inventors

Classifications

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • G06F9/3001Primary

    Arithmetic instructions · CPC title

  • G06F7/548Primary

    Trigonometric functions; Co-ordinate transformations · CPC title

  • Dependency analysis; Data or control flow analysis · CPC title

  • Optimisation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11782706B1 cover?
In one example, a method comprises: receiving input codes, wherein the input codes represent a computational dataflow graph; traversing the computational dataflow graph to identify single-entry-single-exit (SESE) subgraphs of the computational dataflow graph, wherein each SESE subgraph has a sequence of nodes comprising a root node and a child node and representing a sequence of element-wise op…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/3001. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Oct 10 2023 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).