US2021182036A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021182036-A1 |
| Application number | US-201916712449-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 12, 2019 |
| Priority date | Dec 12, 2019 |
| Publication date | Jun 17, 2021 |
| Grant date | — |
Official abstract text for this publication.
A method and associated apparatus for generating a neural network computation graph. The method includes receiving, by a compiler, a computation graph representing a neural network. The computation graph includes a plurality of nodes, each node associated with an operator of the neural network. The compiler receives a list of fusion patterns associated with a target hardware execution device, and analyzes the computation graph using the list of fusion patterns. The compiler generates one or more fused operators based on the analysis, each fused operator including at least two operators of the plurality of operators which can be fused. The compiler generates a new computation graph representing the neural network that includes at least a first fused operator of the generated one or more fused operators.
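The flow described in the abstract (receive a graph, receive a device-specific pattern list, analyze, emit a new graph containing fused operators) can be sketched as follows. This is a minimal illustration and not the patent's implementation: the `Node` class, the `fuse_graph` function, the operator names, and the restriction to linear chains whose intermediate results have exactly one consumer are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical graph node: one operator plus the names of its inputs."""
    name: str
    op: str
    inputs: Tuple[str, ...] = ()


def fuse_graph(nodes: List[Node], patterns: List[Tuple[str, ...]]) -> List[Node]:
    """Return a new graph in which chains matching a pattern become one node.

    `nodes` must be topologically ordered. Patterns are tried in list order,
    so earlier patterns win when several could apply (an assumption).
    """
    by_name = {n.name: n for n in nodes}
    consumers = {n.name: [] for n in nodes}
    for n in nodes:
        for src in n.inputs:
            if src in consumers:  # graph inputs such as "x" are not nodes
                consumers[src].append(n.name)

    def match(head, pattern):
        # Follow single-consumer edges while the operator sequence matches.
        if head.op != pattern[0]:
            return None
        chain = [head]
        for op in pattern[1:]:
            outs = consumers[chain[-1].name]
            if len(outs) != 1 or by_name[outs[0]].op != op:
                return None
            chain.append(by_name[outs[0]])
        return chain

    # Analysis step: pick non-overlapping chains.
    taken, chain_ending_at = set(), {}
    for node in nodes:
        for pattern in patterns:
            chain = match(node, pattern)
            if chain and not any(c.name in taken for c in chain):
                taken.update(c.name for c in chain)
                chain_ending_at[chain[-1].name] = chain
                break

    # Generation step: emit each fused node where its last member stood,
    # which keeps the output topologically ordered.
    renamed, result = {}, []
    for node in nodes:
        if node.name in chain_ending_at:
            chain = chain_ending_at[node.name]
            members = {c.name for c in chain}
            external = []
            for c in chain:
                for src in c.inputs:
                    if src not in members and src not in external:
                        external.append(src)
            fused = Node(
                name="+".join(c.name for c in chain),
                op="fused_" + "_".join(c.op for c in chain),
                inputs=tuple(renamed.get(s, s) for s in external),
            )
            renamed[node.name] = fused.name
            result.append(fused)
        elif node.name not in taken:
            rewired = tuple(renamed.get(s, s) for s in node.inputs)
            result.append(Node(node.name, node.op, rewired))
    return result


graph = [
    Node("conv1", "conv2d", ("x",)),
    Node("bn1", "batch_norm", ("conv1",)),
    Node("relu1", "relu", ("bn1",)),
    Node("pool1", "max_pool", ("relu1",)),
]
# Hypothetical pattern list for some target device; each has >= 2 operators.
device_patterns = [("conv2d", "batch_norm", "relu"), ("conv2d", "relu")]
new_graph = fuse_graph(graph, device_patterns)
```

In this example the three-operator chain is replaced by a single fused node and `pool1` is rewired to consume it. A production compiler would match general subgraphs rather than linear chains; the single-consumer restriction is used here only to keep the rewrite obviously equivalent to the original dataflow.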
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, by a compiler, a computation graph representing a neural network, the computation graph comprising a plurality of nodes, each node associated with an operator of the neural network; receiving, by the compiler, a list of fusion patterns associated with a target hardware execution device; analyzing, by the compiler, the computation graph using the list of fusion patterns; generating one or more fused operators based on the analysis, each fused operator comprising at least two operators of the plurality of operators which can be fused; and generating, by the compiler, a new computation graph representing the neural network that includes at least a first fused operator of the generated one or more fused operators.
2. The method of claim 1, further comprising determining, based on a cost model associated with the target hardware execution device, a computation cost associated with the generating of each of the one or more fused operators, and wherein the analyzing is based on the computation cost associated with the generating of each of the one or more fused operators.
3. The method of claim 1 wherein each fusion pattern in the list of fusion patterns is associated with a condition for generating a fused operator.
4. The method of claim 3, wherein the condition relates to at least one of a memory allocation requirement associated with the fused operator, a size of a feature map input to a layer of the neural network, and a size of a filter of a layer of the neural network.
5. The method of claim 4, wherein the neural network includes a convolution layer and the condition specifies a constraint on at least one of a shape of a kernel of the convolution layer, a size of the kernel of convolution layer, and a data type of an execution kernel associated with the fused operator.
6. The method of claim 1, wherein each of the generated one or more fused operators specify a dataflow of computations which are equivalent to the dataflow of computations of the plurality of nodes of the computation graph representing the neural network.
7. The method of claim 6, further comprising outputting the generated one or more fused operators to the target hardware execution device for execution.
8. The method of claim 7, further comprising assigning priorities to each fusion pattern in the list of fusion patterns based on a cost model.
9. The method of claim 8, wherein the generated one or more fused operators are output to the target hardware execution device for execution in accordance with the priorities assigned to each fusion pattern in the list of fusion patterns.
10. A non-transitory computer readable medium storing instructions executable in one or more processors, the instructions when executed in the one or more processors causing operations comprising: receiving, by a compiler, a computation graph representing a neural network, the computation graph comprising a plurality of operators of the neural network; receiving, by the compiler, a list of fusion patterns associated with a target hardware execution device; analyzing, by the compiler, the computation graph using the list of fusion patterns and generating one or more fused operators based on the analysis, each fused operator comprising at least two operators of the plurality of operators which can be fused; and generating, by the compiler, a new computation graph representing the neural network that includes at least a first fused operator of the generated one or more fused operators.
11. The non-transitory computer readable medium of claim 10, wherein the instructions are executable to cause operations comprising assigning priorities to each fusion pattern in the list of fusion patterns based on a cost model.
12. The non-transitory computer readable medium of claim 10, further comprising determining, based at least in accordance with the cost model, a computation cost associated with the generating of the one or more fused operators.
13. The non-transitory computer readable medium of claim 12, wherein, in accordance with the cost model, the computation cost is determined based on generating the one or more fused operators at the target hardware execution device.
14. The non-transitory computer readable medium of claim 10, wherein the list of fusion patterns specifies a condition for generating a fused operator based on the plurality of operators.
15. The non-transitory computer readable medium of claim 14, wherein the condition relates to at least one of a memory allocation requirement associated with the fused operator, an input feature relating to supported operator fusion patterns for the target hardware execution device, and a filter size in accordance with a neural network layer of the neural network.
16. The non-transitory computer readable medium of claim 14, wherein the condition specifies a constraint on at least one of a kernel shape, a kernel size and a data type of an underlying execution kernel associated with the fused operator.
17. The non-transitory computer readable medium of claim 10, wherein the generated one or more fused operators specify a flow of computations in accordance with a plurality of nodes of the neural network.
18. The non-transitory computer readable medium of claim 17, the instructions being executable to cause operations comprising providing the generated one or more fused operators to the target hardware execution device associated with a set of hardware platform specific patterns.
19. The non-transitory computer readable medium of claim 17, wherein the generated one or more fused operators are in accordance with a set of priorities assigned to each the set of hardware platform specific patterns as provided to the compiler.
20. An apparatus comprising: a processor; and a memory storing instructions that when executed by the processor cause the apparatus to: receive a computation graph representing a neural network, the computation graph comprising a plurality of nodes, each node associated with an operator of the neural network; receive a list of fusion patterns associated with a target hardware execution device; analyze the computation graph using the list of fusion patterns; generate one or more fused operators based on the analysis, each fused operator comprising at least two operators of the plurality of operators which can be fused; and generate a new computation graph representing the neural network that includes at least a first fused operator of the generated one or more fused operators.
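Several dependent claims attach a condition to each fusion pattern (kernel size, data type, memory requirement) and rank patterns by priority using a cost model. The claims do not specify the cost model, so the sketch below invents one purely for illustration: it assumes each fused boundary avoids one write and one read of an intermediate feature map. `FusionPattern`, `estimated_saving`, `assign_priorities`, `select_pattern`, and the attribute keys are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class FusionPattern:
    """Hypothetical pattern record: operator sequence, guard, and priority."""
    ops: Tuple[str, ...]
    condition: Callable[[Dict], bool]  # True if fusion is allowed for attrs
    priority: int = 0                  # filled in by assign_priorities


def estimated_saving(pattern: FusionPattern, attrs: Dict,
                     cost_per_byte: float = 1.0) -> float:
    # Assumed cost model: every boundary removed by fusion saves one write
    # and one read of the intermediate feature map.
    boundaries = len(pattern.ops) - 1
    return 2 * boundaries * attrs["feature_map_bytes"] * cost_per_byte


def assign_priorities(patterns: List[FusionPattern],
                      attrs: Dict) -> List[FusionPattern]:
    """Rank patterns by estimated saving; larger saving, higher priority."""
    ranked = sorted(patterns, key=lambda p: estimated_saving(p, attrs),
                    reverse=True)
    for rank, pattern in enumerate(ranked):
        pattern.priority = len(ranked) - rank
    return ranked


def select_pattern(patterns: List[FusionPattern],
                   attrs: Dict) -> Optional[FusionPattern]:
    """Return the highest-priority pattern whose condition holds, if any."""
    for pattern in assign_priorities(patterns, attrs):
        if pattern.condition(attrs):
            return pattern
    return None


patterns = [
    FusionPattern(("conv2d", "relu"), condition=lambda a: True),
    FusionPattern(
        ("conv2d", "batch_norm", "relu"),
        # Assumed device limit: fused kernel exists only for small filters
        # and reduced-precision data types.
        condition=lambda a: a["kernel_size"] <= 3
        and a["dtype"] in ("float16", "int8"),
    ),
]

small = {"kernel_size": 3, "dtype": "float16", "feature_map_bytes": 4096}
large = {"kernel_size": 7, "dtype": "float16", "feature_map_bytes": 4096}

chosen_small = select_pattern(patterns, small)
chosen_large = select_pattern(patterns, large)
```

With these made-up attributes, the three-operator pattern is selected for the 3x3 kernel because it has the larger estimated saving and its condition holds, while the 7x7 kernel falls back to the two-operator pattern. A real cost model would be calibrated to the target hardware execution device.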
Combinations of networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Knowledge engineering; Knowledge acquisition · CPC title
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
using electronic means · CPC title