Quantization for DNN accelerators
US-2019340499-A1 · Nov 7, 2019 · US
US11468338B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11468338-B2 |
| Application number | US-201916262809-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 30, 2019 |
| Priority date | Sep 11, 2018 |
| Publication date | Oct 11, 2022 |
| Grant date | Oct 11, 2022 |
Official abstract text for this publication.
The subject technology provides receiving a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations and some of the operations being executable on multiple processors of the target platform. The subject technology further sorts the operations from the multiple layers in a particular order based at least in part on grouping the operations that are executable by a particular processor of the multiple processors. The subject technology determines, based at least in part on a cost of transferring the operations between the multiple processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations. Further, for each layer of the NN model, the subject technology includes an annotation to indicate the processor assigned for each of the operations.
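The assignment step can be illustrated with a small layered shortest-path search, along the lines of what the claims describe: nodes carry the cost of running an operation on a given processor, and edges carry the cost of moving between processors. The following is a minimal sketch under stated assumptions, not the patented implementation: the function name `assign_processors`, the cost values, and the flat transfer penalty are all hypothetical, and the operations are assumed to be already sorted in execution order.

```python
from typing import Callable, Dict, List, Tuple

def assign_processors(
    op_costs: List[Dict[str, float]],
    transfer_cost: Callable[[str, str], float],
) -> Tuple[float, List[str]]:
    """Pick one processor per operation so that total cost is minimal.

    op_costs[i] maps each processor that can run operation i to its
    (hypothetical) execution cost. Processors absent from the dict are
    treated as unable to run that operation.
    """
    # best[p] = (cheapest cost of any path that ends with the current
    #            operation on processor p, processors chosen along that path)
    best = {p: (cost, [p]) for p, cost in op_costs[0].items()}
    for costs in op_costs[1:]:
        nxt = {}
        for p, cost in costs.items():
            # Cheapest predecessor, counting the transfer into p.
            prev_p, (prev_cost, prev_path) = min(
                best.items(),
                key=lambda kv: kv[1][0] + transfer_cost(kv[0], p),
            )
            nxt[p] = (prev_cost + transfer_cost(prev_p, p) + cost, prev_path + [p])
        best = nxt
    return min(best.values(), key=lambda v: v[0])

# Illustrative costs only (arbitrary units); the third operation runs on CPU only.
ops = [
    {"CPU": 4, "GPU": 2, "NPU": 1},
    {"CPU": 3, "GPU": 2, "NPU": 1},
    {"CPU": 2},
    {"CPU": 5, "GPU": 2, "NPU": 1},
]

def flat_transfer(src: str, dst: str) -> float:
    # Assumed penalty: free on the same processor, 3 units otherwise.
    return 0 if src == dst else 3

total, assignment = assign_processors(ops, flat_transfer)
# Per-operation annotation, analogous to the annotation step in the abstract.
annotations = [{"op_index": i, "processor": p} for i, p in enumerate(assignment)]
print(total, assignment)  # 11 ['NPU', 'NPU', 'CPU', 'NPU']
```

With these made-up numbers, the search keeps the first two operations on the neural processor, pays the transfer penalty to reach the CPU-only operation, and then pays it again to return, because that is still cheaper than staying on the CPU for the last operation.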
Opening claim text (preview).
What is claimed is:

1. A method comprising: receiving a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations, at least some of the operations being executable on multiple processors of the target platform, the multiple processors comprising at least a CPU, a GPU, and a neural processor, wherein the CPU, the GPU, and the neural processor each have different computational specifications or capabilities; sorting the operations from the multiple layers in a particular order based at least in part on grouping the operations that are executable by a particular processor of the multiple processors; determining, based at least in part on a cost of transferring the operations between the multiple processors and a cost of performing the operations at the respective processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations; and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations.

2. The method of claim 1, wherein determining, based at least in part on the cost of transferring the operations between the multiple processors, the assignment of one of the multiple processors for each of the sorted operations of each of the layers further comprises: generating a graph with operations sorted by an order of execution based on the sorted operations from the multiple layers; determining a path through nodes of the graph with an overall smallest cost to execute the operations from the multiple layers of the NN; and determining the assignment of one of the multiple processors for each of the sorted operations of each of the layers based at least in part on the determined path through the nodes of the graph.

3. The method of claim 2, wherein each node in the graph represents a cost of an operation, from a particular layer, performed on a respective processor from the multiple processors of the target platform on which the operation is executable, and each edge in the graph represents a cost of transferring the operation from a first processor at a first layer to a second processor at a second layer of the NN.

4. The method of claim 1, wherein the cost of transferring the operations comprises an amount of latency for transferring the operations between the multiple processors.

5. The method of claim 2, wherein determining the path through nodes of the graph comprises determining a shortest path based on the overall smallest cost for traversing through each node of the graph, the shortest path corresponding to performing each operation in the multiple layers of the NN model.

6. The method of claim 1, wherein the neural processor is specifically configured to perform operations related to neural network models.

7. The method of claim 6, wherein the neural processor utilizes a lower amount of power when performing the operations when compared to the CPU or the GPU performing the operations.

8. The method of claim 1, wherein the target platform comprises a mobile electronic device, and the mobile electronic device executes the NN model based at least in part on the annotation to indicate the processor assigned for each of the operations.

9. A system comprising: a processor; a memory device containing instructions, which when executed by the processor cause the processor to: receive a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations, at least some of the operations being executable on multiple processors of the target platform, the multiple processors comprising at least a CPU, a GPU, and a neural processor, wherein the CPU, the GPU, and the neural processor each have different computational specifications or capabilities; sort the operations from the multiple layers in a particular order based at least in part on grouping the operations that are executable by a particular processor of the multiple processors; determine, based at least in part on a cost of transferring the operations between the multiple processors and a cost of performing the operations at the respective processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations; and for each layer of the NN model, include an annotation to indicate the processor assigned for each of the operations.

10. The system of claim 9, wherein to determine, based at least in part on the cost of transferring the operations between the multiple processors, the assignment of one of the multiple processors for each of the sorted operations of each of the layers further causes the processor to: generate a graph with operations sorted by an order of execution based on the sorted operations from the multiple layers; determine a path through nodes of the graph with an overall smallest cost to execute the operations from the multiple layers of the NN; and determine the assignment of one of the multiple processors for each of the sorted operations of each of the layers based at least in part on the determined path through the nodes of the graph.

11. The system of claim 10, wherein each node in the graph represents a cost of an operation, from a particular layer, performed on a respective processor from the multiple processors of the target platform on which the operation is executable, and each edge in the graph represents a cost of transferring the operation from a first processor at a first layer to a second processor at a second layer of the NN.

12. The system of claim 10, wherein to determine the path through nodes of the graph with the overall smallest cost to execute the operations further causes the processor to: determine an amount of latency for transferring the operation performed on the respective processor to another processor.

13. The system of claim 10, wherein to determine the path through nodes of the graph comprises determining a shortest path based on the overall smallest cost for traversing through each node of the graph, the shortest path corresponding to performing each operation in the multiple layers of the NN model.

14. The system of claim 9, wherein the neural processor is configured to perform operations related to neural network models.

15. The system of claim 14, wherein the neural processor utilizes a lower amount of power when performing the operations when compared to the CPU or the GPU performing the operations.

16. The system of claim 9, wherein the target platform comprises a mobile electronic device, and the mobile electronic device executes the NN model based at least in part on the annotation to indicate the processor assigned for each of the operations.

17. A non-transitory computer-readable medium comprising instructions, which when executed by a computing device, cause the computing device to perform operations comprising: receiving a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations, at least some of the operations being executable on multiple processors of the target platform, the multiple processors comprising at least a CPU, a GPU, and a neural processor, wherein the CPU, the GPU, and the neural processor each have different computational specifications or capabilities; sorting the operations from the multiple layers in a particular order based at least in part on gro
Feedforward networks · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title
using electronic means · CPC title