Compiling models for dedicated hardware

US11468338B2 · US · B2

Patent metadata
Publication number: US-11468338-B2
Application number: US-201916262809-A
Country: US
Kind code: B2
Filing date: Jan 30, 2019
Priority date: Sep 11, 2018
Publication date: Oct 11, 2022
Grant date: Oct 11, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The subject technology provides receiving a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations and some of the operations being executable on multiple processors of the target platform. The subject technology further sorts the operations from the multiple layers in a particular order based at least in part on grouping the operations that are executable by a particular processor of the multiple processors. The subject technology determines, based at least in part on a cost of transferring the operations between the multiple processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations. Further, for each layer of the NN model, the subject technology includes an annotation to indicate the processor assigned for each of the operations.
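The sorting step in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `Op` record, the `sort_and_group` helper, the operation names, and the rule that operations inside one layer may be reordered freely (a real compiler would have to check data dependencies first). The abstract does not specify the exact ordering rule, so this is one plausible reading, not the patented procedure.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Op:
    """Hypothetical operation record; not taken from the patent text."""
    name: str
    layer: int
    supported: tuple  # processors able to run this operation


def group_key(op):
    # Layer index first (a stand-in for execution order), then the
    # normalized set of processors that can execute the operation.
    return (op.layer, tuple(sorted(op.supported)))


def sort_and_group(ops):
    # Sort so that, within a layer, operations sharing the same candidate
    # processors sit next to each other (name breaks ties deterministically).
    ordered = sorted(ops, key=lambda op: (group_key(op), op.name))
    # Collapse each run of adjacent operations with an identical key.
    return [
        (layer, procs, [op.name for op in run])
        for (layer, procs), run in groupby(ordered, key=group_key)
    ]


ops = [
    Op("conv1", 0, ("CPU", "GPU", "NPU")),
    Op("custom_norm", 0, ("CPU",)),
    Op("relu1", 0, ("NPU", "GPU", "CPU")),
    Op("softmax", 1, ("CPU", "GPU")),
]
groups = sort_and_group(ops)
for layer, procs, names in groups:
    print(layer, procs, names)
```

Grouping like this keeps operations that share candidate processors adjacent, which gives a later cost-based assignment step the chance to keep a whole run on one processor and avoid transfers.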

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations, at least some of the operations being executable on multiple processors of the target platform, the multiple processors comprising at least a CPU, a GPU, and a neural processor, wherein the CPU, the GPU, and the neural processor each have different computational specifications or capabilities; sorting the operations from the multiple layers in a particular order based at least in part on grouping the operations that are executable by a particular processor of the multiple processors; determining, based at least in part on a cost of transferring the operations between the multiple processors and a cost of performing the operations at the respective processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations; and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations.

2. The method of claim 1, wherein determining, based at least in part on the cost of transferring the operations between the multiple processors, the assignment of one of the multiple processors for each of the sorted operations of each of the layers further comprises: generating a graph with operations sorted by an order of execution based on the sorted operations from the multiple layers; determining a path through nodes of the graph with an overall smallest cost to execute the operations from the multiple layers of the NN; and determining the assignment of one of the multiple processors for each of the sorted operations of each of the layers based at least in part on the determined path through the nodes of the graph.

3. The method of claim 2, wherein each node in the graph represents a cost of an operation, from a particular layer, performed on a respective processor from the multiple processors of the target platform on which the operation is executable, and each edge in the graph represents a cost of transferring the operation from a first processor at a first layer to a second processor at a second layer of the NN.

4. The method of claim 1, wherein the cost of transferring the operations comprises an amount of latency for transferring the operations between the multiple processors.

5. The method of claim 2, wherein determining the path through nodes of the graph comprises determining a shortest path based on the overall smallest cost for traversing through each node of the graph, the shortest path corresponding to performing each operation in the multiple layers of the NN model.

6. The method of claim 1, wherein the neural processor is specifically configured to perform operations related to neural network models.

7. The method of claim 6, wherein the neural processor utilizes a lower amount of power when performing the operations when compared to the CPU or the GPU performing the operations.

8. The method of claim 1, wherein the target platform comprises a mobile electronic device, and the mobile electronic device executes the NN model based at least in part on the annotation to indicate the processor assigned for each of the operations.

9. A system comprising; a processor; a memory device containing instructions, which when executed by the processor cause the processor to: receive a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations, at least some of the operations being executable on multiple processors of the target platform, the multiple processors comprising at least a CPU, a GPU, and a neural processor, wherein the CPU, the GPU, and the neural processor each have different computational specifications or capabilities; sort the operations from the multiple layers in a particular order based at least in part on grouping the operations that are executable by a particular processor of the multiple processors; determine, based at least in part on a cost of transferring the operations between the multiple processors and a cost of performing the operations at the respective processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations; and for each layer of the NN model, include an annotation to indicate the processor assigned for each of the operations.

10. The system of claim 9, wherein to determine, based at least in part on the cost of transferring the operations between the multiple processors, the assignment of one of the multiple processors for each of the sorted operations of each of the layers further causes the processor to: generate a graph with operations sorted by an order of execution based on the sorted operations from the multiple layers; determine a path through nodes of the graph with an overall smallest cost to execute the operations from the multiple layers of the NN; and determine the assignment of one of the multiple processors for each of the sorted operations of each of the layers based at least in part on the determined path through the nodes of the graph.

11. The system of claim 10, wherein each node in the graph represents a cost of an operation, from a particular layer, performed on a respective processor from the multiple processors of the target platform on which the operation is executable, and each edge in the graph represents a cost of transferring the operation from a first processor at a first layer to a second processor at a second layer of the NN.

12. The system of claim 10, wherein to determine the path through nodes of the graph with the overall smallest cost to execute the operations further causes the processor to: determine an amount of latency for transferring the operation performed on the respective processor to another processor.

13. The system of claim 10, wherein to determine the path through nodes of the graph comprises determining a shortest path based on the overall smallest cost for traversing through each node of the graph, the shortest path corresponding to performing each operation in the multiple layers of the NN model.

14. The system of claim 9, wherein the neural processor is configured to perform operations related to neural network models.

15. The system of claim 14, wherein the neural processor utilizes a lower amount of power when performing the operations when compared to the CPU or the GPU performing the operations.

16. The system of claim 9, wherein the target platform comprises a mobile electronic device, and the mobile electronic device executes the NN model based at least in part on the annotation to indicate the processor assigned for each of the operations.

17. A non-transitory computer-readable medium comprising instructions, which when executed by a computing device, cause the computing device to perform operations comprising: receiving a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations, at least some of the operations being executable on multiple processors of the target platform, the multiple processors comprising at least a CPU, a GPU, and a neural processor, wherein the CPU, the GPU, and the neural processor each have different computational specifications or capabilities; sorting the operations from the multiple layers in a particular order based at least in part on gro
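The graph formulation in claims 2, 3 and 5 can be illustrated with a small sketch. Each node is an (operation, processor) pair carrying an execution cost, each edge carries the cost of moving between processors from one operation to the next, and the cheapest path gives the assignment. The claims do not name a specific shortest-path algorithm; Dijkstra's algorithm is used here as an assumption, and the function name `assign_processors`, the operation names, and all cost figures are invented for illustration.

```python
import heapq


def assign_processors(exec_cost, transfer_cost):
    """exec_cost: list, in execution order, of {processor: cost} dicts.
    transfer_cost: {(src, dst): cost} for each pair of distinct processors.
    Returns (total_cost, [processor chosen for each operation])."""
    if not exec_cost:
        raise ValueError("no operations to assign")
    last = len(exec_cost) - 1
    # Node = (operation index, processor); start at every node of operation 0.
    best = {(0, p): c for p, c in exec_cost[0].items()}
    prev = {}
    heap = [(c, 0, p) for p, c in exec_cost[0].items()]
    heapq.heapify(heap)
    while heap:
        cost, i, p = heapq.heappop(heap)
        if cost > best[(i, p)]:
            continue  # stale queue entry
        if i == last:
            # With non-negative costs, the first final node popped is optimal.
            path, node = [p], (i, p)
            while node in prev:
                node = prev[node]
                path.append(node[1])
            return cost, path[::-1]
        for q, run in exec_cost[i + 1].items():
            hop = 0 if p == q else transfer_cost[(p, q)]  # edge weight
            new = cost + hop + run                         # plus node weight
            if new < best.get((i + 1, q), float("inf")):
                best[(i + 1, q)] = new
                prev[(i + 1, q)] = (i, p)
                heapq.heappush(heap, (new, i + 1, q))
    raise ValueError("an operation has no candidate processor")


# Illustrative costs only (arbitrary units); not taken from the patent.
names = ["conv1", "custom_norm", "conv2"]
exec_cost = [
    {"CPU": 5, "GPU": 3, "NPU": 1},
    {"CPU": 1},  # executable on the CPU only
    {"CPU": 6, "GPU": 3, "NPU": 1},
]
pairs = {("CPU", "GPU"): 2, ("CPU", "NPU"): 3, ("GPU", "NPU"): 2}
transfer_cost = {**pairs, **{(b, a): c for (a, b), c in pairs.items()}}

total, path = assign_processors(exec_cost, transfer_cost)
annotations = dict(zip(names, path))  # per-operation processor annotation
print(total, annotations)
```

In this made-up example, hopping to the CPU for the middle operation and back costs 6 units in transfers, yet the total of 9 still beats the 12 units of running everything on the CPU. The resulting `annotations` mapping plays the role of the per-layer annotation described in claim 1.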

Assignees

Apple Inc

Inventors

Classifications

  • Feedforward networks · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G06N3/10 (Primary)

    Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title

  • G06N3/063 (Primary)

    using electronic means · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11468338B2 cover?
The subject technology provides receiving a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations and some of the operations being executable on multiple processors of the target platform. The subject technology further sorts the operations from the multiple layers in a particular order based at least in part on grouping th…
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 11, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).