Iteration support in a heterogeneous dataflow engine
US-2015007182-A1 · Jan 1, 2015 · US
US2017124451A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017124451-A1 |
| Application number | US-201615336673-A |
| Country | US |
| Kind code | A1 |
| Filing date | Oct 27, 2016 |
| Priority date | Oct 28, 2015 |
| Publication date | May 4, 2017 |
| Grant date | — |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving, by a computational graph system, a request to process a computational graph; obtaining data representing a subgraph of the computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective operation, wherein each directed edge connects a respective first node to a respective second node, the subgraph assigned to a first device by a placer in the computational graph system; determining that the first device comprises a hardware accelerator having a plurality of streams; in response to determining, generating instructions that when executed by the first device cause the first device to: assign the operation represented by each node in the subgraph to a respective stream; and perform the operations represented by the nodes in the subgraph in accordance with the assignment.
Opening claim text (preview).
What is claimed is: 1 . A method comprising: receiving, by a computational graph system, a request to process a computational graph; obtaining data representing a subgraph of the computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective operation, wherein each directed edge connects a respective first node to a respective second node that represents an operation that receives, as input, an output of an operation represented by the respective first node, the subgraph assigned to a first device by a placer in the computational graph system; determining that the first device comprises a hardware accelerator having a plurality of streams; in response to determining that the first device comprises a hardware accelerator having a plurality of streams, generating instructions that when executed by the first device cause the first device to: assign the operation represented by each node in the subgraph to a respective stream in the plurality of streams of the hardware accelerator; and perform the operations represented by the nodes in the subgraph in accordance with the assignment; and providing the instructions and the data to the first device. 2 . The method of claim 1 , wherein the request specifies identifying one or more particular outputs from one or more respective nodes in the subgraph, further comprising: receiving, from the first device, the one or more particular outputs; and providing the one or more particular outputs to the client. 3 . The method of claim 1 , wherein the instructions further cause the first device to store the one or more particular outputs in memory of the first device. 4 . The method of claim 1 , wherein the operations for the subgraph comprise partial inference or training computations for a neural network. 5 . The method of claim 1 , further comprising: analyzing the subgraph to identify a group of nodes in the subgraph in a chain structure; wherein the instructions cause the first device to assign the group of nodes to one stream. 6 . The method of claim 1 , wherein the assigning comprises: analyzing the subgraph to identify a first node in the subgraph having a plurality of directed edges as outputs; wherein the instructions cause the first device to assign, for each of the directed edges, a node to which the directed edge points to a disjoint stream of the hardware accelerator. 7 . The method of claim 1 , wherein the instructions cause the first device to determine, for each node, a respective amount of memory resources in the hardware accelerator consumed by the operation represented by the node based on the directed edges to the node, wherein the assigning is based at least on the respective amount of memory resources. 8 . The method of claim 1 , wherein the instructions cause the first device to determine a particular operation represented by a node has finished at a particular stream; in response to determining the particular operation has finished: determine a first amount of memory consumed by the particular operation that will be freed; determine, for each of a group of unassigned nodes, a respective estimated amount of memory consumed by an operation that is represented by the unassigned node; determine, from the group of unassigned nodes, a first unassigned node that represents an operation, which executes on a stream of the hardware accelerator, with the estimated amount of memory that maximizes usage of the first amount of memory; and assign an operation represented by the first unassigned node to the particular stream. 9 . The method of claim 1 , wherein the instructions cause the first device to determine a particular operation represented by a node has finished at a particular stream: in response to determining the particular operation has finished: determine at least one subsequent operation that uses the output of the particular operation as input; and reuse memory allocated for the output of the particular operation after the at least one subsequent operation has executed. 10 . The method of claim 9 , wherein determining at least one subsequent operation that uses the output of the particular operation as input includes: determining that at least two subsequent operations, a first operation in a first stream and a second operation in a second stream, use the output of the particular operation as input; placing a first marker in a first stream that indicates when the first operation has used the particular operation as input; placing a second marker in a second stream that indicates when the second operation has used the particular operation as input; determining that both operations have used the particular operation upon indication from the first and second markers. 11 . A system comprising: one or more computers; and computer-readable medium coupled to the one or more computers and having instructions stored thereon, which, when executed by the one or more computers, cause the one or more computers to, for each of the neural network layers, perform operations comprising: receiving, by a computational graph system, a request to process a computational graph; obtaining data representing a subgraph of the computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective operation, wherein each directed edge connects a respective first node to a respective second node that represents an operation that receives, as input, an output of an operation represented by the respective first node, the subgraph assigned to a first device by a placer in the computational graph system; determining that the first device comprises a hardware accelerator having a plurality of streams; in response to determining that the first device comprises a hardware accelerator having a plurality of streams, generating instructions that when executed by the first device cause the first device to: assign the operation represented by each node in the subgraph to a respective stream in the plurality of streams of the hardware accelerator; and perform the operations represented by the nodes in the subgraph in accordance with the assignment; and providing the instructions and the data to the first device. 12 . The system of claim 9 , wherein the request specifies identifying one or more particular outputs from one or more respective nodes in the subgraph, further comprising: receiving, from the first device, the one or more particular outputs; and providing the one or more particular outputs to the client. 13 . The system of claim 9 , further comprising: analyzing the subgraph to identify a group of nodes in the subgraph in a chain structure; wherein the instructions cause the first device to assign the group of nodes to one stream. 14 . The system of claim 9 , wherein the assigning comprises: analyzing the subgraph to identify a first node in the subgraph has a plurality of directed edges as outputs; wherein the instructions cause the first device to assign, for each of the directed edges, a node to which the directed edge points to a unique stream of the hardware accelerator. 15 . The system of claim 9 , wherein the instructions cause the first device to determine, for each node, a respective amount of memory resources in the hardware accelerator consumed by the operation represented by the node based on the directed edges to the node, wherein the assigning is based at least on the respective amount of memory resources. 16 . The system of
Backpropagation, e.g. using gradient descent · CPC title
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mappping at compile time, see G06F8/451) · CPC title
using electronic means · CPC title
considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration (scheduling strategies G06F9/4881 and subgroups) · CPC title
Machine learning · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.