Iteration support in a heterogeneous dataflow engine
US-2015007182-A1 · Jan 1, 2015 · US
US10534997B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10534997-B2 |
| Application number | US-201815965742-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 27, 2018 |
| Priority date | Oct 28, 2015 |
| Publication date | Jan 14, 2020 |
| Grant date | Jan 14, 2020 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving a request from a client to process a computational graph; obtaining data representing the computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective operation, wherein each directed edge connects a respective first node to a respective second node that represents an operation that receives, as input, an output of an operation represented by the respective first node; identifying a plurality of available devices for performing the requested operation; partitioning the computational graph into a plurality of subgraphs, each subgraph comprising one or more nodes in the computational graph; and assigning, for each subgraph, the operations represented by the one or more nodes in the subgraph to a respective available device in the plurality of available devices for operation.
Opening claim text (preview).
What is claimed is: 1. A method comprising: receiving a request from a client to process a computational graph; obtaining data representing the computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective inference or training operation for a neural network, and wherein each directed edge connects a respective first node to a respective second node that represents an operation that receives, as input, an output of an operation represented by the respective first node; identifying a plurality of available devices for processing the computational graph; and processing the computational graph using the plurality of available devices, comprising: analyzing the computational graph to identify groups of nodes that operate on shared data flowing on directed edges to the group of nodes, partitioning the computational graph into a plurality of subgraphs, each subgraph comprising one or more nodes in the computational graph, comprising generating, for each identified group of nodes, a respective subgraph including the identified group of nodes, assigning, for each subgraph, the operations represented by the one or more nodes in the subgraph to a respective available device in the plurality of available devices for processing, and, processing the computational graph by causing each device to perform the operations assigned to the device. 2. The method of claim 1 , wherein the request specifies one or more particular outputs from one or more respective nodes, further comprising: receiving, from a device to which the one or more respective nodes are assigned, the one or more particular outputs; and providing the one or more particular outputs to the client. 3. The method of claim 1 , wherein the request comprises labels partitioning the computational graph into a plurality of predetermined subgraphs, and wherein partitioning the computational graph comprises partitioning the computational graph into the plurality of predetermined subgraphs. 4. The method of claim 1 , wherein each device in the plurality of available devices is a hardware resource that performs operations independent of other devices in the plurality of available devices. 5. The method of claim 1 , wherein assigning, for each subgraph, the operations represented by the one or more nodes in the subgraph to a respective device comprises assigning the operations to a device having a computational capability necessary to perform the operations represented by the nodes in the subgraph. 6. The method of claim 1 , further comprising: analyzing the computational graph to identify groups of nodes arranged in a chain structure, and wherein the partitioning comprises generating, for each identified group of nodes arranged in a chain structure, a respective subgraph including the identified group of nodes arranged in a chain structure. 7. The method of claim 1 , further comprising: prior to assigning, for each subgraph, the operations represented by the one or more nodes in the subgraph to a respective available device in the plurality of available devices, determining an initial assignment of operations represented by one or more initial nodes in subgraphs to available devices, wherein the one or more initial nodes are nodes from which data starts flowing in the respective subgraph for the one or more initial nodes; monitoring the devices to determine statistics; adjusting the initial assignment using the statistics; and reassigning the operations of the subgraphs to the devices based on the adjusted initial assignment. 8. The method of claim 7 , further comprising repeating the monitoring, adjusting, and reassigning until a threshold amount of improvement has been achieved. 9. The method of claim 7 , wherein the statistics comprise a respective operation time or a respective idle time for each subgraph. 10. The method of claim 1 , wherein the plurality of available devices comprises: a device having a central processing unit (CPU), and at least one device having a processing unit that is not a CPU. 11. The method of claim 10 , wherein assigning, for each subgraph, the operations represented by the one or more nodes in the subgraph to a respective available device in the plurality of available devices for processing comprises: determining, for each subgraph, that the operations represented by the one or more nodes in the subgraph can only be processed by a particular type of processing unit; and in response, assigning the subgraph to a respective available device in the plurality of available devices having the particular type of processing unit. 12. A system comprising: one or more computers; and computer-readable medium coupled to the one or more computers and having instructions stored thereon, which, when executed by the one or more computers, cause the one or more computers to perform operations comprising: obtaining data representing a computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective inference or training operation for a neural network, and wherein each directed edge connects a respective first node to a respective second node that represents an operation that receives, as input, an output of an operation represented by the respective first node; identifying a plurality of available devices for processing the computational graph; and processing the computational graph using the plurality of available devices, comprising: analyzing the computational graph to identify groups of nodes that operate on shared data flowing on directed edges to the group of nodes, partitioning the computational graph into a plurality of subgraphs, each subgraph comprising one or more nodes in the computational graph, comprising generating, for each identified group of nodes, a respective subgraph including the identified group of nodes, assigning, for each subgraph, the operations represented by the one or more nodes in the subgraph to a respective available device in the plurality of available devices for processing, and, processing the computational graph by causing each device to perform the operations assigned to the device. 13. The system of claim 12 , the operations further comprising: prior to assigning, for each subgraph, the operations represented by the one or more nodes in the subgraph to a respective available device in the plurality of available devices, determining an initial assignment of operations represented by one or more initial nodes in subgraphs to available devices, wherein the one or more initial nodes are nodes from which data starts flowing in the respective subgraph for the one or more initial nodes; monitoring the devices to determine statistics; adjusting the initial assignment using the statistics; and reassigning the operations of the subgraphs to the devices based on the adjusted initial assignment. 14. The system of claim 13 , the operations further comprising repeating the monitoring, adjusting, and reassigning until a threshold amount of improvement has been achieved. 15. The system of claim 13 , wherein the statistics comprise a respective operation time or a respective idle time for each subgraph. 16. The system of claim 12 , wherein the plurality of available devices comprises: a device having a central processing unit (CPU), and at least one device having a processing unit that is not a CPU. 17. The system of claim 16 , wherein assigning, for each subgraph, the operations represented by the one or m
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mappping at compile time, see G06F8/451) · CPC title
Distributed learning, e.g. federated learning · CPC title
Machine learning · CPC title
to service a request · CPC title
Inference or reasoning models · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.