Stream-based accelerator processing of computational graphs

US2017124451A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2017124451-A1
Application numberUS-201615336673-A
CountryUS
Kind codeA1
Filing dateOct 27, 2016
Priority dateOct 28, 2015
Publication dateMay 4, 2017
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving, by a computational graph system, a request to process a computational graph; obtaining data representing a subgraph of the computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective operation, wherein each directed edge connects a respective first node to a respective second node, the subgraph assigned to a first device by a placer in the computational graph system; determining that the first device comprises a hardware accelerator having a plurality of streams; in response to determining, generating instructions that when executed by the first device cause the first device to: assign the operation represented by each node in the subgraph to a respective stream; and perform the operations represented by the nodes in the subgraph in accordance with the assignment.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method comprising: receiving, by a computational graph system, a request to process a computational graph; obtaining data representing a subgraph of the computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective operation, wherein each directed edge connects a respective first node to a respective second node that represents an operation that receives, as input, an output of an operation represented by the respective first node, the subgraph assigned to a first device by a placer in the computational graph system; determining that the first device comprises a hardware accelerator having a plurality of streams; in response to determining that the first device comprises a hardware accelerator having a plurality of streams, generating instructions that when executed by the first device cause the first device to: assign the operation represented by each node in the subgraph to a respective stream in the plurality of streams of the hardware accelerator; and perform the operations represented by the nodes in the subgraph in accordance with the assignment; and providing the instructions and the data to the first device. 2 . The method of claim 1 , wherein the request specifies identifying one or more particular outputs from one or more respective nodes in the subgraph, further comprising: receiving, from the first device, the one or more particular outputs; and providing the one or more particular outputs to the client. 3 . The method of claim 1 , wherein the instructions further cause the first device to store the one or more particular outputs in memory of the first device. 4 . The method of claim 1 , wherein the operations for the subgraph comprise partial inference or training computations for a neural network. 5 . The method of claim 1 , further comprising: analyzing the subgraph to identify a group of nodes in the subgraph in a chain structure; wherein the instructions cause the first device to assign the group of nodes to one stream. 6 . The method of claim 1 , wherein the assigning comprises: analyzing the subgraph to identify a first node in the subgraph having a plurality of directed edges as outputs; wherein the instructions cause the first device to assign, for each of the directed edges, a node to which the directed edge points to a disjoint stream of the hardware accelerator. 7 . The method of claim 1 , wherein the instructions cause the first device to determine, for each node, a respective amount of memory resources in the hardware accelerator consumed by the operation represented by the node based on the directed edges to the node, wherein the assigning is based at least on the respective amount of memory resources. 8 . The method of claim 1 , wherein the instructions cause the first device to determine a particular operation represented by a node has finished at a particular stream; in response to determining the particular operation has finished: determine a first amount of memory consumed by the particular operation that will be freed; determine, for each of a group of unassigned nodes, a respective estimated amount of memory consumed by an operation that is represented by the unassigned node; determine, from the group of unassigned nodes, a first unassigned node that represents an operation, which executes on a stream of the hardware accelerator, with the estimated amount of memory that maximizes usage of the first amount of memory; and assign an operation represented by the first unassigned node to the particular stream. 9 . The method of claim 1 , wherein the instructions cause the first device to determine a particular operation represented by a node has finished at a particular stream: in response to determining the particular operation has finished: determine at least one subsequent operation that uses the output of the particular operation as input; and reuse memory allocated for the output of the particular operation after the at least one subsequent operation has executed. 10 . The method of claim 9 , wherein determining at least one subsequent operation that uses the output of the particular operation as input includes: determining that at least two subsequent operations, a first operation in a first stream and a second operation in a second stream, use the output of the particular operation as input; placing a first marker in a first stream that indicates when the first operation has used the particular operation as input; placing a second marker in a second stream that indicates when the second operation has used the particular operation as input; determining that both operations have used the particular operation upon indication from the first and second markers. 11 . A system comprising: one or more computers; and computer-readable medium coupled to the one or more computers and having instructions stored thereon, which, when executed by the one or more computers, cause the one or more computers to, for each of the neural network layers, perform operations comprising: receiving, by a computational graph system, a request to process a computational graph; obtaining data representing a subgraph of the computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective operation, wherein each directed edge connects a respective first node to a respective second node that represents an operation that receives, as input, an output of an operation represented by the respective first node, the subgraph assigned to a first device by a placer in the computational graph system; determining that the first device comprises a hardware accelerator having a plurality of streams; in response to determining that the first device comprises a hardware accelerator having a plurality of streams, generating instructions that when executed by the first device cause the first device to: assign the operation represented by each node in the subgraph to a respective stream in the plurality of streams of the hardware accelerator; and perform the operations represented by the nodes in the subgraph in accordance with the assignment; and providing the instructions and the data to the first device. 12 . The system of claim 9 , wherein the request specifies identifying one or more particular outputs from one or more respective nodes in the subgraph, further comprising: receiving, from the first device, the one or more particular outputs; and providing the one or more particular outputs to the client. 13 . The system of claim 9 , further comprising: analyzing the subgraph to identify a group of nodes in the subgraph in a chain structure; wherein the instructions cause the first device to assign the group of nodes to one stream. 14 . The system of claim 9 , wherein the assigning comprises: analyzing the subgraph to identify a first node in the subgraph has a plurality of directed edges as outputs; wherein the instructions cause the first device to assign, for each of the directed edges, a node to which the directed edge points to a unique stream of the hardware accelerator. 15 . The system of claim 9 , wherein the instructions cause the first device to determine, for each node, a respective amount of memory resources in the hardware accelerator consumed by the operation represented by the node based on the directed edges to the node, wherein the assigning is based at least on the respective amount of memory resources. 16 . The system of

Assignees

Inventors

Classifications

  • Backpropagation, e.g. using gradient descent · CPC title

  • G06F9/5066Primary

    Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mappping at compile time, see G06F8/451) · CPC title

  • using electronic means · CPC title

  • G06F9/5038Primary

    considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration (scheduling strategies G06F9/4881 and subgroups) · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017124451A1 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving, by a computational graph system, a request to process a computational graph; obtaining data representing a subgraph of the computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective operation, wherein each …
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/5066. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu May 04 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).