Computational graph optimization

US11657289B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11657289-B2
Application numberUS-202016840191-A
CountryUS
Kind codeB2
Filing dateApr 3, 2020
Priority dateFeb 7, 2020
Publication dateMay 23, 2023
Grant dateMay 23, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for optimizing the execution of the operations of a neural network. One of the methods includes obtaining data representing a graph characterizing a plurality of operations of a neural network, wherein each node of the graph characterizes an operation of the neural network and each edge of the graph characterizes data dependency between the operations; processing the data representing the graph using a graph embedding neural network to generate an embedding of the graph; and processing the embedding of the graph using a policy neural network to generate a task output, wherein the task output comprises, for each of the plurality of operations of the neural network, a respective decision for a particular optimization task.

First claim

Opening claim text (preview).

What is claimed is: 1. A method of generating a task output for executing a plurality of operations of a neural network on one or more processing devices, wherein the task output comprises, for each of the plurality of operations of the neural network, a respective decision for a particular optimization task, the method comprising: obtaining data representing a graph characterizing the plurality of operations of the neural network, wherein each node of the graph characterizes an operation of the neural network and each edge of the graph characterizes data dependency between the operations; processing the data representing the graph using a graph embedding neural network to generate an embedding of the graph; and processing the embedding of the graph using a policy neural network to generate the task output, wherein the policy neural network is conditioned on features of the graph. 2. The method of claim 1 , wherein the embedding of the graph comprises a respective node embedding of each node of the graph. 3. The method of claim 2 , wherein processing the data representing the graph using the graph embedding neural network comprises, at each of a plurality of embedding time steps: receiving a current embedding of each node of the graph generated during a previous embedding time step; combining, for each particular node of the graph, the respective current embedding of each neighboring node of the particular node to generate a neighborhood embedding of the particular node; and combining, for each node of the graph, the current embedding of the node and the neighborhood embedding of the node to generate a new embedding of the node. 4. The method of claim 3 , wherein processing the data representing the graph using the graph embedding neural network further comprises, at a first embedding time step: generating an initial embedding for each node of the graph using features of the node, wherein the features comprise one or more of: an operation type of the operation characterized by the node, an output shape of an output of the operation characterized by the node, or a respective identification for each neighboring node of the node. 5. The method of claim 3 , wherein combining the current embedding of a particular node and the neighborhood embedding of the particular node comprises: concatenating the current embedding of the particular node and the neighborhood embedding of the particular node to generate a combined embedding of the particular node; and processing the combined embedding of the particular node using one or more fully-connected neural network layers to generate the new embedding of the particular node. 6. The method of claim 3 , wherein combining the respective embedding of each neighboring node of a particular node comprises: processing, for each neighboring node of the particular node, the current embedding of the neighboring node using an affine transformation to generate a processed embedding for the neighboring node; processing, for each neighboring node of the particular node, the processed embedding of the neighboring node using a sigmoid activation function to generate an activation embedding for the neighboring node; and combining the respective activation embedding of each neighboring node of the particular node by processing the activation embeddings using a max pooling layer. 7. The method of claim 1 , wherein the particular optimization task is one of: a device placement task, wherein each of the plurality of operations of the neural network is assigned a particular processing device of the one or more processing devices; an operation scheduling task, wherein each of the plurality of operations of the neural network is assigned a priority, and wherein each processing device comprises a priority-based scheduler that maintains a priority queue of the operations assigned to the processing device; or an operation fusion task, wherein a plurality of selected operations of the neural network are determined to be executed as if the selected operations were a single operation. 8. The method of claim 1 , wherein the graph embedding neural network and the policy neural network have been trained end-to-end by updating parameters of the neural network using an objective function that characterizes an expected runtime of respective operations characterized by each of a plurality of candidate graphs. 9. The method of claim 8 , wherein the objective function is: J ⁡ ( θ ) = E G ~ 𝒢 , T ~ π θ ( G ) [ r G , T ] ≈ 1 N ⁢ ∑ G E T ~ π θ ( G ) [ r G , T ] , where G is a candidate graph, G is a space of candidate graphs, T is a task output for the particular optimization task, π θ (G) is a policy for the candidate graph G under the parameters θ, and r G,T is a reward that is a function of the runtime of the operations characterized by the candidate graph G using the task output T. 10. The method of claim 8 , wherein the graph embedding neural network and the policy neural network have been trained by optimizing the objective function using Proximal Policy Optimization. 11. The method of claim 1 , wherein conditioning the policy neural network on features of the graph comprises computing an output x (l+1) of a neural network layer l of the policy neural network: x (l+1) =g (l) ( c ( x (0) )⊙ x (l) ), where x (l) is a

Assignees

Inventors

Classifications

  • Reinforcement learning · CPC title

  • by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination · CPC title

  • Optimisation · CPC title

  • G06N3/084Primary

    Backpropagation, e.g. using gradient descent · CPC title

  • Graphical models, e.g. Bayesian networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11657289B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for optimizing the execution of the operations of a neural network. One of the methods includes obtaining data representing a graph characterizing a plurality of operations of a neural network, wherein each node of the graph characterizes an operation of the neural network and each edge of the graph …
Who is the assignee on this patent?
Google Llc
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue May 23 2023 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).