Asynchronous neural network training

US11288575B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11288575-B2
Application number: US-201715599058-A
Country: US
Kind code: B2
Filing date: May 18, 2017
Priority date: May 18, 2017
Publication date: Mar 29, 2022
Grant date: Mar 29, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A neural network training apparatus is described which has a network of worker nodes each having a memory storing a subgraph of a neural network to be trained. The apparatus has a control node connected to the network of worker nodes. The control node is configured to send training data instances into the network to trigger parallelized message passing operations which implement a training algorithm which trains the neural network. At least some of the message passing operations asynchronously update parameters of individual subgraphs of the neural network at the individual worker nodes.
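The arrangement in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the patented implementation: each hypothetical `Worker` stores a one-parameter "subgraph" (a scalar layer y = w * x), the `train` function plays the role of the control node, and a single in-process message queue stands in for the network. The names `Worker`, `train`, and `predict`, the squared-error loss, and the learning rate are all assumptions made for illustration. Each worker updates its own parameter as soon as a backward message reaches it, with no global synchronization step, so gradients for later instances may be computed from slightly stale values.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker node whose 'subgraph' is one scalar layer y = w * x."""
    w: float
    lr: float = 0.01
    cache: dict = field(default_factory=dict)  # instance id -> input seen on the forward pass

    def forward(self, inst_id, x):
        self.cache[inst_id] = x
        return self.w * x

    def backward(self, inst_id, grad_out):
        x = self.cache.pop(inst_id)
        grad_in = grad_out * self.w        # gradient passed to the previous worker
        self.w -= self.lr * grad_out * x   # local update, no global barrier
        return grad_in


def train(workers, data, epochs=200):
    """Control-node loop: inject instances, then route messages until none are in flight."""
    for _ in range(epochs):
        queue = deque(("fwd", 0, i, x) for i, (x, _) in enumerate(data))
        while queue:
            kind, pos, i, value = queue.popleft()
            if kind == "fwd":
                out = workers[pos].forward(i, value)
                if pos + 1 < len(workers):
                    queue.append(("fwd", pos + 1, i, out))
                else:
                    # Terminal node: gradient of 0.5 * (out - label)^2 with respect to out.
                    queue.append(("bwd", pos, i, out - data[i][1]))
            else:
                grad_in = workers[pos].backward(i, value)
                if pos > 0:
                    queue.append(("bwd", pos - 1, i, grad_in))
    return workers


def predict(workers, x):
    """Run one input through the pipeline without caching or updating."""
    for worker in workers:
        x = worker.w * x
    return x


# Toy data for y = 2x; two workers must jointly learn w0 * w1 = 2.
workers = train([Worker(w=1.0), Worker(w=1.0)], [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(round(predict(workers, 4.0), 3))
```

In this sketch all three instances are in flight at once, so the forward pass of a later instance runs before the backward pass of an earlier one has changed the parameters. A real apparatus would distribute the workers across devices and could route each instance along a different path.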

First claim

Opening claim text (preview).

The invention claimed is:

1. A neural network training apparatus comprising: a neural network of individual worker nodes each having a memory storing a subgraph of a neural network to be trained; and a control node connected to the network of individual worker nodes, wherein the control node invokes a different path through the network of individual worker nodes based on graphical structures of training data instances resulting in a dynamic pipeline; wherein the control node is configured to send the training data instances into the network to trigger parallelized message passing operations which implement a training algorithm which trains the neural network, wherein the individual worker nodes in the dynamic pipeline are can be used multiple times based on a graphical structure of an input data instance, including the graphical structures of the training data instances, sent into the network of individual worker nodes by the control node, wherein a path through the dynamic pipeline is different for at least two different input data instances with different graphical structures processed by the neural network, and wherein at least some of the message passing operations asynchronously update parameters of individual subgraphs of the neural network at the individual worker nodes based on calculating accumulated gradients obtained from the individual worker nodes in the dynamic pipeline that reduces a loss function corresponding to the individual worker nodes.

2. The apparatus of claim 1 wherein: the input data instance is a graphical structure of an organic molecule with rings of bonded atoms.

3. The apparatus of claim 1 wherein the control node is configured to keep a record of a number of training data instances which are in flight in the network of individual worker nodes.

4. The apparatus of claim 1 wherein the control node is configured to control a rate at which it sends training data instances into the network of individual worker nodes.

5. The apparatus of claim 4 wherein the control node is configured to control the rate on a basis of one or more of: a number of in-flight training data instances in the network of individual worker nodes, neural network architecture type, data instance features, observed worker node performance factors, observed communications network performance, pipeline features.

6. The apparatus of claim 1 wherein: the control node is configured to send test data instances into the network of individual worker nodes for processing by the neural network, and the test data instances and the training data instances are concurrently processed by the neural network.

7. The apparatus of claim 1 wherein the network of individual worker nodes is a pipeline.

8. The apparatus of claim 7 wherein the message passing operations triggered by the control node comprise a forward process and a backward process, wherein the forward process comprises forward messages sent from the control node along the network of individual worker nodes to a terminal node of the network of individual worker nodes and backward messages sent from the terminal node along the network of individual worker nodes to the control node.

9. The apparatus of claim 1 wherein the worker nodes comprise on-chip memory, and wherein the parameters of the individual subgraphs of the neural network at the individual worker nodes are stored in the on-chip memory.

10. A worker node of a neural network training apparatus comprising: a memory storing a subgraph of a neural network; and a processor configured to asynchronously update parameters of the subgraph of the neural network stored in the memory according to at least one message received at the worker node from another worker node of a plurality of worker nodes of the apparatus based on calculating accumulated gradients obtained from the worker nodes that reduces a loss function corresponding to the worker nodes over which a graph of training data instances representing the neural network is partitioned, wherein: the plurality of worker nodes comprises a dynamic pipeline, the worker nodes in the dynamic pipeline are used multiple times based on a graph of an input data instance, including the graph of the training data instances, sent into the neural network, and a path through the dynamic pipeline is different for at least two different data instances with different graphs processed by the neural network.

11. The worker node of claim 10 wherein the memory is on-chip memory.

12. The worker node of claim 10 wherein the worker node comprises an accumulator which accumulates the gradients computed by the worker node using data received in messages from the other worker node, and wherein the processor is configured to asynchronously update the parameters using the accumulated gradients when criteria are met.

13. The worker node of claim 12 wherein the criteria comprise one or more of: number of accumulated gradients, neural network architecture type, data instance features, observed worker node performance factors, observed communications network performance, subgraph factors.

14. The worker node of claim 12 wherein the criteria are bespoke to the worker node.

15. The worker node of claim 12 wherein the processor is configured to dynamically adjust the criteria.

16. The worker node of claim 12 wherein the worker node computes the accumulated gradients by computing gradients of the loss function comparing a neural network prediction and a label received from a control node.

17. A pipeline comprising a plurality of worker nodes as defined in claim 16.

18. A computer implemented method at a worker node of a neural network training apparatus comprising: storing, at a memory, a subgraph of a neural network; receiving a message from another worker node of a plurality of worker nodes of the apparatus over which a graph of training data instances representing the neural network is partitioned, wherein the plurality of worker nodes comprises a dynamic pipeline; and asynchronously updating parameters of the subgraph of the neural network stored in the memory according to the received message based on calculating accumulated gradients obtained from the worker nodes that reduces a loss function corresponding to the worker nodes, wherein the worker nodes in the dynamic pipeline are used multiple times based on a graph of an input data instance, including the graph of the training data instances, sent into the neural network, and wherein a path through the dynamic pipeline is different for at least two different data instances with different graphs processed by the neural network.

19. The method of claim 18 wherein updating the parameters according to the received message comprises computing at least one gradient of the loss function of the neural network using data in the received message.

20. The method of claim 18 further comprising accumulating the gradients computed by the worker node using data received in messages from the other worker node, and asynchronously updating the parameters, based on the path through the dynamic pipeline, using the accumulated gradients when criteria are met.
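The accumulator described in claims 12 to 15 and 20 can be sketched as follows. This is an illustrative reading only, under stated assumptions: the class name `AccumulatingWorker`, the scalar parameter, the averaging of accumulated gradients, and the single criterion used here (a minimum count of accumulated gradients, one of the options listed in claim 13) are all hypothetical choices and are not taken from the patent text.

```python
from dataclasses import dataclass


@dataclass
class AccumulatingWorker:
    """Hypothetical worker that accumulates gradients and updates when a criterion is met."""
    w: float
    lr: float = 0.1
    min_grads: int = 4      # criterion: number of accumulated gradients; may differ per worker
    grad_sum: float = 0.0
    grad_count: int = 0
    updates: int = 0

    def receive_gradient(self, grad):
        """Accumulate one gradient from a message; return True if an update was applied."""
        self.grad_sum += grad
        self.grad_count += 1
        if self.grad_count >= self.min_grads:
            # Apply the averaged accumulated gradient, then reset the accumulator.
            self.w -= self.lr * self.grad_sum / self.grad_count
            self.grad_sum, self.grad_count = 0.0, 0
            self.updates += 1
            return True
        return False


worker = AccumulatingWorker(w=1.0)
for g in [0.5, 0.5, 0.5, 0.5, 0.5]:
    worker.receive_gradient(g)
print(worker.updates, worker.grad_count)
```

Because `min_grads` is an ordinary field, each worker can be given its own threshold, and the threshold can be changed while training runs. That is one simple way to picture criteria that are specific to a worker or adjusted dynamically.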

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Architecture, e.g. interconnection topology · CPC title

  • G06N3/098 · Primary

    Distributed learning, e.g. federated learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Supervised learning · CPC title

  • G06N3/063 · Primary

    using electronic means · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11288575B2 cover?
A neural network training apparatus is described which has a network of worker nodes each having a memory storing a subgraph of a neural network to be trained. The apparatus has a control node connected to the network of worker nodes. The control node is configured to send training data instances into the network to trigger parallelized message passing operations which implement a training algo…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/098. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 29, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).