Method and System for Training a Neural Network

US2016267380A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016267380-A1
Application number: US-201514657414-A
Country: US
Kind code: A1
Filing date: Mar 13, 2015
Priority date: Mar 13, 2015
Publication date: Sep 15, 2016
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Training a neural network is a time-consuming and computationally expensive task. Embodiments provide efficient methods and systems for neural network training. One example embodiment is implemented by a plurality of agents, where each agent performs a pipelined gradient analysis to update respective local models of the neural network using respective subsets of data from a common pool of training data. In turn, a common global model of the neural network is updated based upon the local models.
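The agent and global-model arrangement described in the abstract can be illustrated with a small sketch. It is not the patented implementation: the linear model, the synthetic data, the names `GlobalModel`, `train_local`, `agent`, and `run`, and the "move halfway toward the local model" merge rule are all assumptions made for illustration, and plain gradient descent stands in for the pipelined gradient analysis. The lock loosely mirrors the single-owner idea that appears in the claims.

```python
import random
import threading


def make_pool(n=200, seed=0):
    """Synthetic common pool of (x, y) pairs with y = 3x + 1 (illustrative data only)."""
    rng = random.Random(seed)
    return [(x, 3.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(n))]


def mse(model, data):
    """Mean squared error of the linear model [w, b] on the given data."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_local(model, subset, lr=0.1, epochs=50):
    """Plain gradient descent on one agent's subset.

    Assumption: this stands in for the pipelined gradient analysis.
    """
    w, b = model
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - y) * x for x, y in subset) / len(subset)
        gb = sum(2 * (w * x + b - y) for x, y in subset) / len(subset)
        w, b = w - lr * gw, b - lr * gb
    return [w, b]


class GlobalModel:
    """Common global model; only the lock holder may read or update it."""

    def __init__(self):
        self.params = [0.0, 0.0]
        self.lock = threading.Lock()

    def snapshot(self):
        with self.lock:
            return list(self.params)

    def merge(self, local):
        # Assumed merge rule (not from the source): move halfway toward the local model.
        with self.lock:
            self.params = [(g + l) / 2 for g, l in zip(self.params, local)]


def agent(global_model, subset):
    """One agent: copy the global model, train locally, merge the result back."""
    local = train_local(global_model.snapshot(), subset)
    global_model.merge(local)


def run(num_agents=4, rounds=5):
    pool = make_pool()
    # Each agent receives its own subset of the common pool.
    subsets = [pool[i::num_agents] for i in range(num_agents)]
    gm = GlobalModel()
    for _ in range(rounds):
        threads = [threading.Thread(target=agent, args=(gm, s)) for s in subsets]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return gm.params, mse(gm.params, pool)


if __name__ == "__main__":
    params, loss = run()
    print("global model:", params, "loss:", loss)
```

The order in which agents merge is not deterministic, so exact parameter values can differ between runs; under these assumptions the global model should still approach w = 3, b = 1.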

First claim

Opening claim text (preview).

What is claimed is:

1. A method of training a neural network, the method comprising: by each agent of a plurality of agents, performing a pipelined gradient analysis to update respective local models of a neural network using respective subsets of data from a common pool of training data; and updating a common global model of the neural network based upon the local models.

2. The method of claim 1 wherein performing the pipelined gradient analysis comprises: splitting the respective local models of the neural network into consecutive chunks; and assigning each chunk to a stage of a pipeline.

3. The method of claim 1 wherein each stage of the pipeline is associated with a graphics processing unit (GPU).

4. The method of claim 1 wherein performing the pipelined gradient analysis further comprises: selecting the subsets of data from the common pool of training data according to a focused-attention back-propagation (FABP) strategy.

5. The method of claim 1 further including an initialization procedure comprising: by a single agent of the plurality of agents: performing the pipelined gradient analysis to update its respective local model of the neural network using a respective subset of data from the common pool of training data; and updating the common global model of the neural network based upon its local model.

6. The method of claim 1 wherein the common global model is owned by a single agent of the plurality of agents at any one time according to a locking mechanism.

7. The method of claim 6 wherein the common global model is updated by the single agent during a period in which the single agent owns the common global model.

8. The method of claim 1 wherein a critical section is reached when an agent of the plurality is ready to update the global model and the agent of the plurality that is ready to update the global model does not own the global model.

9. The method of claim 8 wherein the agent that is ready to update the global model requests the global model.

10. A computer system for training a neural network, the computer system comprising: a processor; and a memory with computer code instructions stored thereon, the processor and the memory, with the computer code instructions being configured to cause the system to: by each agent of a plurality of agents, perform a pipelined gradient analysis to update respective local models of a neural network using respective subsets of data from a common pool of training data; and update a common global model of the neural network based upon the local models.

11. The computer system of claim 10, wherein, in performing the pipelined gradient analysis, the processor and the memory, with the computer code instructions, are further configured to cause the system to: split the respective local models of the neural network into consecutive chunks; and assign each chunk to a stage of a pipeline.

12. The computer system of claim 10 wherein each stage of the pipeline is associated with a graphics processing unit (GPU).

13. The computer system of claim 10, wherein, in performing the pipelined gradient analysis, the processor and the memory, with the computer code instructions, are further configured to cause the system to: select the subsets of data from the common pool of training data according to a focused-attention back-propagation (FABP) strategy.

14. The computer system of claim 10, wherein the processor and the memory, with the computer code instructions, are further configured to implement an initialization procedure that causes the system to: by a single agent of the plurality of agents: perform the pipelined gradient analysis to update its respective local model of the neural network using a respective subset of data from the common pool of training data; and update the common global model of the neural network based upon its local model.

15. The computer system of claim 10 wherein the common global model is owned by a single agent of the plurality of agents at any one time according to a locking mechanism.

16. The computer system of claim 15 wherein the common global model is updated by the single agent during a period in which the single agent owns the common global model.

17. The computer system of claim 10 wherein a critical section is reached when an agent of the plurality is ready to update the global model and the agent of the plurality that is ready to update the global model does not own the global model.

18. The computer system of claim 17 wherein the agent that is ready to update the global model requests the global model.

19. A computer program product for training a neural network, the computer program product comprising: one or more computer-readable tangible storage devices and program instructions stored on at least one of the one or more storage devices, the program instructions, when loaded and executed by a processor, cause an apparatus associated with the processor to: cause each agent of a plurality of agents to perform a pipelined gradient analysis to update respective local models of a neural network using respective subsets of data from a common pool of training data; and update a common global model of the neural network based upon the local models.

20. The computer program product of claim 19 wherein the program instruction further cause the apparatus to cause each agent to perform the pipelined gradient analysis by: splitting the respective local models of the neural network into consecutive chunks; and assigning each chunk to a stage of a pipeline.
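The step of splitting a local model into consecutive chunks and assigning each chunk to a pipeline stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the method as disclosed: the function names `split_into_chunks` and `forward_schedule`, the near-equal chunk sizes, and the simple forward-only schedule are hypothetical choices, and layers are represented by plain labels where the claims would place each stage on a GPU.

```python
def split_into_chunks(layers, num_stages):
    """Split a list of layers into num_stages consecutive, near-equal chunks.

    Assumption: chunk sizes are balanced by layer count; the source does not
    say how chunk boundaries are chosen.
    """
    if not 1 <= num_stages <= len(layers):
        raise ValueError("num_stages must be between 1 and the number of layers")
    base, extra = divmod(len(layers), num_stages)
    chunks, start = [], 0
    for stage in range(num_stages):
        # The first `extra` stages take one additional layer each.
        size = base + (1 if stage < extra else 0)
        chunks.append(layers[start:start + size])
        start += size
    return chunks


def forward_schedule(num_stages, num_batches):
    """Return, for each time step, a mapping {stage index: mini-batch index}.

    At time step t, stage s works on mini-batch t - s when that batch exists,
    so several mini-batches are in flight at once (forward pass only).
    """
    steps = []
    for t in range(num_stages + num_batches - 1):
        steps.append(
            {s: t - s for s in range(num_stages) if 0 <= t - s < num_batches}
        )
    return steps


if __name__ == "__main__":
    layers = ["layer1", "layer2", "layer3", "layer4", "layer5"]
    for stage, chunk in enumerate(split_into_chunks(layers, 2)):
        print("stage", stage, "->", chunk)
    for t, busy in enumerate(forward_schedule(2, 3)):
        print("t =", t, busy)
```

With two stages and three mini-batches, both stages are busy during the middle time steps, which is the overlap a pipeline is meant to provide. A full training pipeline would also schedule the backward pass, which this sketch omits.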

Assignees

Nuance Communications Inc

Inventors

Classifications

  • Combinations of networks · CPC title

  • Supervised learning · CPC title

  • Feedforward networks · CPC title

  • Distributed learning, e.g. federated learning · CPC title

  • G06N3/084 (Primary)

    Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016267380A1 cover?
Training a neural network is a time-consuming and computationally expensive task. Embodiments provide efficient methods and systems for neural network training. One example embodiment is implemented by a plurality of agents, where each agent performs a pipelined gradient analysis to update respective local models of the neural network using respective subsets of data from a common pool of traini…
Who is the assignee on this patent?
Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 15, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).