Deep neural networks training for speech and pattern recognition
US-9477925-B2 · Oct 25, 2016 · US
US2016267380A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016267380-A1 |
| Application number | US-201514657414-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 13, 2015 |
| Priority date | Mar 13, 2015 |
| Publication date | Sep 15, 2016 |
| Grant date | — |
Official abstract text for this publication.
Training a neural network is a time-consuming and computationally expensive task. Embodiments provide efficient methods and systems for neural network training. One example embodiment is implemented by a plurality of agents, where each agent performs a pipelined gradient analysis to update respective local models of the neural network using respective subsets of data from a common pool of training data. In turn, a common global model of the neural network is updated based upon the local models.
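The abstract's agent scheme can be illustrated with a toy sketch. This is not the patent's implementation: the one-weight model `y = w * x`, the plain SGD loop (standing in for the pipelined gradient analysis), the strided split of the data pool, and the averaging merge rule are all assumptions chosen for brevity, and `train_with_agents` is a hypothetical name. A `threading.Lock` stands in for a locking mechanism under which only one agent at a time may modify the global model.

```python
import threading

def train_with_agents(pool, num_agents=4, rounds=50, local_epochs=5, lr=0.1):
    """Fit y = w * x with several agents that share one global weight."""
    global_model = {"w": 0.0}
    lock = threading.Lock()  # assumption: a simple mutex models ownership of the global model

    def agent(agent_id):
        # Assumption: each agent takes every num_agents-th sample of the common pool.
        subset = pool[agent_id::num_agents]
        for _ in range(rounds):
            with lock:
                local_w = global_model["w"]  # refresh the local model from the global one
            for _ in range(local_epochs):
                for x, y in subset:
                    grad = 2.0 * (local_w * x - y) * x  # d/dw of (w*x - y)^2
                    local_w -= lr * grad
            with lock:
                # Assumed merge rule: average the local model into the global model.
                global_model["w"] = 0.5 * (global_model["w"] + local_w)

    threads = [threading.Thread(target=agent, args=(i,)) for i in range(num_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return global_model["w"]

# Common pool of training data: 20 samples of y = 3x.
pool = [(x, 3.0 * x) for x in (i / 10.0 for i in range(1, 21))]
w = train_with_agents(pool)
print(w)  # expected to be close to 3.0
```

The local training runs outside the lock, so agents compute in parallel and only serialize on the short read and merge steps.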
Opening claim text (preview).
What is claimed is:
1. A method of training a neural network, the method comprising: by each agent of a plurality of agents, performing a pipelined gradient analysis to update respective local models of a neural network using respective subsets of data from a common pool of training data; and updating a common global model of the neural network based upon the local models.
2. The method of claim 1 wherein performing the pipelined gradient analysis comprises: splitting the respective local models of the neural network into consecutive chunks; and assigning each chunk to a stage of a pipeline.
3. The method of claim 1 wherein each stage of the pipeline is associated with a graphics processing unit (GPU).
4. The method of claim 1 wherein performing the pipelined gradient analysis further comprises: selecting the subsets of data from the common pool of training data according to a focused-attention back-propagation (FABP) strategy.
5. The method of claim 1 further including an initialization procedure comprising: by a single agent of the plurality of agents: performing the pipelined gradient analysis to update its respective local model of the neural network using a respective subset of data from the common pool of training data; and updating the common global model of the neural network based upon its local model.
6. The method of claim 1 wherein the common global model is owned by a single agent of the plurality of agents at any one time according to a locking mechanism.
7. The method of claim 6 wherein the common global model is updated by the single agent during a period in which the single agent owns the common global model.
8. The method of claim 1 wherein a critical section is reached when an agent of the plurality is ready to update the global model and the agent of the plurality that is ready to update the global model does not own the global model.
9. The method of claim 8 wherein the agent that is ready to update the global model requests the global model.
10. A computer system for training a neural network, the computer system comprising: a processor; and a memory with computer code instructions stored thereon, the processor and the memory, with the computer code instructions being configured to cause the system to: by each agent of a plurality of agents, perform a pipelined gradient analysis to update respective local models of a neural network using respective subsets of data from a common pool of training data; and update a common global model of the neural network based upon the local models.
11. The computer system of claim 10, wherein, in performing the pipelined gradient analysis, the processor and the memory, with the computer code instructions, are further configured to cause the system to: split the respective local models of the neural network into consecutive chunks; and assign each chunk to a stage of a pipeline.
12. The computer system of claim 10 wherein each stage of the pipeline is associated with a graphics processing unit (GPU).
13. The computer system of claim 10, wherein, in performing the pipelined gradient analysis, the processor and the memory, with the computer code instructions, are further configured to cause the system to: select the subsets of data from the common pool of training data according to a focused-attention back-propagation (FABP) strategy.
14. The computer system of claim 10, wherein the processor and the memory, with the computer code instructions, are further configured to implement an initialization procedure that causes the system to: by a single agent of the plurality of agents: perform the pipelined gradient analysis to update its respective local model of the neural network using a respective subset of data from the common pool of training data; and update the common global model of the neural network based upon its local model.
15. The computer system of claim 10 wherein the common global model is owned by a single agent of the plurality of agents at any one time according to a locking mechanism.
16. The computer system of claim 15 wherein the common global model is updated by the single agent during a period in which the single agent owns the common global model.
17. The computer system of claim 10 wherein a critical section is reached when an agent of the plurality is ready to update the global model and the agent of the plurality that is ready to update the global model does not own the global model.
18. The computer system of claim 17 wherein the agent that is ready to update the global model requests the global model.
19. A computer program product for training a neural network, the computer program product comprising: one or more computer-readable tangible storage devices and program instructions stored on at least one of the one or more storage devices, the program instructions, when loaded and executed by a processor, cause an apparatus associated with the processor to: cause each agent of a plurality of agents to perform a pipelined gradient analysis to update respective local models of a neural network using respective subsets of data from a common pool of training data; and update a common global model of the neural network based upon the local models.
20. The computer program product of claim 19 wherein the program instruction further cause the apparatus to cause each agent to perform the pipelined gradient analysis by: splitting the respective local models of the neural network into consecutive chunks; and assigning each chunk to a stage of a pipeline.
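The chunking step recited in claims 2, 11, and 20 (split a model into consecutive chunks, assign each chunk to a pipeline stage) can be sketched as follows. The function names `split_into_chunks` and `pipeline_schedule`, the near-equal split, the layer labels, and the `gpu:N` device labels are hypothetical choices for illustration; the claims do not specify how chunk boundaries are chosen. The schedule covers only a forward pass, to show how successive mini-batches overlap across stages.

```python
def split_into_chunks(layers, num_stages):
    """Split an ordered list of layers into num_stages consecutive, near-equal chunks."""
    if not 1 <= num_stages <= len(layers):
        raise ValueError("need 1 <= num_stages <= number of layers")
    base, extra = divmod(len(layers), num_stages)
    chunks, start = [], 0
    for stage in range(num_stages):
        size = base + (1 if stage < extra else 0)  # earlier stages absorb the remainder
        chunks.append(layers[start:start + size])
        start += size
    return chunks

def pipeline_schedule(num_stages, num_batches):
    """For each clock tick, list the (stage, batch) pairs active in a forward pass."""
    ticks = []
    for tick in range(num_stages + num_batches - 1):
        active = [(stage, tick - stage)
                  for stage in range(num_stages)
                  if 0 <= tick - stage < num_batches]
        ticks.append(active)
    return ticks

# Hypothetical five-layer model split across three stages, one device label per stage.
layers = ["input->h1", "h1->h2", "h2->h3", "h3->h4", "h4->output"]
chunks = split_into_chunks(layers, 3)
stage_to_device = {stage: f"gpu:{stage}" for stage in range(len(chunks))}
schedule = pipeline_schedule(num_stages=3, num_batches=4)

for tick, active in enumerate(schedule):
    print(tick, [(stage_to_device[s], f"batch {b}") for s, b in active])
```

Once the pipeline is full, every stage works on a different mini-batch in the same tick, which is where the speed-up over running the whole model on one device would come from.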
Combinations of networks · CPC title
Supervised learning · CPC title
Feedforward networks · CPC title
Distributed learning, e.g. federated learning · CPC title
Backpropagation, e.g. using gradient descent · CPC title