Method and device for parallel processing in model training

US9508347B2 · US · B2

Patent metadata
Publication number: US-9508347-B2
Application number: US-201314108237-A
Country: US
Kind code: B2
Filing date: Dec 16, 2013
Priority date: Jul 10, 2013
Publication date: Nov 29, 2016
Grant date: Nov 29, 2016

Abstract

A method and a device for training a DNN model includes: at a device including one or more processors and memory: establishing an initial DNN model; dividing a training data corpus into a plurality of disjoint data subsets; for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model based on the data subset; and merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either the initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition.
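The training loop in the abstract (split the corpus into disjoint subsets, run SGD on each subset in parallel from the same initial model, merge the sub-models, repeat until a convergence condition holds) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patentee's implementation: a linear model with parameters `w` and `b` stands in for the DNN, threads stand in for the leaf computing devices, the merge is a uniform average (the simplest linear combination of sub-models; the claims refine it with per-layer weights), and the "preset convergence condition" is assumed to be "largest parameter change below `tol`". All function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a DNN: a linear model y = w*x + b stored as a dict.


def split_disjoint(corpus, n_parts, rng):
    """Shuffle once, then deal samples round-robin into n disjoint subsets."""
    data = list(corpus)
    rng.shuffle(data)
    return [data[i::n_parts] for i in range(n_parts)]


def sgd_worker(model, subset, lr=0.05):
    """One training processing unit: a single SGD pass over its own subset,
    starting from a private copy of the current initial model."""
    w, b = model["w"], model["b"]
    for x, y in subset:
        err = (w * x + b) - y  # derivative of 0.5 * err**2 w.r.t. the prediction
        w -= lr * err * x
        b -= lr * err
    return {"w": w, "b": b}


def merge_uniform(sub_models):
    """Merge sub-models by averaging each parameter (equal weights 1/K)."""
    k = len(sub_models)
    return {p: sum(m[p] for m in sub_models) / k for p in sub_models[0]}


def train(corpus, n_workers=4, max_iters=100, tol=1e-6, seed=0):
    rng = random.Random(seed)
    model = {"w": 0.0, "b": 0.0}  # initial model
    subsets = split_disjoint(corpus, n_workers, rng)
    for _ in range(max_iters):
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            subs = list(pool.map(sgd_worker, [model] * n_workers, subsets))
        new_model = merge_uniform(subs)  # intermediate model
        delta = max(abs(new_model[p] - model[p]) for p in model)
        model = new_model  # becomes the next iteration's initial model
        if delta < tol:  # assumed convergence condition
            break
    return model  # final model


if __name__ == "__main__":
    data_rng = random.Random(1)
    xs = [data_rng.uniform(-1, 1) for _ in range(400)]
    corpus = [(x, 2.0 * x + 1.0) for x in xs]  # noise-free target y = 2x + 1
    print(train(corpus))  # should approach {'w': 2.0, 'b': 1.0}
```

Because every worker starts each round from the same merged model, the merge only has to combine parameters of identically shaped models; in a real system the threads would be separate machines or GPUs exchanging parameters with the head device.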

Claims

What is claimed is:

1. A method of training a Deep Neural Network (DNN) model, comprising: at a head computing device comprising one or more processors and memory: establishing an initial DNN model; dividing a training data corpus into a plurality of disjoint data subsets; for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein a training processing unit is provided by a respective leaf computing device of a plurality of leaf computing devices operating in parallel, the plurality of leaf computing devices being coupled to the head computing device, and wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model of a plurality of DNN sub-models based on the data subset; and merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either a respective initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition, and wherein merging the respective DNN sub-models comprises: for each DNN sub-model of the plurality of DNN sub-models, assigning a respective first merging weight to each layer of a plurality of layers of each DNN sub-model; and for the plurality of DNN sub-models, applying a linear combination of the respective DNN sub-models to obtain the intermediate DNN model, wherein applying the linear combination comprises: assigning a respective second merging weight to each DNN sub-model of the plurality of DNN sub-models, wherein the respective second merging weight of each DNN sub-model is a vector of the respective first merging weights of all layers of the respective DNN sub-model, wherein the initial and final DNN models are acoustic models of speech recognition and the training data corpus comprises a plurality of randomized speech files.

2. The method of claim 1, wherein merging the respective DNN sub-models generated by the plurality of training processing units further comprises: using a respective shared first merging weight for all layers of each DNN sub-model during the merging.

3. The method of claim 1, further comprising: identifying a plurality of decoding processing units operating in parallel, each decoding processing unit utilizing a respective final DNN model; providing a same test sample to each of the plurality of decoding processing units operating in parallel, wherein each decoding processing unit generates a respective posterior probability sequence for the same test sample based on the respective final DNN model of the decoding processing unit; and merging the respective posterior probability sequences generated by the plurality of decoding processing units to obtain a recognition result for the same test sample.

4. The method of claim 3, wherein merging respective posterior probability sequences generated by the plurality of decoding processing units further comprises: using a respective shared merging weight for all phoneme binding states of each respective posterior probability sequence during the merging of the respective posterior probability sequences generated by the plurality of decoding processing units.

5. The method of claim 3, wherein merging respective posterior probability sequences generated by the plurality of decoding processing units further comprises: using a respective merging weight for each phoneme binding state of each DNN sub-model during the merging of the respective posterior probability sequences generated by the plurality of decoding processing units.

6. A system for training a Deep Neural Network (DNN) model, comprising: a head computing device; and a plurality of leaf computing devices operating in parallel and coupled to the head computing device, wherein the head computing device comprises: one or more processors; and memory having instructions stored thereon, the instructions, when executed by the one or more processors, cause the processors to perform operations comprising: establishing an initial DNN model; dividing a training data corpus into a plurality of disjoint data subsets; for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein a training processing unit is provided by a respective leaf computing device of the plurality of leaf computing devices operating in parallel, and wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model of a plurality of DNN sub-models based on the data subset; and merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either a respective initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition, and wherein merging the respective DNN sub-models comprises: for each DNN sub-model of the plurality of DNN sub-models, assigning a respective first merging weight to each layer of a plurality of layers of each DNN sub-model; and for the plurality of DNN sub-models, applying a linear combination of the respective DNN sub-models to obtain the intermediate DNN model, wherein applying the linear combination comprises: assigning a respective second merging weight to each DNN sub-model of the plurality of DNN sub-models, wherein the respective second merging weight of each DNN sub-model is a vector of the respective first merging weights of all layers of the respective DNN sub-model, wherein the initial and final DNN models are acoustic models of speech recognition and the training data corpus comprises a plurality of randomized speech files.

7. The system of claim 6, wherein merging the respective DNN sub-models generated by the plurality of training processing units further comprises: using a respective shared first merging weight for all layers of each DNN sub-model during the merging.

8. The system of claim 6, wherein the operations further comprise: identifying a plurality of decoding processing units operating in parallel, each decoding processing unit utilizing a respective final DNN model; providing a same test sample to each of the plurality of decoding processing units operating in parallel, wherein each decoding processing unit generates a respective posterior probability sequence for the same test sample based on the respective final DNN model of the decoding processing unit; and merging the respective posterior probability sequences generated by the plurality of decoding processing units to obtain a recognition result for the same test sample.

9. The system of claim 8, wherein merging respective posterior probability sequences generated by the plurality of decoding processing units further comprises: using a respective shared merging weight for all phoneme binding states of each respective posterior probability sequence during the merging of the respective posterior probability sequences generated by the plurality of decoding processing units.

10. The system of claim 8, wherein merging respective posterior probability sequences generated by the plurality of decoding processing units further comprises: using a respective merging weight for each phoneme binding state of each DNN sub-model during the merging of the resp
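The two merge operations in the claims can be sketched with plain Python lists. This is an illustration under assumed data layouts, not the patentee's code. Each sub-model is assumed to be a list of layers, each layer a flat list of parameters. `weights[k][l]` is the first merging weight of layer `l` in sub-model `k`, so the row `weights[k]` plays the role of the second merging weight vector (claims 1 and 6); repeating one value across a row gives the shared-weight variant (claims 2 and 7). For decoding (claims 3 to 5 and 8 to 10), `posteriors[k][t][s]` is assumed to be decoder `k`'s posterior for frame `t` and phoneme binding state `s`, fused with per-state weights or with one shared weight per decoder. The function names and the argmax decision rule are hypothetical.

```python
from typing import List, Sequence

Layer = List[float]     # flattened parameters of one layer (assumed layout)
Model = List[Layer]     # a model is an ordered list of layers


def merge_layerwise(sub_models: Sequence[Model],
                    weights: Sequence[Sequence[float]]) -> Model:
    """Linear combination of sub-models with a separate weight per layer.

    weights[k][l] is the first merging weight of layer l in sub-model k;
    the vector weights[k] is sub-model k's second merging weight.
    """
    merged: Model = []
    for l, first_layer in enumerate(sub_models[0]):
        merged.append([
            sum(weights[k][l] * sub_models[k][l][i] for k in range(len(sub_models)))
            for i in range(len(first_layer))
        ])
    return merged


def shared_weights(alphas: Sequence[float], n: int) -> List[List[float]]:
    """Shared-weight variant: row k repeats alphas[k] n times
    (one value for all n layers, or for all n phoneme binding states)."""
    return [[a] * n for a in alphas]


def fuse_posteriors(posteriors: Sequence[Sequence[Sequence[float]]],
                    weights: Sequence[Sequence[float]]) -> List[List[float]]:
    """Weighted sum of posterior sequences from parallel decoders.

    weights[k][s] is decoder k's weight for state s.
    """
    n_frames, n_states = len(posteriors[0]), len(posteriors[0][0])
    return [
        [sum(weights[k][s] * posteriors[k][t][s] for k in range(len(posteriors)))
         for s in range(n_states)]
        for t in range(n_frames)
    ]


def best_states(fused: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring state per frame (assumed decision rule)."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in fused]


if __name__ == "__main__":
    a = [[1.0, 2.0], [10.0]]          # sub-model A: two layers
    b = [[3.0, 4.0], [20.0]]          # sub-model B: same shapes
    # layer 0 weights 0.5/0.5, layer 1 weights 0.25 (A) / 0.75 (B)
    print(merge_layerwise([a, b], [[0.5, 0.25], [0.5, 0.75]]))  # [[2.0, 3.0], [17.5]]
    p1 = [[0.8, 0.2], [0.3, 0.7]]     # decoder 1: 2 frames x 2 states
    p2 = [[0.6, 0.4], [0.1, 0.9]]     # decoder 2
    fused = fuse_posteriors([p1, p2], shared_weights([0.5, 0.5], 2))
    print(best_states(fused))         # [0, 1]
```

Per-layer weights let a merge trust one sub-model's lower layers and another's upper layers differently; with a single shared weight per sub-model the merge reduces to an ordinary weighted model average.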

Classifications

  • using artificial neural networks (CPC title)

  • G10L15/34 (primary): Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing

  • Training (CPC title)

  • Neural networks (CPC title)

Frequently asked questions

What does patent US9508347B2 cover?
A method and a device for training a DNN model includes: at a device including one or more processors and memory: establishing an initial DNN model; dividing a training data corpus into a plurality of disjoint data subsets; for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating …
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/34. Mapped technology areas include Physics.
When was this patent published?
Published Nov 29, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.