Asynchronous optimization for sequence training of neural networks

US10482873B2 · US · B2

Patent metadata
Publication number: US-10482873-B2
Application number: US-201815910720-A
Country: US
Kind code: B2
Filing date: Mar 2, 2018
Priority date: Nov 4, 2013
Publication date: Nov 19, 2019
Grant date: Nov 19, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by one or more computing devices, the method comprising: obtaining multiple copies of a neural network of a speech model; asynchronously obtaining parameter values for the multiple copies of the neural network such that different copies of the neural network have different sets of parameter values; after obtaining the parameter values such that different copies of the neural network have different sets of parameter values, training the multiple copies of the neural network in parallel using different subsets of a set of training data, wherein training each copy of the neural network adjusts the parameter values for the copy of the neural network to generate adjusted parameter values; and updating the neural network of the speech model based on the adjusted parameter values generated for each of the multiple copies of the neural network.

2. The method of claim 1, wherein training the multiple copies of the neural network in parallel comprises training the multiple copies of the neural network in parallel using stochastic gradient descent optimization.

3. The method of claim 2, wherein training the multiple copies of the neural network in parallel using stochastic gradient descent optimization comprises training the multiple copies of the neural network using different approximations of an outer gradient for different copies of the neural network.

4. The method of claim 1, wherein training the multiple copies of the neural network in parallel comprises performing sequence training of the multiple copies of the neural network independently and in parallel to reduce error across sequences of multiple frames.

5. The method of claim 1, wherein updating the neural network of the speech model comprises: asynchronously providing the adjusted parameter values of the copies of the neural network to a server; and updating, by the server, parameter values of the neural network of the speech model in response to receiving each of multiple asynchronous sets of adjusted parameter values of the copies of the neural network.

6. The method of claim 1, further comprising maintaining, at a server, a current set of parameter values for the neural network for the speech model by updating the current set of parameter values in response to each of multiple asynchronously-provided sets of adjusted parameter values of the multiple copies of the neural network.

7. The method of claim 6, wherein asynchronously obtaining the parameter values of the multiple copies of the neural network such that different copies of the neural network have different sets of parameter values comprises: asynchronously obtaining, for each of the multiple copies of the neural network, the current set of parameters from the server at different times.

8. The method of claim 1, wherein training the multiple copies of the neural network in parallel comprises: obtaining, by a first decoder associated with a first copy of the neural network, an utterance from the set of training data; determining, by the first decoder, a reference score associated with the utterance based on a set of parameter values obtained from a server; and training the first copy of the neural network using the reference score generated by the first decoder.

9. The method of claim 8, wherein the first decoder and the first copy of the neural network use different sets of parameter values that are asynchronously obtained from the server at different times.

10. The method of claim 8, wherein determining the reference score associated with the utterance comprises: obtaining, by the first decoder, the parameter values for the neural network; obtaining, by the first decoder, data indicating a true transcription of the utterance; and determining, by the first decoder, a reference lattice based on (i) the utterance, (ii) the obtained parameter values, and (iii) the data indicating a true transcription of the utterance.

11. The method of claim 1, further comprising running multiple speech decoders in parallel during training of the multiple copies of the neural network, each of the multiple speech decoders corresponding to a copy of the neural network, wherein each of the speech decoders independently generates data indicating an outer gradient used in training the corresponding copy of the neural network.

12. The method of claim 1, wherein training the multiple copies of the neural network in parallel comprises: obtaining an auxiliary function representing an approximation of a training objective function of a first copy of the neural network; and determining adjusted parameter values for the first copy of the neural network using the auxiliary function.

13. The method of claim 12, wherein the training objective function is a maximum likelihood (ML) objective function, a maximum mutual information (MMI) objective function, a minimum phone error (MPE) objective function, or a state-level minimum Bayes risk objective function.

14. One or more non-transitory computer-readable media storing software that includes instructions, which, when executed by one or more computers, cause the one or more computers to perform operations comprising: obtaining multiple copies of a neural network of a speech model; asynchronously obtaining parameter values for the multiple copies of the neural network such that different copies of the neural network have different sets of parameter values; after obtaining the parameter values such that different copies of the neural network have different sets of parameter values, training the multiple copies of the neural network in parallel using different subsets of a set of training data, wherein training each copy of the neural network adjusts the parameter values for the copy of the neural network to generate adjusted parameter values; and updating the neural network of the speech model based on the adjusted parameter values generated for each of the multiple copies of the neural network.

15. The one or more non-transitory computer-readable media of claim 14, wherein training the multiple copies of the neural network in parallel comprises training the multiple copies of the neural network in parallel using stochastic gradient descent optimization.

16. The one or more non-transitory computer-readable media of claim 15, wherein training the multiple copies of the neural network in parallel using stochastic gradient descent optimization comprises training the multiple copies of the neural network using different approximations of an outer gradient for different copies of the neural network.

17. The one or more non-transitory computer-readable media of claim 14, wherein training the multiple copies of the neural network in parallel comprises performing sequence training of the multiple copies of the neural network independently and in parallel to reduce error across sequences of multiple frames.

18. The one or more non-transitory computer-readable media of claim 14, wherein updating the neural network of the speech model comprises: asynchronously providing the adjusted parameter values of the copies of the neural network to a server; and updating, by the server, parameter values of the neural network of the speech model in response to receiving each of multiple asynchronous sets of adjusted parameter values of the copies of the neural network.

19. A system comprising: one or more processors and one or more computer storage media storing instructions that are operable, when executed by the one or more processors, to cause the one or m
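The flow recited in claim 1, together with the server behavior in claims 5 to 7, can be illustrated with a small sketch. This is not the patented implementation: it assumes a toy two-parameter linear model in place of a speech neural network, plain per-sample gradient descent in place of sequence training, and a server that applies the difference between each replica's adjusted and fetched parameters. The names `ParameterServer`, `train_replica`, and `run_async_training`, and all hyperparameters, are hypothetical.

```python
import random
import threading


class ParameterServer:
    """Holds the current parameter values; applies each update as it arrives."""

    def __init__(self, params):
        self._params = list(params)
        self._lock = threading.Lock()
        self.updates_applied = 0

    def fetch(self):
        # Replicas call this at different times, so they see different values.
        with self._lock:
            return list(self._params)

    def push(self, delta):
        # Assumption: the server adds the replica's parameter change to its
        # current values, even if those values moved since the replica fetched.
        with self._lock:
            self._params = [p + d for p, d in zip(self._params, delta)]
            self.updates_applied += 1


def train_replica(server, shard, steps, batch_size, lr, seed):
    """Train one copy of the model on its own subset of the training data."""
    rng = random.Random(seed)
    for _ in range(steps):
        start = server.fetch()            # possibly stale by the time we push
        w, b = start
        for x, y in rng.sample(shard, batch_size):
            err = (w * x + b) - y         # gradient of 0.5 * err**2
            w -= lr * err * x
            b -= lr * err
        server.push([w - start[0], b - start[1]])


def run_async_training(num_replicas=4, steps=400, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(400)]
    data = [(x, 2.0 * x - 1.0) for x in xs]   # toy target: w = 2, b = -1
    shards = [data[i::num_replicas] for i in range(num_replicas)]
    server = ParameterServer([0.0, 0.0])
    threads = [
        threading.Thread(
            target=train_replica,
            args=(server, shards[i], steps, 4, 0.02, seed + 1 + i),
        )
        for i in range(num_replicas)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server


if __name__ == "__main__":
    server = run_async_training()
    w, b = server.fetch()
    print(f"w={w:.3f} b={b:.3f} updates={server.updates_applied}")
```

No replica waits for the others: each one fetches, trains on its shard, and pushes on its own schedule, so the server's values can change between a replica's fetch and its push. The small learning rate is chosen here so that such stale updates do not destabilize the toy problem.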

Assignees

Google LLC

Inventors

Classifications

  • Combinations of networks · CPC title

  • using artificial neural networks · CPC title

  • G10L15/063 · Primary

    Training · CPC title

  • using context dependencies, e.g. language models · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10482873B2 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training s…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 19, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).