Generating acoustic models

US9786270B2 · US · B2

Patent metadata
Publication number: US-9786270-B2
Application number: US-201615205263-A
Country: US
Kind code: B2
Filing date: Jul 8, 2016
Priority date: Jul 9, 2015
Publication date: Oct 10, 2017
Grant date: Oct 10, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.
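The central step in the abstract is training a second network on the first network's output distributions rather than on hard labels. A common way to realize that idea is soft-target cross-entropy (often called knowledge distillation). The minimal pure-Python sketch below assumes the per-frame distributions over phonetic units are available as plain lists; `distillation_loss`, the toy teacher values, and the use of free logits in place of a real student network are all hypothetical illustrations, not details taken from the patent.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_probs, student_logits):
    """Mean per-frame cross-entropy H(teacher, student) and its gradient.

    teacher_probs:  T frames, each a distribution over phonetic units
                    (soft targets from the first, CTC-trained network).
    student_logits: T frames of unnormalized scores from the second network.
    Returns (loss, grads) with grads[t][k] = d loss / d student_logits[t][k].
    """
    if len(teacher_probs) != len(student_logits):
        raise ValueError("teacher and student must cover the same frames")
    num_frames = len(teacher_probs)
    loss = 0.0
    grads = []
    for p_t, z_t in zip(teacher_probs, student_logits):
        q_t = softmax(z_t)
        loss -= sum(p * math.log(q) for p, q in zip(p_t, q_t) if p > 0)
        # For soft targets, d/dz of -sum(p * log softmax(z)) is (q - p).
        grads.append([(q - p) / num_frames for p, q in zip(p_t, q_t)])
    return loss / num_frames, grads


# Toy usage: treat the student's logits as free parameters (a stand-in for a
# smaller network) and fit them to two frames of hypothetical teacher outputs.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
for _ in range(2000):
    loss, grads = distillation_loss(teacher, logits)
    logits = [[z - 1.0 * g for z, g in zip(z_t, g_t)]
              for z_t, g_t in zip(logits, grads)]
print([round(q, 3) for q in softmax(logits[0])])
```

Because the gradient is simply the difference between student and teacher distributions, the loss reaches its minimum (the teacher's entropy) exactly when the student reproduces the teacher frame by frame, which matches the "at least approximate" wording of claim 11.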

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by one or more computers, the method comprising: obtaining, by the one or more computers, a first neural network trained as an acoustic model using connectionist temporal classification; obtaining, by the one or more computers, output distributions from the first neural network for an utterance, the output distributions comprising scores indicating likelihoods corresponding to different phonetic units; training, by the one or more computers, a second neural network as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network; and providing, by the one or more computers, an automated speech recognizer configured to use the trained second neural network to generate transcriptions for utterances.

2. The method of claim 1, wherein providing an automated speech recognizer comprises: receiving audio data for an utterance; generating a transcription for the audio data using the trained second neural network; and providing the generated transcription for display.

3. The method of claim 1, wherein providing an automated speech recognizer comprises providing the trained second neural network to another device for the performance of speech recognition by the other device.

4. The method of claim 1, wherein the output distributions from the first neural network for the utterance are obtained using a first set of audio data for the utterance, and the second neural network is trained using a second set of audio data for the utterance, the second set of audio data having increased noise compared to the first set of training data.

5. The method of claim 1, wherein training the second neural network as an acoustic model comprises: obtaining audio data for the utterance; adding noise to the audio data for the utterance to generate an altered version of the audio data; generating a sequence of input vectors based on the altered version of the audio data; and training the second neural network using output distributions produced by the first neural network as output targets corresponding to the sequence of input vectors generated based on the altered version of the audio data.

6. The method of claim 1, wherein training the second neural network comprises training the second neural network with a loss function that uses two or more different output targets.

7. The method of claim 6, wherein training the second neural network using the loss function comprises training the second neural network using a loss function that is a weighted combination of the two or more loss functions.

8. The method of claim 7, wherein the weighted combination is a combination of (i) a first loss function that constrains the alignment of inputs and outputs, and (ii) a second loss function that does not constrain the alignment of inputs and outputs.

9. The method of claim 7, wherein the two or more loss functions include at least two of a Baum-Welch loss function, a connectionist temporal classification loss function, and a Viterbi alignment loss function.

10. The method of claim 1, wherein the second neural network has fewer parameters than the first neural network.

11. The method of claim 1, wherein training the second neural network comprises training the second neural network to provide output distributions for the utterance that at least approximate the output distributions from the first neural network for the utterance.

12. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: obtaining, by the one or more computers, a first neural network trained as an acoustic model using connectionist temporal classification; obtaining, by the one or more computers, output distributions from the first neural network for an utterance, the output distributions comprising scores indicating likelihoods corresponding to different phonetic units; training, by the one or more computers, a second neural network as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network; and providing, by the one or more computers, an automated speech recognizer configured to use the trained second neural network to generate transcriptions for utterances.

13. The system of claim 12, wherein providing an automated speech recognizer comprises: receiving audio data for an utterance; generating a transcription for the audio data using the trained second neural network; and providing the generated transcription for display.

14. The system of claim 12, wherein providing an automated speech recognizer comprises providing the trained second neural network to another device for the performance of speech recognition by the other device.

15. The system of claim 12, wherein the output distributions from the first neural network for the utterance are obtained using a first set of audio data for the utterance, and the second neural network is trained using a second set of audio data for the utterance, the second set of audio data having increased noise compared to the first set of audio data.

16. The system of claim 12, wherein training the second neural network as an acoustic model comprises: obtaining audio data for the utterance; adding noise to the audio data for the utterance to generate an altered version of the audio data; generating a sequence of input vectors based on the altered version of the audio data; and training the second neural network using output distributions produced by the first neural network as output targets corresponding to the sequence of input vectors generated based on the altered version of the audio data.

17. One or more non-transitory computer-readable storage media storing with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: obtaining, by the one or more computers, a first neural network trained as an acoustic model using connectionist temporal classification; obtaining, by the one or more computers, output distributions from the first neural network for an utterance, the output distributions comprising scores indicating likelihoods corresponding to different phonetic units; training, by the one or more computers, a second neural network as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network; and providing, by the one or more computers, an automated speech recognizer configured to use the trained second neural network to generate transcriptions for utterances.

18. The one or more non-transitory computer-readable storage media of claim 17, wherein providing an automated speech recognizer comprises: receiving audio data for an utterance; generating a transcription for the audio data using the trained second neural network; and providing the generated transcription for display.

19. The one or more non-transitory computer-readable storage media of claim 17, wherein providing an automated speech recognizer comprises providing the trained second neural network to another device for the performance of speech recognition by the other device.

20. The one or more non-transitory computer-readable storage media of claim 17, wherein the output distributions from the first n…
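Claims 5 and 8 add two training details: the teacher's targets are computed from the original audio while the student is fed a noise-added copy, and the student's loss can be a weighted mix of a loss that constrains the input/output alignment and one that does not. The minimal pure-Python sketch below illustrates one reading of those details under stated assumptions: `add_noise` injects white Gaussian noise at a chosen signal-to-noise ratio (the claims only say "adding noise"), the alignment-constraining term is assumed to be framewise cross-entropy against the teacher's per-frame distributions, and the unconstrained term is the standard CTC forward-algorithm loss. All function names, the 0.5 weight, and the toy numbers are hypothetical, and the network that maps noisy features to `student_logits` is left out.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over one frame of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def add_noise(samples, snr_db, rng):
    """Hypothetical augmentation: add white Gaussian noise at snr_db decibels."""
    signal_power = sum(x * x for x in samples) / len(samples)
    sigma = math.sqrt(signal_power / 10 ** (snr_db / 10))
    return [x + rng.gauss(0.0, sigma) for x in samples]


def log_sum_exp(values):
    """Stable log(sum(exp(v))); returns -inf when every value is -inf."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_loss(log_probs, labels, blank=0):
    """-log P(labels | input), summed over all CTC alignments (forward pass)."""
    ext = [blank]
    for label in labels:
        ext += [label, blank]  # interleave blanks: _ a _ b _
    alpha = [-math.inf] * len(ext)
    alpha[0] = log_probs[0][ext[0]]
    if len(ext) > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new_alpha = []
        for s, unit in enumerate(ext):
            prev = [alpha[s]]
            if s >= 1:
                prev.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and unit != blank and unit != ext[s - 2]:
                prev.append(alpha[s - 2])
            new_alpha.append(log_sum_exp(prev) + frame[unit])
        alpha = new_alpha
    # Valid paths end on the last label or the trailing blank.
    return -log_sum_exp(alpha[-2:])


def combined_loss(student_logits, teacher_probs, labels, weight=0.5, blank=0):
    """weight * framewise cross-entropy vs. the teacher (alignment fixed per frame)
    + (1 - weight) * CTC vs. the transcript (marginalizes over alignments)."""
    log_probs = [[math.log(q) for q in softmax(z)] for z in student_logits]
    framewise = -sum(
        p * lq
        for p_t, lq_t in zip(teacher_probs, log_probs)
        for p, lq in zip(p_t, lq_t)
    ) / len(log_probs)
    return weight * framewise + (1.0 - weight) * ctc_loss(log_probs, labels, blank)


# Toy usage with made-up values. In a real pipeline teacher_probs would come from
# the first network on clean audio, student_logits from the second on noisy audio.
rng = random.Random(0)
clean = [math.sin(0.05 * i) for i in range(1600)]
noisy = add_noise(clean, snr_db=10.0, rng=rng)
noise_power = sum((n - c) ** 2 for n, c in zip(noisy, clean)) / len(clean)
signal_power = sum(c * c for c in clean) / len(clean)
print(round(10 * math.log10(signal_power / noise_power), 1))  # roughly 10 dB
teacher_probs = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.7, 0.1, 0.2]]
student_logits = [[0.5, 0.1, -0.2], [0.0, 0.8, 0.1], [0.9, -0.1, 0.0]]
print(combined_loss(student_logits, teacher_probs, labels=[1], weight=0.5))
```

Averaging the framewise term per frame while leaving CTC per utterance is a simplification made here; before tuning `weight`, the two terms would typically be brought to comparable scales.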

Assignees

Google Inc

Inventors

Classifications

  • using artificial neural networks · CPC title

  • Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title

  • G10L15/063 · Primary

    Training · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9786270B2 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained a…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 10, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).