Deep neural network learning method and apparatus, and category-independent sub-network learning apparatus
US-2016110642-A1 · Apr 21, 2016 · US
US9715660B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9715660-B2 |
| Application number | US-201414230225-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 31, 2014 |
| Priority date | Nov 4, 2013 |
| Publication date | Jul 25, 2017 |
| Grant date | Jul 25, 2017 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes training a deep neural network with a first training set by adjusting values for each of a plurality of weights included in the neural network, and training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the training comprising providing the deep neural network with a second training set and adjusting the values for a first subset of the plurality of weights, wherein the second training set includes data representing the key features of the one or more keywords or key phrases.
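The two-stage procedure in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the tiny one-hidden-layer network, the synthetic two-dimensional points standing in for speech feature values, and the names `forward`, `train`, `first_set`, and `second_set` are all assumptions made for illustration. Stage 1 adjusts every weight on the larger first set. Stage 2 reuses the same network and adjusts only the output-layer weights (the "first subset") on the smaller second set, while the hidden-layer weights stay fixed.

```python
import math
import random


def forward(x, W_h, W_o):
    """One tanh hidden layer followed by a softmax output layer."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W_h]
    z = [sum(w * hi for w, hi in zip(row, h)) for row in W_o]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return h, [v / s for v in e]


def train(data, W_h, W_o, lr=0.1, epochs=50, update_hidden=True):
    """Plain SGD on cross-entropy loss; updates the weight lists in place."""
    for _ in range(epochs):
        for x, label in data:
            h, p = forward(x, W_h, W_o)
            # Gradient of the loss with respect to the output pre-activations.
            dz = [pj - (1.0 if j == label else 0.0) for j, pj in enumerate(p)]
            # Back-propagate to the hidden activations before changing W_o.
            dh = [sum(dz[j] * W_o[j][k] for j in range(len(W_o)))
                  for k in range(len(h))]
            for j, row in enumerate(W_o):
                for k in range(len(row)):
                    row[k] -= lr * dz[j] * h[k]
            if update_hidden:
                for k, row in enumerate(W_h):
                    g = dh[k] * (1.0 - h[k] ** 2)  # derivative of tanh
                    for i in range(len(row)):
                        row[i] -= lr * g * x[i]


rng = random.Random(0)
W_h = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(4)]
W_o = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)]

# Stage 1: hypothetical larger training set; all weights are adjusted.
first_set = [([rng.gauss(c, 0.3), rng.gauss(-c, 0.3)], c)
             for c in (0, 1, 2) for _ in range(20)]
train(first_set, W_h, W_o, update_hidden=True)

hidden_before = [row[:] for row in W_h]
output_before = [row[:] for row in W_o]

# Stage 2: hypothetical smaller "keyword" set; only the output layer moves.
second_set = [([rng.gauss(c, 0.3), rng.gauss(c, 0.3)], c)
              for c in (0, 1) for _ in range(5)]
train(second_set, W_h, W_o, update_hidden=False)

hidden_unchanged = W_h == hidden_before
output_changed = W_o != output_before
```

Keeping the output layer the same size in both stages is a simplification of this sketch; the abstract only requires that the second stage adjust a subset of the weights.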
Opening claim text (preview).
What is claimed is:
1. A method comprising: training, by a speech recognition system that includes at least one computer, a deep neural network to determine probabilities that data received by the deep neural network has features similar to key features of words in a set of words, the training comprising: providing the deep neural network with a first set of feature values for uttered speech, and adjusting values for each of a plurality of weights included in the neural network; and training, by the speech recognition system, the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the training comprising: providing the deep neural network that was previously trained using the first set of feature values with a second set of feature values for uttered speech, and adjusting the values for a first subset of the plurality of weights, wherein a first quantity of the words in the set of words is greater than a second quantity of the one or more keywords or key phrase, and the second set of feature values for uttered speech includes data representing the key features of the one or more keywords or key phrases and is a different set of feature values than the first set of feature values.
2. The method of claim 1, wherein: the first training set has a first quantity of words; and the second training set has a second quantity of words that is less than the first quantity of words.
3. The method of claim 1, comprising: maintaining, after training the deep neural network with the first training set, the values for a second subset of the plurality of weights constant while adjusting the values for the first subset of the plurality of weights, wherein the first subset of the plurality of weights and the second subset of the plurality of weights are disjoint subsets of the plurality of weights.
4. The method of claim 3, wherein maintaining the values for a second subset of the plurality of weights constant while adjusting the values for the first subset of the plurality of weights comprises: maintaining the values of weights for a hidden layer in the deep neural network constant while adjusting the values of weights for an output layer in the deep neural network.
5. The method of claim 1, wherein providing the deep neural network that was previously trained using the first set of feature values with a second set of feature values for uttered speech comprises providing the deep neural network that was previously trained using a first set of feature values for uttered speech from a first language with the second set of feature values for uttered speech from a second language different than the first language.
6. The method of claim 1, wherein providing the deep neural network that was previously trained using the first set of feature values with a second set of feature values for uttered speech comprises providing the deep neural network that was previously trained using a first sets of features values for uttered speech from a particular language with the second set of values for uttered speech in the same particular language.
7. The method of claim 1, comprising: providing, by the speech recognition system and after training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the deep neural network to a hotword detection system for use detecting only utterances of the one or more keywords or key phrases encoded in audio waveforms.
8. The method of claim 1, comprising: using, by a hotword detection system and after training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the deep neural network to detect only utterances of the one or more keywords or key phrases encoded in an audio waveform.
9. The method of claim 8, wherein using the deep neural network to detect only utterances of the one or more keywords or key phrases encoded in an audio waveform comprises: receiving, by the deep neural network, a feature vector that models an audio waveform; and generating, by the deep neural network, a probability for each of the keywords or key phrases using the feature vector and the values of the plurality of weights.
10. The method of claim 9, wherein using the deep neural network to detect only utterances of the one or more keywords or key phrases encoded in an audio waveform comprises: generating a confidence score by combining two or more consecutive probabilities for the same keyword or key phrase, the consecutive probabilities corresponding with feature vectors that model different consecutive portions of an audio waveform; and determining whether the audio waveform included the keyword or the key phrase using the confidence score.
11. A speech recognition system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: training a deep neural network to determine probabilities that data received by the deep neural network has features similar to key features of words in a set of words, the training comprising: providing the deep neural network with a first set of feature values for uttered speech, and adjusting values for each of a plurality of weights included in the neural network; and training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the training comprising: providing the deep neural network that was previously trained using the first set of feature values with a second set of feature values for uttered speech, and adjusting the values for a first subset of the plurality of weights, wherein a first quantity of the words in the set of words is greater than a second quantity of the one or more keywords or key phrase, and the second set of feature values for uttered speech includes data representing the key features of the one or more keywords or key phrases and is a different set of feature values than the first set of feature values.
12. The system of claim 11, wherein: the first training set has a first quantity of words; and the second training set has a second quantity of words that is less than the first quantity of words size.
13. The system of claim 11, the operations comprising: maintaining, after training the deep neural network with the first training set, the values for a second subset of the plurality of weights constant while adjusting the values for the first subset of the plurality of weights, wherein the first subset of the plurality of weights and the second subset of the plurality of weights are disjoint subsets of the plurality of weights.
14. The system of claim 11, wherein providing the deep neural network that was previously trained using the first set of feature values with a second set of feature values for uttered speech comprises providing the deep neural network that was previously trained using a first set of feature values for uttered speech from a first language with the second set of feature values for uttered speech from a second language different than the first language.
15. The system of claim 11, wherein providing the deep neural network that was previously trained using the first set of feature value
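Claim 10 describes combining consecutive per-frame probabilities for the same keyword into a confidence score and then deciding from that score. The claim text does not say how the probabilities are combined, so the sketch below assumes a simple moving average over a fixed window and a fixed decision threshold. The function names, the window size, the threshold, and the probability values are all hypothetical.

```python
def confidence_scores(probs, window=3):
    """Average each run of `window` consecutive per-frame probabilities."""
    if window < 1 or window > len(probs):
        raise ValueError("window must be between 1 and len(probs)")
    return [sum(probs[i:i + window]) / window
            for i in range(len(probs) - window + 1)]


def keyword_detected(probs, window=3, threshold=0.8):
    """Report a detection if any windowed confidence reaches the threshold."""
    return max(confidence_scores(probs, window)) >= threshold


# Hypothetical network outputs for one keyword over six consecutive frames.
sustained = [0.1, 0.2, 0.9, 0.95, 0.9, 0.3]  # keyword spans several frames
spike = [0.1, 0.1, 0.99, 0.1, 0.1, 0.1]      # single-frame outlier

sustained_hit = keyword_detected(sustained)
spike_hit = keyword_detected(spike)
```

Under these assumptions, averaging rewards probabilities that stay high across neighboring frames and suppresses an isolated high value in one frame, which is one plausible reason to score windows rather than single frames.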