Transfer learning for deep neural network based hotword detection

US9715660B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9715660-B2
Application number  US-201414230225-A
Country             US
Kind code           B2
Filing date         Mar 31, 2014
Priority date       Nov 4, 2013
Publication date    Jul 25, 2017
Grant date          Jul 25, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection; read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes training a deep neural network with a first training set by adjusting values for each of a plurality of weights included in the neural network, and training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the training comprising providing the deep neural network with a second training set and adjusting the values for a first subset of the plurality of weights, wherein the second training set includes data representing the key features of the one or more keywords or key phrases.
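The two-stage procedure in the abstract can be illustrated with a toy network in plain Python. This is a minimal sketch under stated assumptions, not the patented implementation: the one-hidden-layer architecture, the ReLU and softmax choices, the synthetic feature vectors, and every name (`TinyDNN`, `reset_output`, `train`, `sample`) are hypothetical. Following claims 3 and 4, the "first subset" of weights is taken to be the output layer, and the hidden layer is held constant during the second stage. Replacing the output layer with fresh keyword/filler outputs before the second stage is an additional assumption.

```python
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TinyDNN:
    """Hypothetical toy network: one ReLU hidden layer plus a softmax output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.reset_output(n_out, rng)

    def reset_output(self, n_out, rng=None):
        # Assumption: stage 2 swaps in a fresh output layer sized for the keyword classes.
        if rng is None:
            rng = random.Random(1)
        n_hidden = len(self.W1)
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.W1, self.b1)]
        z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.W2, self.b2)]
        return h, softmax(z)

    def sgd_step(self, x, label, lr, train_hidden):
        h, p = self.forward(x)
        # Cross-entropy gradient with respect to the logits: p - one_hot(label).
        dz = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
        if train_hidden:
            # Backpropagate through W2 (before it is updated) and the ReLU.
            dh = [sum(dz[k] * self.W2[k][j] for k in range(len(dz))) for j in range(len(h))]
            dh = [d if hj > 0 else 0.0 for d, hj in zip(dh, h)]
        for k, row in enumerate(self.W2):  # output layer: the "first subset"
            for j in range(len(h)):
                row[j] -= lr * dz[k] * h[j]
            self.b2[k] -= lr * dz[k]
        if train_hidden:  # hidden layer: updated only in stage 1
            for j, row in enumerate(self.W1):
                for i in range(len(x)):
                    row[i] -= lr * dh[j] * x[i]
                self.b1[j] -= lr * dh[j]


def train(model, data, epochs, lr, train_hidden):
    for _ in range(epochs):
        for x, y in data:
            model.sgd_step(x, y, lr, train_hidden)


rng = random.Random(42)


def sample(k, n_in=4):
    # Hypothetical stand-in for an acoustic feature vector: one-hot pattern plus noise.
    return [(1.0 if i == k else 0.0) + rng.gauss(0.0, 0.05) for i in range(n_in)]


# Stage 1: larger "set of words" (four word classes); all weights are adjusted.
general = [(sample(k), k) for k in range(4) for _ in range(20)]
model = TinyDNN(n_in=4, n_hidden=8, n_out=4)
train(model, general, epochs=20, lr=0.05, train_hidden=True)

# Stage 2: smaller keyword set (label 0 = keyword, label 1 = filler);
# only the output layer is adjusted, the hidden layer is held constant.
frozen_W1 = [row[:] for row in model.W1]
frozen_b1 = model.b1[:]
keyword_data = [(sample(0), 0) for _ in range(20)] + [(sample(k), 1) for k in (1, 2, 3) for _ in range(7)]
model.reset_output(n_out=2)
train(model, keyword_data, epochs=20, lr=0.05, train_hidden=False)

_, probs = model.forward(sample(0))
print("P(keyword), P(filler):", probs)
```

With the hidden layer frozen, the second stage amounts to fitting a softmax classifier on features learned from the larger first training set, which is one plausible reading of why a smaller keyword training set can suffice.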

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: training, by a speech recognition system that includes at least one computer, a deep neural network to determine probabilities that data received by the deep neural network has features similar to key features of words in a set of words, the training comprising: providing the deep neural network with a first set of feature values for uttered speech, and adjusting values for each of a plurality of weights included in the neural network; and training, by the speech recognition system, the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the training comprising: providing the deep neural network that was previously trained using the first set of feature values with a second set of feature values for uttered speech, and adjusting the values for a first subset of the plurality of weights, wherein a first quantity of the words in the set of words is greater than a second quantity of the one or more keywords or key phrase, and the second set of feature values for uttered speech includes data representing the key features of the one or more keywords or key phrases and is a different set of feature values than the first set of feature values.

2. The method of claim 1, wherein: the first training set has a first quantity of words; and the second training set has a second quantity of words that is less than the first quantity of words.

3. The method of claim 1, comprising: maintaining, after training the deep neural network with the first training set, the values for a second subset of the plurality of weights constant while adjusting the values for the first subset of the plurality of weights, wherein the first subset of the plurality of weights and the second subset of the plurality of weights are disjoint subsets of the plurality of weights.

4. The method of claim 3, wherein maintaining the values for a second subset of the plurality of weights constant while adjusting the values for the first subset of the plurality of weights comprises: maintaining the values of weights for a hidden layer in the deep neural network constant while adjusting the values of weights for an output layer in the deep neural network.

5. The method of claim 1, wherein providing the deep neural network that was previously trained using the first set of feature values with a second set of feature values for uttered speech comprises providing the deep neural network that was previously trained using a first set of feature values for uttered speech from a first language with the second set of feature values for uttered speech from a second language different than the first language.

6. The method of claim 1, wherein providing the deep neural network that was previously trained using the first set of feature values with a second set of feature values for uttered speech comprises providing the deep neural network that was previously trained using a first sets of features values for uttered speech from a particular language with the second set of values for uttered speech in the same particular language.

7. The method of claim 1, comprising: providing, by the speech recognition system and after training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the deep neural network to a hotword detection system for use detecting only utterances of the one or more keywords or key phrases encoded in audio waveforms.

8. The method of claim 1, comprising: using, by a hotword detection system and after training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the deep neural network to detect only utterances of the one or more keywords or key phrases encoded in an audio waveform.

9. The method of claim 8, wherein using the deep neural network to detect only utterances of the one or more keywords or key phrases encoded in an audio waveform comprises: receiving, by the deep neural network, a feature vector that models an audio waveform; and generating, by the deep neural network, a probability for each of the keywords or key phrases using the feature vector and the values of the plurality of weights.

10. The method of claim 9, wherein using the deep neural network to detect only utterances of the one or more keywords or key phrases encoded in an audio waveform comprises: generating a confidence score by combining two or more consecutive probabilities for the same keyword or key phrase, the consecutive probabilities corresponding with feature vectors that model different consecutive portions of an audio waveform; and determining whether the audio waveform included the keyword or the key phrase using the confidence score.

11. A speech recognition system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: training a deep neural network to determine probabilities that data received by the deep neural network has features similar to key features of words in a set of words, the training comprising: providing the deep neural network with a first set of feature values for uttered speech, and adjusting values for each of a plurality of weights included in the neural network; and training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the training comprising: providing the deep neural network that was previously trained using the first set of feature values with a second set of feature values for uttered speech, and adjusting the values for a first subset of the plurality of weights, wherein a first quantity of the words in the set of words is greater than a second quantity of the one or more keywords or key phrase, and the second set of feature values for uttered speech includes data representing the key features of the one or more keywords or key phrases and is a different set of feature values than the first set of feature values.

12. The system of claim 11, wherein: the first training set has a first quantity of words; and the second training set has a second quantity of words that is less than the first quantity of words size.

13. The system of claim 11, the operations comprising: maintaining, after training the deep neural network with the first training set, the values for a second subset of the plurality of weights constant while adjusting the values for the first subset of the plurality of weights, wherein the first subset of the plurality of weights and the second subset of the plurality of weights are disjoint subsets of the plurality of weights.

14. The system of claim 11, wherein providing the deep neural network that was previously trained using the first set of feature values with a second set of feature values for uttered speech comprises providing the deep neural network that was previously trained using a first set of feature values for uttered speech from a first language with the second set of feature values for uttered speech from a second language different than the first language.

15. The system of claim 11, wherein providing the deep neural network that was previously trained using the first set of feature value
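Claims 9 and 10 describe the network emitting a probability for each keyword per feature vector, with consecutive probabilities for the same keyword combined into a confidence score. The sketch below assumes those per-frame probabilities are already available and uses a moving average followed by a maximum as the combining rule. The window size, the 0.5 threshold, and the function names are hypothetical choices; the claims do not specify how the probabilities are combined.

```python
from collections import deque


def smoothed_posteriors(frame_probs, window):
    """Average each frame's probability with up to window - 1 preceding frames."""
    buf = deque(maxlen=window)
    out = []
    for p in frame_probs:
        buf.append(p)
        out.append(sum(buf) / len(buf))
    return out


def confidence_score(frame_probs, window=3):
    """Combine consecutive per-frame probabilities (assumed rule: max of moving average)."""
    return max(smoothed_posteriors(frame_probs, window))


def detect(frame_probs, threshold=0.5, window=3):
    """Decide whether the waveform contained the keyword (threshold is hypothetical)."""
    return confidence_score(frame_probs, window) >= threshold


def detected_keywords(frame_posteriors, threshold=0.5, window=3):
    """frame_posteriors[t][k] is the network's probability of keyword k for frame t."""
    n_keywords = len(frame_posteriors[0])
    return [
        k
        for k in range(n_keywords)
        if detect([frame[k] for frame in frame_posteriors], threshold, window)
    ]


# A sustained run of high probabilities passes; a single-frame spike does not.
print(detect([0.1, 0.2, 0.9, 0.8, 0.95, 0.3]))  # True
print(detect([0.1, 0.1, 0.95, 0.1, 0.1]))  # False
print(detected_keywords([[0.9, 0.1], [0.8, 0.2], [0.7, 0.1]]))  # [0]
```

Averaging over a short window is one simple way to favor evidence spread across consecutive frames over an isolated spike; other combining functions would equally fit the claim language.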

Assignees

Google Inc

Inventors

Classifications

  • G10L15/16 (primary)

    using artificial neural networks · CPC title

  • Word spotting · CPC title

  • G06N7/01 (primary)

    Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Combinations of networks · CPC title

  • Transfer learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9715660B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes training a deep neural network with a first training set by adjusting values for each of a plurality of weights included in the neural network, and training the deep neural network to determine a probability that data received by the dee…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 25, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).