US2016180214A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016180214-A1 |
| Application number | US-201414577301-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 19, 2014 |
| Priority date | Dec 19, 2014 |
| Publication date | Jun 23, 2016 |
| Grant date | — |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network. One of the methods includes training a neural network using sharp discrepancy learning by providing training data to the neural network, calculating a gradient using a sharp discrepancy output layer objective function to classify the neural network parameters for correct and incorrect network model states, and training the neural network using the gradient to determine a probability that data received by the neural network has features similar to key features of one or more keywords or key phrases.
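The training procedure summarized in the abstract (provide training data, calculate a gradient from an output-layer objective, update the parameters, read out a keyword probability) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the sharp discrepancy objective, so ordinary softmax cross-entropy is used as a stand-in, the data are synthetic, and all names (`init_params`, `forward`, `loss_and_grads`, `train`) and hyperparameters are hypothetical rather than taken from the publication.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, seed=0):
    # One hidden layer; the sizes and the small random init are assumptions.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    # Hidden layer followed by a softmax output layer.
    H = np.tanh(X @ params["W1"] + params["b1"])
    logits = H @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    return H, P

def loss_and_grads(params, X, Y):
    # Stand-in objective: mean cross-entropy (NOT the sharp discrepancy function).
    H, P = forward(params, X)
    n = X.shape[0]
    loss = -np.sum(Y * np.log(P + 1e-12)) / n
    dlogits = (P - Y) / n                 # gradient at the output layer
    dH = dlogits @ params["W2"].T
    dpre = dH * (1.0 - H ** 2)            # derivative of tanh
    grads = {
        "W2": H.T @ dlogits,
        "b2": dlogits.sum(axis=0),
        "W1": X.T @ dpre,
        "b1": dpre.sum(axis=0),
    }
    return loss, grads

def train(params, X, Y, lr=0.5, max_steps=500, tol=1e-3):
    # Repeat parameter updates until a step limit or loss threshold is reached.
    history = []
    for _ in range(max_steps):
        loss, grads = loss_and_grads(params, X, Y)
        history.append(loss)
        if loss < tol:
            break
        for name in params:
            params[name] -= lr * grads[name]
    return params, history

# Synthetic stand-in for feature vectors and "keyword / not keyword" labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
is_keyword = (X[:, 0] + X[:, 1] > 0).astype(int)
Y = np.eye(2)[is_keyword]                 # one-hot label vectors

params, history = train(init_params(8, 16, 2), X, Y)
_, P = forward(params, X)                 # P[:, 1] is the keyword probability
```

Each row of `P` is a probability distribution over the two classes, so `P[:, 1]` plays the role of the probability that an input resembles a keyword.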
Opening claim text (preview).
What is claimed is:
1. A method comprising: providing training data to a neural network that includes an output layer and one or more hidden layers, each of the hidden layers comprising multiple nodes and corresponding parameters; calculating a gradient for the neural network by applying a sharp discrepancy output layer objective function to the output layer, wherein the sharp discrepancy output layer objective function is dependent on the training data and parameters; training the neural network using the gradient to determine a probability that data received by the neural network has features similar to key features of one or more keywords or key phrases, wherein training the neural network using the gradient comprises using the gradient to update the parameters.
2. The method of claim 1, comprising providing the trained neural network for use in a speech recognition system, wherein the speech recognition system uses sharp discrepancy learning on real data.
3. The method of claim 1, wherein calculating the gradient for the neural network by applying a sharp discrepancy output layer objective function to the output layer comprises calculating the gradient of a cross-entropy function.
4. The method of claim 1, wherein the sharp discrepancy output layer objective function comprises a class of sharp discrepancy objective functions with a fraction whose denominator is a product of shifted label scores over a set of labels that correspond to a set of states that are designated as incorrect states.
5. The method of claim 4, wherein the label scores each comprise an exponential of a product of a label, parameter matrix and training data point.
6. The method of claim 4, wherein the class of sharp discrepancy objective functions comprise functions with a fraction whose numerator is a non-negative label score associated with a state that is designated as a correct state.
7. The method of claim 1, wherein calculating the gradient comprises calculating each component of the gradient separately.
8. The method of claim 1, wherein calculating the gradient comprises calculating each component of the gradient in parallel.
9. The method of claim 1, wherein the neural network comprises a deep neural network.
10. The method of claim 1, wherein the neural network comprises a deep belief network.
11. The method of claim 1, wherein the training data comprises a plurality of feature vectors and a plurality of label vectors that each indicate whether the corresponding feature vector corresponds to i) one of the keywords or key phrases, or ii) not.
12. The method of claim 11, wherein each of the plurality of feature vectors represent a different portion of an audio waveform from a received digital representation of speech.
13. The method of claim 12, wherein the digital representation of speech comprises recorded speech data.
14. The method of claim 11, wherein each of the plurality of label vectors corresponds to one of the feature vectors, and specifies a probability distribution for whether the corresponding feature vector corresponds to i) one of the keywords or key phrases, or ii) not.
15. The method of claim 14, wherein the probability distribution comprises a multinomial distribution.
16. The method of claim 1, wherein training the neural network using the gradient comprises iterating the parameter updates until an end criteria is met.
17. The method of claim 1, comprising calculating, using the hidden layers, an exponential of a product of a value of one of the parameters and a point from the training data.
18. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: providing training data to a neural network that includes an output layer and one or more hidden layers, each of the hidden layers comprising multiple nodes and corresponding parameters; calculating a gradient for the neural network by applying a sharp discrepancy output layer objective function to the output layer, wherein the sharp discrepancy output layer objective function is dependent on the training data and parameters; training the neural network using the gradient to determine a probability that data received by the neural network has features similar to key features of one or more keywords or key phrases, wherein training the neural network using the gradient comprises using the gradient to update the parameters.
19. The system of claim 18, wherein the sharp discrepancy output layer objective function comprises a class of sharp discrepancy objective functions with a fraction whose denominator is a product of shifted label scores over a set of labels that correspond to a set of states that are designated as incorrect states.
20. A computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising: providing training data to a neural network that includes an output layer and one or more hidden layers, each of the hidden layers comprising multiple nodes and corresponding parameters; calculating a gradient for the neural network by applying a sharp discrepancy output layer objective function to the output layer, wherein the sharp discrepancy output layer objective function is dependent on the training data and parameters; training the neural network using the gradient to determine a probability that data received by the neural network has features similar to key features of one or more keywords or key phrases, wherein training the neural network using the gradient comprises using the gradient to update the parameters.
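One possible reading of the objective described in claims 4 to 6 is sketched below: a fraction whose numerator is the label score of the correct state and whose denominator is a product of shifted label scores over the incorrect states, with each label score taken as the exponential of a row of the parameter matrix applied to a data point. The claims do not say how the scores are shifted or whether a logarithm is applied, so the additive `shift=1.0` and the use of the log of the fraction are assumptions made for this illustration, and the function names are hypothetical. The finite-difference routine also computes each gradient component separately, in the spirit of claim 7.

```python
import numpy as np

def label_scores(W, x):
    # Score of label j: exp(e_j^T W x), i.e. exp of (row j of W) dot x.
    return np.exp(W @ x)

def objective(W, x, correct, shift=1.0):
    # Fraction: correct-state score over the product of shifted incorrect scores.
    # The additive shift is an ASSUMPTION; the claims leave it unspecified.
    s = label_scores(W, x)
    incorrect = np.arange(W.shape[0]) != correct
    return s[correct] / np.prod(shift + s[incorrect])

def log_objective(W, x, correct, shift=1.0):
    # Log of the fraction above, written directly in terms of z = W x.
    z = W @ x
    incorrect = np.arange(W.shape[0]) != correct
    return z[correct] - np.sum(np.log(shift + np.exp(z[incorrect])))

def grad_log_objective(W, x, correct, shift=1.0):
    # d/dz_correct = 1; d/dz_j = -exp(z_j) / (shift + exp(z_j)) for incorrect j.
    s = np.exp(W @ x)
    g = -s / (shift + s)
    g[correct] = 1.0
    return np.outer(g, x)   # chain rule: dz_j / dW[j, k] = x[k]

def numeric_grad(W, x, correct, eps=1e-6):
    # Central finite differences, one gradient component at a time.
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            G[i, j] = (log_objective(Wp, x, correct)
                       - log_objective(Wm, x, correct)) / (2 * eps)
    return G

# Toy example: 3 labels, 4-dimensional data point, label 0 designated correct.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, (3, 4))
x = rng.normal(size=4)
correct = 0

analytic = grad_log_objective(W, x, correct)
numeric = numeric_grad(W, x, correct)
W_next = W + 0.1 * analytic   # one gradient-ascent update of the parameters
```

Comparing `analytic` with `numeric` is a quick sanity check on the hand-derived gradient. The update moves `W` in the direction that raises the correct-state score relative to the incorrect ones.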
Combinations of networks · CPC title
Learning methods · CPC title
Feedforward networks · CPC title
Supervised learning · CPC title
Physics · mapped topic