Method and system for training language models to reduce recognition errors

US10176799B2 · US · B2

Patent metadata
Publication number: US-10176799-B2
Application number: US-201615013239-A
Country: US
Kind code: B2
Filing date: Feb 2, 2016
Priority date: Feb 2, 2016
Publication date: Jan 8, 2019
Grant date: Jan 8, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for training a language model to reduce recognition errors, where the language model is a recurrent neural network language model (RNNLM), begins by acquiring training samples. An automatic speech recognition (ASR) system is applied to the training samples to produce recognized words and probabilities of the recognized words, and an N-best list is selected from the recognized words based on those probabilities. Word errors are then determined for the hypotheses in the N-best list using reference data, and the hypotheses are rescored using the RNNLM. Next, gradients are determined for the hypotheses using the word errors, and gradients are determined for the words in the hypotheses. Finally, the parameters of the RNNLM are updated using a sum of the gradients.
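The training loop in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each hypothesis score s_n is assumed to be an acoustic score plus a weighted sum of RNNLM word log-probabilities, the N-best posterior P_n is assumed to be a softmax of gamma·s_n (the scale `gamma` and weight `lm_weight` are hypothetical parameters), and the word error E_n is a word-level edit distance to the reference. Under these assumptions the gradient of the expected error with respect to s_n is gamma·P_n·(E_n − Ē), where Ē is the expected error over the list; this is one possible reading of "differences with respect to the N-best hypothesis scores" in claim 1. All function names are hypothetical, and backpropagating the per-word signals through the RNNLM is not shown.

```python
import math
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance E(ref, hyp)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (0 if the words match)
            )
        prev = cur
    return prev[-1]


def nbest_mwe_signals(
    ref: Sequence[str],
    hyps: List[Sequence[str]],
    am_scores: List[float],
    word_logprobs: List[List[float]],
    lm_weight: float = 1.0,  # hypothetical LM weight in the hypothesis score
    gamma: float = 1.0,      # hypothetical posterior scaling factor
) -> Tuple[float, List[float], List[List[float]]]:
    """Expected word error over one N-best list and its gradients.

    Returns (loss, dL/ds_n per hypothesis, dL/dlogP per word).
    """
    # Assumed hypothesis score: acoustic score + weighted RNNLM log-probability.
    scores = [am + lm_weight * sum(lp) for am, lp in zip(am_scores, word_logprobs)]
    # Assumed posterior over the N-best list: softmax of scaled scores (max-shifted).
    m = max(gamma * s for s in scores)
    exps = [math.exp(gamma * s - m) for s in scores]
    z = sum(exps)
    post = [e / z for e in exps]
    errors = [edit_distance(ref, h) for h in hyps]
    # Expected edit distance, i.e. the N-best form of the loss in claims 5 and 6.
    loss = sum(p * e for p, e in zip(post, errors))
    # dL/ds_n = gamma * P_n * (E_n - expected error).
    hyp_grads = [gamma * p * (e - loss) for p, e in zip(post, errors)]
    # Each word log-prob enters s_n with weight lm_weight, so every word in
    # hypothesis n receives lm_weight * dL/ds_n as its error signal.
    word_grads = [[lm_weight * g] * len(lp) for g, lp in zip(hyp_grads, word_logprobs)]
    return loss, hyp_grads, word_grads


if __name__ == "__main__":
    ref = "the cat sat".split()
    hyps = [["the", "cat", "sat"], ["the", "cat", "sad"], ["a", "cat", "sat"]]
    loss, hyp_grads, _ = nbest_mwe_signals(ref, hyps, [0.0] * 3, [[-1.0] * 3] * 3)
    print(round(loss, 4), [round(g, 4) for g in hyp_grads])
    # prints 0.6667 [-0.2222, 0.1111, 0.1111]
```

With equal scores, the error-free hypothesis receives a negative gradient (its score should rise to lower the loss), the erroneous ones receive positive gradients, and the hypothesis gradients sum to zero because the posterior is a softmax. Summing the word signals over all hypotheses gives an error signal of the kind claim 2 accumulates per utterance.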

First claim

Opening claim text (preview).

We claim:

1. A method for speech recognition to reduce recognition errors using a language model, wherein the language model is a recurrent neural network language model (RNNLM) that is in communication with a Long Short-Term Memory (LSTM), comprising the steps of: acquiring training samples during a training stage for training the RNNLM to perform: applying an automatic speech recognition system (ASR) to the training samples to produce recognized words and probabilities of the recognized words; selecting an N-best list from the recognized words based on the probabilities; determining word errors using reference data for hypotheses in the N-best list; rescoring the hypotheses using the RNNLM in communication with the LSTM; determining gradients for the hypotheses using the word errors, wherein the determined gradients for the hypotheses correspond to differences with respect to the N-best hypothesis scores; determining gradients for recognized words in the hypotheses; back-propagating the gradients; updating parameters of the RNNLM using a sum of the gradients as an error signal for the RNNLM, so as to reduce recognition errors of the ASR; acquiring spoken utterances as an input to the RNNLM to produce the recognized words; producing the N-best list from the recognized words; and applying the RNNLM to the N-best list to obtain recognition results, wherein the steps are performed in a processor.

2. The method of claim 1, wherein a stochastic gradient descent method is applied on an utterance-by-utterance basis so that the gradients are accumulated over the N-best list.

3. The method of claim 1, wherein an output vector $y_t \in [0,1]^{|V|+|C|}$, where $|C|$ is the number of classes, consists of word (w) and class (c) outputs $y_t = \begin{bmatrix} y_t^{(w)} \\ y_t^{(c)} \end{bmatrix}$, obtained as $y_{t,m}^{(w)} = \zeta(W_{ho,m}^{(w)} h_t)$ and $y_t^{(c)} = \zeta(W_{ho}^{(c)} h_t)$, where $y_{t,m}^{(w)}$ and $W_{ho,m}^{(w)}$ are the sub-vector of $y_t^{(w)}$ and the sub-matrix of $W_{ho}$ corresponding to the words in the m-th class, respectively, and $W_{ho}^{(c)}$ is the sub-matrix of $W_{ho}$ for the class output, where $W_{ho}$ is a matrix placed between a hidden layer and the output layer of the RNNLM, $h_t$ is a D-dimensional activation vector $h_t \in [0,1]^D$ in a hidden layer, and $\zeta(\cdot)$ denotes a softmax function that determines a softmax for elements of the vectors.

4. The method of claim 3, wherein a word occurrence probability is $P(w_t \mid h_t) \equiv y_{t,C(w_t)}^{(w)}[w_t] \times y_t^{(c)}[C(w_t)]$, where $C(w)$ denotes an index of the class to which the word $w$ belongs.

5. The method of claim 4, wherein a loss function of minimum word error training is $L(\Lambda) = \sum_{k=1}^{K} \sum_{W \in V^*} E(W_k^{(R)}, W)\, P_\Lambda(W \mid O_k)$, where $\Lambda$ is a set of model parameters, $K$ is the number of utterances in training data, $O_k$ is a k-th acoustic observation sequence, $W_k^{(R)} = \{w_{k,1}^{(R)}, \ldots, w_{k,T_k}^{(R)}\}$ is a k-th reference word sequence, $E(W', W)$ represents an edit distance between two word sequences $W'$ and $W$, and $P_\Lambda(W \mid O)$ is a posterior probability of $W$ determined with the set of model parameters $\Lambda$.

6. The method of claim 5, further comprising: obtaining the N-best lists and obtaining a loss function $L(\Lambda) = \sum_{k=1}^{K} \sum_{n=1}^{N} E(W_k$ …
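Claims 3 and 4 describe a class-factorized output layer: the probability of a word is its entry in the within-class word softmax multiplied by its class's entry in the class softmax, so each step needs only the |C| class outputs plus the outputs for a single class rather than all |V| words. The pure-Python sketch below illustrates that factorization only; the helper names, the list-of-rows matrix layout, and the `index_in_class` mapping are hypothetical choices, and a real RNNLM would compute h_t with its recurrent (or LSTM) layer instead of taking it as an input.

```python
import math
from typing import Dict, List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """zeta(.): numerically stable softmax over a vector."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def matvec(W: Sequence[Sequence[float]], h: Sequence[float]) -> List[float]:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def word_probability(
    word: str,
    h: Sequence[float],                             # h_t, hidden activation (length D)
    W_class: Sequence[Sequence[float]],             # W_ho^(c), |C| x D
    W_words: Dict[int, Sequence[Sequence[float]]],  # W_ho,m^(w), |V_m| x D for each class m
    class_of: Dict[str, int],                       # C(w)
    index_in_class: Dict[str, int],                 # hypothetical: position of w within its class
) -> float:
    """P(w_t | h_t) = y_{t,C(w_t)}^(w)[w_t] * y_t^(c)[C(w_t)]."""
    m = class_of[word]
    y_c = softmax(matvec(W_class, h))     # class output y_t^(c)
    y_w = softmax(matvec(W_words[m], h))  # word output y_{t,m}^(w), class m only
    return y_w[index_in_class[word]] * y_c[m]


if __name__ == "__main__":
    class_of = {"a": 0, "b": 0, "c": 1}
    index_in_class = {"a": 0, "b": 1, "c": 0}
    h = [0.5, -0.2]
    W_class = [[1.0, 0.0], [0.0, 1.0]]
    W_words = {0: [[0.3, 0.1], [-0.2, 0.4]], 1: [[0.7, 0.7]]}
    probs = {w: word_probability(w, h, W_class, W_words, class_of, index_in_class)
             for w in class_of}
    print(round(sum(probs.values()), 6))  # prints 1.0
```

Because each class's word softmax sums to one and the class softmax sums to one, the factorized probabilities still form a proper distribution over the whole vocabulary, which the usage check confirms on a toy three-word vocabulary.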

Assignees

Mitsubishi Electric Res Laboratories Inc

Inventors

Classifications

  • Combinations of networks · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Speech classification or search · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10176799B2 cover?
A method for training a language model to reduce recognition errors, where the language model is a recurrent neural network language model (RNNLM), begins by acquiring training samples. An automatic speech recognition (ASR) system is applied to the training samples to produce recognized words and probabilities of the recognized words, and an N-best list is selected from the recognized words …
Who is the assignee on this patent?
Mitsubishi Electric Res Laboratories Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 8, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).