Implementing a whole sentence recurrent neural network language model for natural language processing
US-10692488-B2 · Jun 23, 2020 · US
US10839792B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10839792-B2 |
| Application number | US-201916267489-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 5, 2019 |
| Priority date | Feb 5, 2019 |
| Publication date | Nov 17, 2020 |
| Grant date | Nov 17, 2020 |
Abstract
A method (and structure and computer product) for learning Out-of-Vocabulary (OOV) words in an Automatic Speech Recognition (ASR) system includes using an Acoustic Word Embedding Recurrent Neural Network (AWE RNN) to receive a character sequence for a new OOV word for the ASR system, the RNN providing an Acoustic Word Embedding (AWE) vector as an output thereof. The AWE vector output from the AWE RNN is provided as an input into an Acoustic Word Embedding-to-Acoustic-to-Word Neural Network (AWE→A2W NN) trained to provide an OOV word weight value from the AWE vector. The OOV word weight is inserted into a listing of Acoustic-to-Word (A2W) word embeddings used by the ASR system to output recognized words from an input of speech acoustic features, wherein the OOV word weight is inserted into the A2W word embeddings list relative to existing weights in the A2W word embeddings list.
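The enrollment path in the abstract can be sketched in a few lines to make the data flow concrete. This is a minimal, non-authoritative sketch under several assumptions not stated in the patent. The AWE RNN is modeled as a plain Elman (tanh) character RNN whose final hidden state is the AWE vector. The AWE→A2W NN is a single linear layer. All weights are random and untrained. "Inserting relative to existing weights" is read as appending a new row to the embedding list. The sizes, the alphabet, and the names `AweRnn`, `AweToA2w`, `add_oov_word`, and `recognize` are hypothetical. `recognize` follows the dot-product, highest-score selection described in claim 4.

```python
import math
import random

# Hypothetical toy sizes and alphabet; the patent does not specify any of them.
HIDDEN = 8   # length of the AWE vector produced by the character RNN
EMBED = 6    # length of each A2W word embedding (one row per vocabulary word)
ALPHABET = "abcdefghijklmnopqrstuvwxyz'"


def rand_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class AweRnn:
    """Elman (tanh) character RNN; the final hidden state is taken as the AWE vector."""

    def __init__(self, rng):
        self.w_in = rand_matrix(HIDDEN, len(ALPHABET), rng)
        self.w_hh = rand_matrix(HIDDEN, HIDDEN, rng)

    def __call__(self, word):
        h = [0.0] * HIDDEN
        for ch in word.lower():
            # One-hot character input; characters outside ALPHABET become all zeros.
            x = [1.0 if ch == c else 0.0 for c in ALPHABET]
            pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_hh, h))]
            h = [math.tanh(p) for p in pre]
        return h


class AweToA2w:
    """One linear layer mapping an AWE vector to an A2W word embedding."""

    def __init__(self, rng):
        self.w = rand_matrix(EMBED, HIDDEN, rng)

    def __call__(self, awe):
        return matvec(self.w, awe)


def add_oov_word(word, vocab, a2w_embeddings, awe_rnn, projector):
    """Append the OOV word and its predicted embedding after the existing rows."""
    vocab.append(word)
    a2w_embeddings.append(projector(awe_rnn(word)))


def recognize(a2w_output, vocab, a2w_embeddings):
    """Pick the word whose embedding has the highest dot product with the A2W RNN output."""
    scores = [sum(a * b for a, b in zip(a2w_output, e)) for e in a2w_embeddings]
    return vocab[max(range(len(scores)), key=scores.__getitem__)]


# Usage: two IV words with placeholder embeddings, then one new OOV word.
rng = random.Random(0)
awe_rnn, projector = AweRnn(rng), AweToA2w(rng)
vocab = ["hello", "world"]
a2w_embeddings = rand_matrix(len(vocab), EMBED, rng)
add_oov_word("kubernetes", vocab, a2w_embeddings, awe_rnn, projector)
print(vocab, len(a2w_embeddings))
```

Because the weights here are random, the sketch only shows the data flow. A new word is spelled into a vector and becomes one more row that competes with the IV rows at recognition time, with no change to the A2W RNN itself.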
Claims (preview)
What is claimed is:

1. A method for learning Out-of-Vocabulary (OOV) words in an Automatic Speech Recognition (ASR) system, the method comprising: using an Acoustic Word Embedding Recurrent Neural Network (AWE RNN) to receive a character sequence for a new OOV word for the ASR system, the RNN providing an Acoustic Word Embedding (AWE) vector as an output thereof; providing the AWE vector output from the AWE RNN as an input into an Acoustic Word Embedding-to-Acoustic-to-Word Neural Network (AWE→A2W NN) trained to provide an OOV word weight value from the AWE vector; and inserting the OOV word weight into a listing of Acoustic-to-Word (A2W) word embeddings used by the ASR system to output recognized words from an input of speech acoustic features, wherein the OOV word weight is inserted into the A2W word embeddings list relative to existing weights in the A2W word embeddings list.

2. The method of claim 1, wherein the AWE RNN is initially trained as an overall subnetwork using character sequences of In-Vocabulary (IV) words, wherein the initial training further involves an Acoustic Embedding Recurrent Neural Network (AE RNN) that receives an acoustic sequence correspondingly to each character sequence of an IV word used during training, wherein outputs of the AWE RNN and AE RNN are passed into a contrastive loss function, and wherein the AWE RNN and AWE→A2W NN are trained using a backpropagation algorithm to train weights of the AWE RNN, weights of the AE RNN, and weights of the AWE→A2W NN to minimize the contrastive loss function.

3. The method of claim 2, wherein, subsequent to the initial training of the overall subnetwork, the AE RNN is not used for normal operation of the ASR system and only the AWE RNN is used for a subsequent introduction of OOV words into the ASR system.

4. The method of claim 1, wherein the ASR system further comprises an Acoustic-to-Word Recurrent Neural Network (A2W RNN) that receives speech acoustic features as an input therein and an output of the A2W RNN is compared to embeddings of the A2W word embeddings listing using a dot product, and wherein, during a normal operation mode of the ASR system in which recognized words are output by the ASR system in response to speech acoustic features from an acoustic input into the ASR system, a word from the A2W word embeddings listing having a highest comparison result is provided as an output of the ASR system as a recognized word for the input speech acoustic features.

5. The method of claim 4, wherein an overall subnetwork including the A2W RNN is trained using In-Vocabulary (IV) words, wherein speech acoustic features of an IV word and a word sequence corresponding to that IV word are provided into a loss function, and wherein a backpropagation algorithm updates weights of the A2W RNN in order to minimize this loss function and to provide the A2W word embeddings listing.

6. The method of claim 1, as implemented in a cloud service.

7. A method for Automatic Speech Recognition (ASR), the method comprising: receiving a character sequence for an Out-of-Vocabulary (OOV) word into an Acoustic Word Embedding Recurrent Neural Network (AWE RNN) of an ASR system, as a mechanism to receive a character sequence for a new OOV word for the ASR system, the AWE RNN providing an Acoustic Word Embedding (AWE) vector as an output thereof; providing the AWE vector output from the AWE RNN as an input into an Acoustic Word Embedding-to-Acoustic-to-Word Neural Network (AWE→A2W NN) trained to provide an OOV word weight value from the AWE vector; and inserting the OOV word weight into a listing of Acoustic-to-Word (A2W) word embeddings used by the ASR system to output recognized words from an input of speech acoustic features, wherein the OOV word weight is inserted into the A2W word embeddings list relative to existing weights in the A2W word embeddings list.

8. The method of claim 7, wherein the AWE RNN is initially trained as an overall subnetwork using character sequences of In-Vocabulary (IV) words, wherein the initial training further involves an Acoustic Embedding Recurrent Neural Network (AE RNN) that receives an acoustic sequence correspondingly to each character sequence of an IV word used during training, wherein outputs of the AWE RNN and AE RNN are passed into a contrastive loss function, and wherein the AWE RNN and AWE→A2W NN are trained using a backpropagation algorithm to train weights of the AWE RNN, weights of the AE RNN, and weights of the AWE→A2W NN to minimize the contrastive loss function.

9. The method of claim 8, wherein, subsequent to the initial training of the overall subnetwork, the AE RNN is not used for normal operation of the ASR system and only the AWE RNN is used for a subsequent introduction of OOV words into the ASR system.

10. The method of claim 7, wherein the ASR system further comprises an Acoustic-to-Word Recurrent Neural Network (A2W RNN) that receives speech acoustic features as an input therein and an output of the A2W RNN is compared to embeddings of the A2W word embeddings listing using a dot product, and wherein, during a normal operation mode of the ASR system in which recognized words are output by the ASR system in response to speech acoustic features from an acoustic input into the ASR system, a word from the A2W word embeddings listing having a highest comparison result is provided as an output of the ASR system as a recognized word for the input speech acoustic features.

11. The method of claim 10, wherein an overall subnetwork including the A2W RNN is trained using In-Vocabulary (IV) words, wherein speech acoustic features of an IV word and a word sequence corresponding to that IV word are provided into a loss function, and wherein a backpropagation algorithm updates weights of the A2W RNN in order to minimize this loss function and to provide the A2W word embeddings listing.

12. The method of claim 7, as implemented in a cloud service.

13. A method for Automatic Speech Recognition (ASR), the method comprising: initially training an overall subnetwork comprising an Acoustic-to-Word Recurrent Neural Network (A2W RNN), the A2W RNN receiving In-Vocabulary (IV) words for the initial training, the initial training using IV words resulting in a listing of Acoustic-to-Word (A2W) Word Embeddings stored in a memory of an ASR system performing the ASR processing; receiving an Out-of-Vocabulary (OOV) word as a character sequence into an Acoustic Word Embedding Recurrent Neural Network (AWE RNN), as a mechanism to receive a character sequence for a new OOV word for the ASR system, the AWE RNN providing an Acoustic Word Embedding (AWE) vector as an output thereof; providing the AWE vector output from the AWE RNN as an input into an Acoustic Word Embedding-to-Acoustic-to-Word Neural Network (AWE→A2W NN) trained to provide an OOV word weight value from the AWE vector; and inserting the OOV word weight into a listing of Acoustic-to-Word (A2W) word embeddings used by the ASR system to output recognized words from an input of speech acoustic features, wherein the OOV word weight is inserted into the A2W word embeddings list relative to existing weights in the A2W word embeddings list.

14. The method of claim 13, wherein the AWE RNN is initially trained as an overall subnetwork using character sequences of In-Vocabulary (IV) words, wherein the initial training further involves an Acoustic Embedding Recurrent Neural Network (AE RNN) that receives an acoustic sequence correspondingly to each character sequence of an IV word used during training, wherein outputs of the AWE RNN and AE RNN are passed into a contrastive loss function, an
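Claims 2, 8, and 14 pass the AWE RNN and AE RNN outputs into a contrastive loss but do not give its formula. The sketch below assumes the common pairwise form. Matched character/acoustic pairs (same IV word) are pulled together by squared Euclidean distance. Mismatched pairs are pushed at least `margin` apart. The names `contrastive_loss` and `batch_loss` and the default `margin=1.0` are hypothetical, and the patent may use a different variant.

```python
import math


def contrastive_loss(awe, ae, same_word, margin=1.0):
    """Pairwise contrastive loss (assumed form; the patent does not give a formula).

    awe: embedding of a character sequence from the AWE RNN
    ae: embedding of an acoustic sequence from the AE RNN
    same_word: True if both come from the same IV word
    """
    d = math.dist(awe, ae)  # Euclidean distance, Python 3.8+
    if same_word:
        return 0.5 * d ** 2                 # pull matched pairs together
    return 0.5 * max(0.0, margin - d) ** 2  # push mismatched pairs at least `margin` apart


def batch_loss(awe_vectors, ae_vectors, words):
    """Mean loss over every (character, acoustic) pairing in a batch.

    words[i] labels both awe_vectors[i] and ae_vectors[i]; any pairing whose labels
    differ is treated as a negative example.
    """
    total, count = 0.0, 0
    for i, awe in enumerate(awe_vectors):
        for j, ae in enumerate(ae_vectors):
            total += contrastive_loss(awe, ae, words[i] == words[j])
            count += 1
    return total / count


# Matched pairs already coincide and mismatched pairs are 5 apart, so this batch costs nothing.
print(batch_loss([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0], [3.0, 4.0]], ["yes", "no"]))  # → 0.0
```

Minimizing a loss of this kind by backpropagation places the spelling of a word near its acoustics in a shared space. That is what lets the character-side AWE RNN alone stand in for audio when a new word is enrolled, with the AE RNN left out of normal operation as claim 3 describes.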
CPC classification titles:
- Combinations of networks
- Recurrent networks, e.g. Hopfield networks
- Supervised learning
- Characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- Backpropagation, e.g. using gradient descent