Speech recognition system and method using an adaptive incremental learning approach
US-2018151177-A1 · May 31, 2018 · US
US10909972B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10909972-B2 |
| Application number | US-201715805452-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 7, 2017 |
| Priority date | Nov 7, 2017 |
| Publication date | Feb 2, 2021 |
| Grant date | Feb 2, 2021 |
An example apparatus for detecting intent in voiced audio includes a receiver to receive one or more word sequence hypotheses related to a voiced audio and a dynamic vocabulary. The apparatus also includes a natural language understander (NLU) to detect an intent and recognize a property related to the intent based on the one or more word sequence hypotheses and the dynamic vocabulary. The apparatus further includes a transmitter to transmit the detected intent and the recognized associated property to an application.
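The abstract leaves open how phrases from the dynamic vocabulary are located in a word sequence hypothesis; claim 17 below only requires that a longer dynamic vocabulary entry be detected before a shorter one. The following is a minimal sketch under stated assumptions, not the patented implementation: the dynamic vocabulary is assumed to be a Python dict mapping word tuples to semantic class names, detection is assumed to be a greedy left-to-right longest-match scan, and the function name `detect_dynamic_phrases` and the class labels `CITY` and `CONTACT` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical dynamic vocabulary: word sequence -> semantic class.
DynamicVocab = Dict[Tuple[str, ...], str]


def detect_dynamic_phrases(
    hypothesis: List[str], vocab: DynamicVocab
) -> List[Tuple[int, int, str]]:
    """Greedy left-to-right scan that prefers the longest matching phrase.

    Returns (start, end, semantic_class) spans; end is exclusive.
    """
    max_len = max((len(k) for k in vocab), default=0)
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(hypothesis):
        match: Optional[Tuple[int, int, str]] = None
        # Try the longest candidate first so "new york city" beats "new york".
        for n in range(min(max_len, len(hypothesis) - i), 0, -1):
            key = tuple(hypothesis[i : i + n])
            if key in vocab:
                match = (i, i + n, vocab[key])
                break
        if match is not None:
            spans.append(match)
            i = match[1]  # skip past the matched phrase
        else:
            i += 1
    return spans


vocab: DynamicVocab = {
    ("new", "york"): "CITY",
    ("new", "york", "city"): "CITY",
    ("alice",): "CONTACT",
}
print(detect_dynamic_phrases("call alice in new york city".split(), vocab))
# [(1, 2, 'CONTACT'), (3, 6, 'CITY')]
```

Checking candidate lengths from longest to shortest at each position is what lets "new york city" win over its prefix "new york"; a trie over the vocabulary would avoid building tuples but would give the same spans.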
What is claimed is:
1. An apparatus for detecting intent in voiced audio, the apparatus comprising: a receiver to receive a common vocabulary including a list of static words and a parameter value from an application, the parameter value including a dynamic word to be added to a dynamic vocabulary including a set of relations between word sequences and semantic classes and a list of parameters that are to be used to detect dynamic vocabulary phrases; an automatic speech recognizer (ASR) to receive voiced audio and generate a word sequence hypothesis based on a language model including word probabilities derived from the dynamic vocabulary based on the semantic classes; a natural language understander (NLU) including: a feature front-end to generate a bag of features vector including a first sub vector including a bag of words feature vector of distinguishing words derived from the common vocabulary based on weighted word counts in the word sequence hypothesis and a second sub vector including a feature vector of dynamic vocabulary detected in the word sequence hypothesis; an intent detector to detect an intent based on the bag of features vector; a property recognizer to compute a semantic tag for each word in the word sequence hypothesis based on the bag of features; and a transmitter to transmit the detected intent and a canonical representation generated based on the semantic tags to the application.
2. The apparatus of claim 1, wherein the NLU includes a trained neural network to detect the intent based on the bag of features generated from the word sequence hypothesis.
3. The apparatus of claim 1, wherein the feature front-end is to generate a set of continuous features based on a received common vocabulary and a set of discrete features based on the dynamic vocabulary, and generate the bag of features to be used to compute the semantic tags.
4. The apparatus of claim 1, wherein the NLU includes a type caster to generate the canonical representation based on one or more words in the word sequence hypothesis with the semantic tags.
5. The apparatus of claim 1, wherein the dynamic vocabulary is generated based on user data received from the application.
6. The apparatus of claim 1, wherein the language model is trained using representative dynamic training data and updated with the parameter value from the application.
7. The apparatus of claim 1, further including a semantic model communicatively coupled to the NLU, wherein the semantic model is trained using the dynamic vocabulary and updated with the parameter value from the application.
8. The apparatus of claim 1, wherein the NLU includes a classifier trained to generate the intent based on the bag of features vector using a model trained by considering a subset of representative training data as dynamic.
9. The apparatus of claim 1, wherein the property recognizer includes a model including a conditional random field, a hidden Markov model or a recurrent neural network trained considering a subset of training data vocabulary as being dynamic.
10. The apparatus of claim 1, wherein the intent detector includes a trained recurrent neural network or deep neural network trained considering a subset of training data vocabulary as being dynamic.
11. A method for detecting intent in voiced audio, the method comprising: receiving, via a processor, a common vocabulary including a list of static words and a parameter value from an application, the parameter value including a dynamic word to be added to a dynamic vocabulary including a set of relations between word sequences and semantic classes and a list of parameters used to detect dynamic vocabulary phrases; receiving, via the processor, voiced audio and generating a word sequence hypothesis based on a language model including word probabilities derived from the dynamic vocabulary based on the semantic classes; generating, via the processor, a bag of features vector including a first sub vector including a bag of words feature vector of distinguishing words derived from the common vocabulary based on weighted word counts in the word sequence hypothesis and a second sub vector including a feature vector of dynamic vocabulary detected in the word sequence hypothesis; detecting, via the processor, an intent based on the bag of features vector; computing, via the processor, a semantic tag for each word in the word sequence hypothesis based on the bag of features; and sending, via the processor, the detected intent and a canonical representation generated based on the semantic tags to the application.
12. The method of claim 11, wherein detecting the intent includes processing the bag of features using a model trained using representative dynamic training data.
13. The method of claim 11, wherein generating the bag of features includes generating a set of continuous features based on the received common vocabulary, generating a set of discrete features based on the dynamic vocabulary, and generating the bag of features to be used to compute the semantic tags.
14. The method of claim 11, wherein computing the semantic tags includes semantically tagging a word in the word sequence hypothesis based on a generated bag of features.
15. The method of claim 11, further including generating a canonical representation based on one or more words in the word sequence hypothesis with the semantic tags.
16. The method of claim 11, further including training a model to detect the intent, wherein training the model includes: receiving, via the processor, training data; randomly sampling, via the processor, the training data to generate common training data and representative dynamic training data; calculating, via the processor, the common vocabulary based on the common training data and the dynamic vocabulary based on the representative dynamic training data; and training, via the processor, the model based on the common training data, the representative dynamic training data, the common vocabulary, and the dynamic vocabulary.
17. The method of claim 11, wherein detecting the intent and computing the semantic tags includes detecting a longer dynamic vocabulary before a shorter dynamic vocabulary.
18. The method of claim 11, further including: receiving, via the processor, user data from the application; and generating the dynamic vocabulary based on the user data.
19. At least one non-transitory computer readable medium for detecting intent in voiced audio comprising instructions stored therein that, in response to being executed on a computing device, cause the computing device to at least: receive a common vocabulary including a list of static words and a parameter value from an application, the parameter value including a dynamic word to be added to a dynamic vocabulary including a set of relations between word sequences and semantic classes and a list of parameters that are to be used to detect dynamic vocabulary phrases; receive voiced audio and generate a word sequence hypothesis based on a language model including word probabilities derived from the dynamic vocabulary based on the semantic classes; generate a bag of features vector including a first sub vector including a bag of words feature vector of distinguishing words derived from the common vocabulary based on weighted word counts in the word sequence hypothesis and a second sub vector including a feature vector of dynamic vocabulary detected in the word sequence hypothesis; detect an intent based on the bag of features vector; compute a semantic tag for ea
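Claims 1, 3, 11 and 13 describe the bag of features vector as two concatenated parts: continuous weighted counts of distinguishing common-vocabulary words, and discrete features for dynamic vocabulary detected in the hypothesis. A minimal sketch, assuming the word weights are already computed (the claims do not say how they are derived) and that the dynamic sub-vector is one binary indicator per semantic class; `bag_of_features` and all argument names are hypothetical.

```python
from typing import Dict, List, Sequence


def bag_of_features(
    hypothesis: Sequence[str],
    word_weights: Dict[str, float],
    dynamic_classes: Sequence[str],
    detected_classes: Sequence[str],
) -> List[float]:
    """Concatenate a weighted bag-of-words sub-vector and a dynamic sub-vector.

    word_weights: distinguishing common-vocabulary words and their (assumed,
        precomputed) weights; dict order fixes the feature order.
    dynamic_classes: fixed ordering of the dynamic vocabulary's semantic classes.
    detected_classes: classes of dynamic phrases found in the hypothesis.
    """
    # First sub-vector (continuous): weight * count of each distinguishing word.
    bow = [weight * hypothesis.count(word) for word, weight in word_weights.items()]
    # Second sub-vector (discrete): 1.0 if any phrase of that class was detected.
    found = set(detected_classes)
    dyn = [1.0 if cls in found else 0.0 for cls in dynamic_classes]
    return bow + dyn


weights = {"call": 2.0, "play": 1.5, "the": 0.1}
vec = bag_of_features(
    "call alice at the office".split(), weights, ["CONTACT", "CITY"], ["CONTACT"]
)
print(vec)  # [2.0, 0.0, 0.1, 1.0, 0.0]
```

Indexing the dynamic sub-vector by semantic class rather than by individual dynamic word keeps its length fixed when the application adds new dynamic words; this is one plausible reading of why the claims separate the two sub-vectors, not something the claims state.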
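Claim 16 trains the intent model by randomly sampling the training data into common training data and representative dynamic training data, then deriving the common vocabulary and the dynamic vocabulary from the respective parts. A minimal sketch, assuming each training utterance is a list of (word, semantic tag) pairs with "O" meaning untagged, that the sampling is a seeded shuffle controlled by a hypothetical `dynamic_fraction` parameter, and that dynamic vocabulary entries are contiguous runs of identically tagged words; all function names are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical format: one utterance as (word, semantic tag) pairs; "O" = untagged.
Example = List[Tuple[str, str]]


def split_training_data(
    data: List[Example], dynamic_fraction: float, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Randomly sample part of the data as representative dynamic training data."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_dynamic = int(len(shuffled) * dynamic_fraction)
    return shuffled[n_dynamic:], shuffled[:n_dynamic]  # (common, dynamic)


def common_vocabulary(common: List[Example]) -> Set[str]:
    """Every word seen in the common training data."""
    return {word for example in common for word, _ in example}


def dynamic_vocabulary(dynamic: List[Example]) -> Dict[Tuple[str, ...], str]:
    """Map each contiguous run of same-tagged words to its semantic class.

    Without BIO prefixes, two adjacent phrases of the same class merge into one.
    """
    vocab: Dict[Tuple[str, ...], str] = {}
    for example in dynamic:
        phrase: List[str] = []
        tag = "O"
        for word, t in example + [("", "O")]:  # sentinel flushes the final run
            if t != "O" and t == tag:
                phrase.append(word)  # extend the current tagged run
                continue
            if phrase:
                vocab[tuple(phrase)] = tag  # close the previous run
            phrase, tag = ([word], t) if t != "O" else ([], "O")
    return vocab


data = [
    [("call", "O"), ("alice", "CONTACT")],
    [("call", "O"), ("bob", "CONTACT"), ("jones", "CONTACT")],
    [("play", "O"), ("some", "O"), ("jazz", "GENRE")],
    [("play", "O"), ("the", "O"), ("news", "O")],
]
common, dynamic = split_training_data(data, dynamic_fraction=0.5, seed=1)
print(sorted(common_vocabulary(common)))
print(dynamic_vocabulary(dynamic))
```

Which utterances land in the dynamic part depends on the seed, so no output is shown; the claim says only that the sampling is random, not what proportion is treated as dynamic.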
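Claim 1 has the ASR use a language model whose word probabilities are derived from the dynamic vocabulary based on the semantic classes. One common way to do this, offered here as an assumption rather than the patent's stated method, is a class-based language model: a dynamic word w in class c gets P(w | h) = P(c | h) · P(w | c), with P(w | c) taken as uniform over the class's current members. The sketch fixes a single history h; the function `word_prob` and its dict inputs are hypothetical.

```python
from typing import Dict, List


def word_prob(
    word: str,
    class_probs: Dict[str, float],
    dynamic_classes: Dict[str, List[str]],
    static_probs: Dict[str, float],
) -> float:
    """P(word | h) for one fixed history h under a class-based model.

    class_probs: P(class | h), assumed to come from a model trained with class tokens.
    dynamic_classes: class -> current member words, editable by the application.
    static_probs: P(word | h) for ordinary static words.
    """
    for cls, members in dynamic_classes.items():
        if word in members:
            # P(c | h) * P(w | c), with P(w | c) uniform over current members.
            return class_probs.get(cls, 0.0) / len(members)
    return static_probs.get(word, 0.0)


class_probs = {"CONTACT": 0.2}
static_probs = {"call": 0.3, "text": 0.1}
contacts = {"CONTACT": ["alice", "bob"]}
print(word_prob("bob", class_probs, contacts, static_probs))  # 0.1
contacts["CONTACT"].append("carol")  # the application adds a dynamic word
print(word_prob("bob", class_probs, contacts, static_probs))
```

In this layout, adding a dynamic word changes only the class membership list: the class's total mass P(c | h) is redistributed among members rather than re-estimated. That is one way a model could be "updated with the parameter value from the application" (claim 6) without retraining, though the claims do not specify the mechanism.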
Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title
Training · CPC title
updating or merging of old and new templates; Mean values; Weighting · CPC title
Formal grammars, e.g. finite state automata, context free grammars or word networks · CPC title