Language model customization in speech recognition for speech analytics
US-2017206890-A1 · Jul 20, 2017 · US
US10311859B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10311859-B2 |
| Application number | US-201615247656-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 25, 2016 |
| Priority date | Jan 16, 2016 |
| Publication date | Jun 4, 2019 |
| Grant date | Jun 4, 2019 |
Official abstract text for this publication.
A method for extracting, from non-speech text, training data for a language model for speech recognition includes: receiving, by a processor, non-speech text; selecting, by the processor, text from the non-speech text; converting, by the processor, the selected text to generate converted text comprising a plurality of phrases consistent with speech transcription text; training, by the processor, a language model using the converted text; and outputting, by the processor, the language model.
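The "converting" step can be pictured with a small sketch. It follows the sub-steps that claim 3 recites (removing metadata, splitting into sentences, converting words to spoken form, removing duplicates) and omits spelling correction. Everything named here is a hypothetical illustration, not the patent's implementation: the header pattern, the `SPOKEN` table, the digit-by-digit reading of numbers, and the function names `to_spoken` and `convert` are all assumptions.

```python
import re

# Assumption: email-style header lines ("From:", "Subject:", ...) count as metadata.
HEADER_RE = re.compile(r"^(from|to|cc|subject|date):.*$", re.IGNORECASE | re.MULTILINE)

# Tiny illustrative spoken-form table; a real normalizer would cover far more cases.
SPOKEN = {"&": "and", "%": "percent", "$": "dollars"}
DIGITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]


def to_spoken(sentence: str) -> str:
    """Lowercase the words, read digits one at a time, and drop punctuation."""
    out = []
    for token in re.findall(r"[A-Za-z']+|\d|[&%$]", sentence):
        if token.isdigit():
            out.append(DIGITS[int(token)])          # crude: "25" becomes "two five"
        else:
            out.append(SPOKEN.get(token, token.lower()))
    return " ".join(out)


def convert(text: str) -> list[str]:
    """Return de-duplicated, speech-like sentences in first-seen order."""
    body = HEADER_RE.sub("", text)                         # remove metadata
    sentences = re.split(r"(?<=[.!?])\s+", body.strip())   # split into sentences
    seen, result = set(), []
    for sentence in sentences:
        spoken = to_spoken(sentence)                       # convert to spoken form
        if spoken and spoken not in seen:                  # skip duplicate sentences
            seen.add(spoken)
            result.append(spoken)
    return result


if __name__ == "__main__":
    sample = "Subject: Billing\nMy bill is 25% higher. My bill is 25% higher. Please call me!"
    print(convert(sample))
```

The output resembles transcription text in that it carries no casing, punctuation, or symbols, which is the property the abstract describes as "consistent with speech transcription text".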
Opening claim text (preview).
What is claimed is:
1. A method for customizing a language model for speech recognition in a context, the method comprising: receiving, by a processor, non-speech text from the context, the context comprising communications with an enterprise, the communications comprising voice interactions and non-speech communications; selecting, by the processor, text from the non-speech text; converting, by the processor, the selected non-speech text to generate converted non-speech text comprising a plurality of phrases consistent with speech transcription text; customizing, by the processor, a language model for the context using the converted non-speech text, the language model being customized to compute a probability that a given speech input phrase appears in voice interactions in the context of the communications with the enterprise; and outputting, by the processor, the language model.
2. The method of claim 1, wherein the non-speech text comprises at least one from the group consisting of: an email; a forum post; a transcript of a text chat interaction; or a text message.
3. The method of claim 1, wherein the converting the selected non-speech text comprises: removing metadata from the non-speech text; splitting the non-speech text into a plurality of sentences; converting one or more words of the sentences to spoken form; correcting one or more spelling errors in the sentences; identifying one or more duplicate sentences; and removing duplicate sentences.
4. The method of claim 1, wherein the selecting the text comprises: for each in-vocabulary word in a lexicon of in-vocabulary words, identifying one or more sentences containing the in-vocabulary word; counting the one or more sentences to identify a count of the in-vocabulary word in the non-speech text; comparing the count to a first threshold; and adding the identified one or more sentences containing the in-vocabulary word in response to determining that the count satisfies the first threshold; identifying one or more out-of-vocabulary words comprising words that are in the sentences and not in the lexicon; for each out-of-vocabulary word of the out-of-vocabulary words: identifying one or more sentences containing the out-of-vocabulary word; counting the one or more sentences to identify a count of the out-of-vocabulary word in the non-speech text; comparing the count to a second threshold; computing a first likelihood of encountering the out-of-vocabulary word in the sentence among all of the identified sentences; identifying one or more spelling suggestions for the out-of-vocabulary word; computing a plurality of second likelihoods, each of the second likelihoods corresponding to a second likelihood of encountering each of the spelling suggestions in the sentence; adding the identified sentences to an output set of selected text in response to determining that the count satisfies a threshold and that all of the second likelihoods are less than the first likelihood; and outputting the output set of selected text.
5. The method of claim 4, wherein the computing the first likelihood comprises counting occurrences of the out-of-vocabulary word preceded by one or more history words in the non-speech text; and wherein the computing one of the second likelihoods comprises counting occurrences of a corresponding spelling suggestion of the spelling suggestions preceded by the one or more history words in the non-speech text.
6. A method for selecting, from non-speech text, training data for a language model for speech recognition, the method comprising: training, by a processor, a non-speech language model based on the non-speech text; for each unique sentence of the non-speech text: computing and normalizing, by the processor, an out-of-domain score of the unique sentence based on the non-speech language model; computing and normalizing, by the processor, an in-domain score of the unique sentence based on a speech transcription language model trained based on generic speech transcription training data; comparing, by the processor, the out-of-domain score to the in-domain score; and adding, by the processor, the unique sentence to an output set of selected text in response to determining that the in-domain score exceeds the out-of-domain score by a threshold; and outputting, by the processor, the output set of selected text.
7. The method of claim 6, further comprising scaling a count of each unique sentence in the output set by P(s), where: $P(s) = e^{\mathrm{IDScr}'}$ where s is the unique sentence and where IDScr′ is the in-domain score of the unique sentence.
8. A method for selecting, from non-speech text, training data for a language model for speech recognition, the method comprising: initializing, by a processor, an output set of selected text based on a plurality of sentences sampled from the non-speech text; for each unique sentence of the non-speech text: computing, by the processor, a first divergence between an in-domain language model trained on generic speech transcript text the unique sentence and a language model trained on the output set; computing, by the processor, a second divergence between the in-domain language model and a language model trained on the output set combined with the unique sentence; comparing, by the processor, the first divergence and the second divergence; and adding, by the processor, the sentence to the output set in response to determining that the second divergence is less than the first divergence; and outputting, by the processor, the output set of selected text.
9. The method of claim 8, wherein the computing the second divergence comprises calculating a cross-entropy of the in-domain language model and the language model trained on the output set.
10. A system comprising: a processor; memory storing instructions that, when executed by the processor, cause the processor to: receive non-speech text from a context comprising communications with an enterprise, the communications comprising voice interactions and non-speech communications; select text from the non-speech text; convert the selected non-speech text to generate converted non-speech text comprising a plurality of phrases consistent with speech transcription text; customize a language model for the context using a converted non-speech text, the language model being customized to compute a probability that a given speech input phrase appears in voice interactions in the context of the communications with the enterprise; and output the language model.
11. The system of claim 10, wherein the non-speech text comprises at least one from the group consisting of: an email; a forum post; a transcript of a text chat interaction; or a text message.
12. The system of claim 10, wherein the memory further stores instructions that, when executed by the processor, cause the processor to convert the selected non-speech text by: removing metadata from the non-speech text; splitting the non-speech text into a plurality of sentences; converting one or more words of the sentences to spoken form; correcting one or more spelling errors in the sentences; identifying one or more duplicate sentences; and removing duplicate sentences.
13. The system of claim 10, wherein the memory further stores instructions that, when executed by the processor, cause the processor to select the text by: for each in-vocabulary word in a lexicon of in-vocabulary words, identifying one or more sentences containing the in-vocabulary word; counting the one or more sentences to identify a count of the in-vocabulary word in the non-speech te
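The score comparison recited in claims 6 and 7 can be illustrated with a minimal sketch. The claims do not say what kind of language model is used or how scores are normalized, so the choices below are assumptions made only for illustration: add-one-smoothed unigram models, normalization by averaging the log-probability over the words of the sentence, and a default threshold of 0. The names `train_unigram` and `select` are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return a scorer giving the average per-word log-probability of a sentence.

    Assumption: an add-one-smoothed unigram model stands in for the language
    models named in the claims, which do not fix a model type.
    """
    counts = Counter(word for s in sentences for word in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words

    def score(sentence):
        words = sentence.split()
        logp = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
        return logp / max(len(words), 1)  # length normalization (an assumption)

    return score


def select(non_speech, speech_transcripts, threshold=0.0):
    """Keep unique sentences whose in-domain score beats the out-of-domain score.

    Returns a dict mapping each kept sentence to the weight P(s) = exp(IDScr'),
    the factor by which claim 7 scales the sentence count.
    """
    out_of_domain = train_unigram(non_speech)        # model of the non-speech text
    in_domain = train_unigram(speech_transcripts)    # model of generic transcripts
    selected = {}
    for sentence in set(non_speech):
        id_score = in_domain(sentence)
        ood_score = out_of_domain(sentence)
        if id_score - ood_score > threshold:
            selected[sentence] = math.exp(id_score)
    return selected


if __name__ == "__main__":
    emails = ["please find attached the invoice", "i want to cancel my account", "regards john"]
    calls = ["i want to cancel my account please", "i want to check my account", "can you help me cancel"]
    print(select(emails, calls))
```

On this toy data only "i want to cancel my account" is kept, because it is the one email sentence that the transcript model finds more probable than the email model does. Sentences that look like written correspondence, such as the sign-off, are filtered out.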
using context dependencies, e.g. language models · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title
Threshold criteria for the updating · CPC title
Orthographic correction, e.g. spell checking or vowelisation · CPC title