Material selection for language model customization in speech recognition for speech analytics

US10311859B2 · US · B2

Patent metadata
Publication number: US-10311859-B2
Application number: US-201615247656-A
Country: US
Kind code: B2
Filing date: Aug 25, 2016
Priority date: Jan 16, 2016
Publication date: Jun 4, 2019
Grant date: Jun 4, 2019

Abstract

Official abstract text for this publication.

A method for extracting, from non-speech text, training data for a language model for speech recognition includes: receiving, by a processor, non-speech text; selecting, by the processor, text from the non-speech text; converting, by the processor, the selected text to generate converted text comprising a plurality of phrases consistent with speech transcription text; training, by the processor, a language model using the converted text; and outputting, by the processor, the language model.
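The convert, train, and output steps in the abstract can be illustrated with a minimal sketch. It makes several assumptions that the patent text does not state. "Metadata" is treated as email-header lines. "Spoken form" means spelling out digits one by one. The language model is an add-one smoothed bigram model. The function names `convert`, `train_bigram`, and `phrase_log_prob` and the sample email are hypothetical and serve only as illustration. The selection step is omitted here because the claims describe it separately.

```python
import math
import re
from collections import Counter

# Hypothetical spoken-form table: each digit is spelled out as a word.
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
# Hypothetical metadata filter: drop email-header-style lines.
HEADER = re.compile(r"^(from|to|cc|subject|date):", re.IGNORECASE)


def convert(raw_text):
    """Convert written text into deduplicated, speech-like word sequences."""
    body = "\n".join(line for line in raw_text.splitlines() if not HEADER.match(line))
    sentences, seen = [], set()
    for sent in re.split(r"[.!?]+", body):  # naive sentence splitting
        # Spoken form: spell each digit out, then keep only lowercase words.
        spoken = re.sub(r"[0-9]", lambda m: " " + DIGITS[m.group()] + " ", sent.lower())
        words = re.findall(r"[a-z']+", spoken)
        key = " ".join(words)
        if words and key not in seen:  # drop empty and duplicate sentences
            seen.add(key)
            sentences.append(words)
    return sentences


def train_bigram(sentences):
    """Count histories and bigrams, adding sentence boundary markers."""
    histories, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = {w for words in sentences for w in words} | {"</s>"}
    return histories, bigrams, vocab


def phrase_log_prob(model, phrase):
    """Log-probability of a phrase under the add-one smoothed bigram model."""
    histories, bigrams, vocab = model
    tokens = ["<s>"] + phrase.lower().split() + ["</s>"]
    v = len(vocab)
    return sum(math.log((bigrams[(h, w)] + 1) / (histories[h] + v))
               for h, w in zip(tokens[:-1], tokens[1:]))


email = ("From: customer@example.com\n"
         "I want to cancel my order. I want to cancel my order! My order number is 42.")
sentences = convert(email)
model = train_bigram(sentences)
print(sentences)
print(phrase_log_prob(model, "cancel my order") > phrase_log_prob(model, "order my cancel"))
```

In this sketch the header line and the repeated sentence are dropped, and "42" becomes "four two". The resulting model assigns a higher probability to the word order that appears in the converted text. A production system would use a proper text normalizer and a higher-order, better-smoothed model.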

First claim

Opening claim text (preview).

What is claimed is:

1. A method for customizing a language model for speech recognition in a context, the method comprising: receiving, by a processor, non-speech text from the context, the context comprising communications with an enterprise, the communications comprising voice interactions and non-speech communications; selecting, by the processor, text from the non-speech text; converting, by the processor, the selected non-speech text to generate converted non-speech text comprising a plurality of phrases consistent with speech transcription text; customizing, by the processor, a language model for the context using the converted non-speech text, the language model being customized to compute a probability that a given speech input phrase appears in voice interactions in the context of the communications with the enterprise; and outputting, by the processor, the language model.

2. The method of claim 1, wherein the non-speech text comprises at least one from the group consisting of: an email; a forum post; a transcript of a text chat interaction; or a text message.

3. The method of claim 1, wherein the converting the selected non-speech text comprises: removing metadata from the non-speech text; splitting the non-speech text into a plurality of sentences; converting one or more words of the sentences to spoken form; correcting one or more spelling errors in the sentences; identifying one or more duplicate sentences; and removing duplicate sentences.

4. The method of claim 1, wherein the selecting the text comprises: for each in-vocabulary word in a lexicon of in-vocabulary words, identifying one or more sentences containing the in-vocabulary word; counting the one or more sentences to identify a count of the in-vocabulary word in the non-speech text; comparing the count to a first threshold; and adding the identified one or more sentences containing the in-vocabulary word in response to determining that the count satisfies the first threshold; identifying one or more out-of-vocabulary words comprising words that are in the sentences and not in the lexicon; for each out-of-vocabulary word of the out-of-vocabulary words: identifying one or more sentences containing the out-of-vocabulary word; counting the one or more sentences to identify a count of the out-of-vocabulary word in the non-speech text; comparing the count to a second threshold; computing a first likelihood of encountering the out-of-vocabulary word in the sentence among all of the identified sentences; identifying one or more spelling suggestions for the out-of-vocabulary word; computing a plurality of second likelihoods, each of the second likelihoods corresponding to a second likelihood of encountering each of the spelling suggestions in the sentence; adding the identified sentences to an output set of selected text in response to determining that the count satisfies a threshold and that all of the second likelihoods are less than the first likelihood; and outputting the output set of selected text.

5. The method of claim 4, wherein the computing the first likelihood comprises counting occurrences of the out-of-vocabulary word preceded by one or more history words in the non-speech text; and wherein the computing one of the second likelihoods comprises counting occurrences of a corresponding spelling suggestion of the spelling suggestions preceded by the one or more history words in the non-speech text.

6. A method for selecting, from non-speech text, training data for a language model for speech recognition, the method comprising: training, by a processor, a non-speech language model based on the non-speech text; for each unique sentence of the non-speech text: computing and normalizing, by the processor, an out-of-domain score of the unique sentence based on the non-speech language model; computing and normalizing, by the processor, an in-domain score of the unique sentence based on a speech transcription language model trained based on generic speech transcription training data; comparing, by the processor, the out-of-domain score to the in-domain score; and adding, by the processor, the unique sentence to an output set of selected text in response to determining that the in-domain score exceeds the out-of-domain score by a threshold; and outputting, by the processor, the output set of selected text.

7. The method of claim 6, further comprising scaling a count of each unique sentence in the output set by $P(s)$, where $P(s) = e^{\mathrm{IDScr}'}$, $s$ is the unique sentence, and $\mathrm{IDScr}'$ is the in-domain score of the unique sentence.

8. A method for selecting, from non-speech text, training data for a language model for speech recognition, the method comprising: initializing, by a processor, an output set of selected text based on a plurality of sentences sampled from the non-speech text; for each unique sentence of the non-speech text: computing, by the processor, a first divergence between an in-domain language model trained on generic speech transcript text the unique sentence and a language model trained on the output set; computing, by the processor, a second divergence between the in-domain language model and a language model trained on the output set combined with the unique sentence; comparing, by the processor, the first divergence and the second divergence; and adding, by the processor, the sentence to the output set in response to determining that the second divergence is less than the first divergence; and outputting, by the processor, the output set of selected text.

9. The method of claim 8, wherein the computing the second divergence comprises calculating a cross-entropy of the in-domain language model and the language model trained on the output set.

10. A system comprising: a processor; memory storing instructions that, when executed by the processor, cause the processor to: receive non-speech text from a context comprising communications with an enterprise, the communications comprising voice interactions and non-speech communications; select text from the non-speech text; convert the selected non-speech text to generate converted non-speech text comprising a plurality of phrases consistent with speech transcription text; customize a language model for the context using the converted non-speech text, the language model being customized to compute a probability that a given speech input phrase appears in voice interactions in the context of the communications with the enterprise; and output the language model.

11. The system of claim 10, wherein the non-speech text comprises at least one from the group consisting of: an email; a forum post; a transcript of a text chat interaction; or a text message.

12. The system of claim 10, wherein the memory further stores instructions that, when executed by the processor, cause the processor to convert the selected non-speech text by: removing metadata from the non-speech text; splitting the non-speech text into a plurality of sentences; converting one or more words of the sentences to spoken form; correcting one or more spelling errors in the sentences; identifying one or more duplicate sentences; and removing duplicate sentences.

13. The system of claim 10, wherein the memory further stores instructions that, when executed by the processor, cause the processor to select the text by: for each in-vocabulary word in a lexicon of in-vocabulary words, identifying one or more sentences containing the in-vocabulary word; counting the one or more sentences to identify a count of the in-vocabulary word in the non-speech te
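Claims 6 and 7 select non-speech sentences that look more like speech than like the rest of the written text. Each unique sentence gets a normalized in-domain score from a generic speech-transcription model and a normalized out-of-domain score from a model trained on the non-speech text. A sentence is kept when the in-domain score exceeds the out-of-domain score by a threshold. Its count is then weighted by $P(s) = e^{\mathrm{IDScr}'}$. The following is a minimal sketch under these assumptions:

- Add-one smoothed unigram models stand in for whatever language models the patent intends.
- "Normalizing" is taken to mean averaging log-probability per word.
- The function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Add-one smoothed unigram LM returning log P(word); one extra slot covers unseen words."""
    counts = Counter(w for s in sentences for w in s.split())
    denom = sum(counts.values()) + len(counts) + 1
    return lambda w: math.log((counts[w] + 1) / denom)


def normalized_score(lm, sentence):
    """Average per-word log-probability, so sentences of different length are comparable."""
    words = sentence.split()
    return sum(lm(w) for w in words) / len(words)


def select_sentences(non_speech, speech_transcripts, threshold=0.0):
    ood_lm = train_unigram(non_speech)         # LM trained on the non-speech text
    id_lm = train_unigram(speech_transcripts)  # LM trained on generic speech transcripts
    counts = Counter(s for s in non_speech if s.split())  # unique, non-empty sentences
    selected = {}
    for sentence, count in counts.items():
        ood = normalized_score(ood_lm, sentence)
        ind = normalized_score(id_lm, sentence)
        if ind - ood > threshold:
            # Claim 7 weighting: scale the sentence count by P(s) = exp(IDScr').
            selected[sentence] = count * math.exp(ind)
    return selected


non_speech = ["please find the attached invoice", "please find the attached invoice",
              "kind regards", "i want to cancel my order"]
speech = ["i want to cancel my order", "yeah i want to cancel my order"]
result = select_sentences(non_speech, speech)
print(result)
```

With this toy data, only the conversational sentence passes the comparison. The email boilerplate scores higher under the non-speech model, so it is filtered out. The exponential weight keeps selected sentences with a low in-domain likelihood from dominating the customized model's counts.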

Assignees

Genesys Telecommunications Laboratories Inc

Classifications

  • using context dependencies, e.g. language models · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title

  • Threshold criteria for the updating · CPC title

  • Orthographic correction, e.g. spell checking or vowelisation · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10311859B2 cover?
It covers extracting training data for a speech recognition language model from non-speech text. The text is received, selected, and converted into phrases consistent with speech transcription text. The converted text is then used to train a language model, which is output.
Who is the assignee on this patent?
Genesys Telecommunications Laboratories Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 4, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).