Detecting customers with low speech recognition accuracy by investigating consistency of conversation in call-center
US-9870765-B2 · Jan 16, 2018 · US
US10089978B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10089978-B2 |
| Application number | US-201715823074-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 27, 2017 |
| Priority date | Jun 3, 2016 |
| Publication date | Oct 2, 2018 |
| Grant date | Oct 2, 2018 |
Official abstract text for this publication.
Methods and a system are provided for estimating automatic speech recognition (ASR) accuracy. A method includes obtaining transcriptions of utterances in a conversation over two channels. The method further includes sorting the transcriptions along a time axis using a forced alignment. The method also includes training a language model with the sorted transcriptions. The method additionally includes performing ASR for utterances in a conversation between a first user and a second user. The second user is a target of ASR accuracy estimation. The method further includes determining whether an ASR result of the second user is consistent or inconsistent with an ASR result of the first user using the trained language model. The method also includes estimating the ASR result of the second user as poor responsive to the ASR result of the second user being as inconsistent with the ASR result of the first user.
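The sorting step in the abstract can be pictured with a small sketch. It assumes forced alignment has already attached a start time to every word on each channel; the alignment itself is not shown. The `TimedWord` type, the `merge_channels` function, and the sample words and timings are hypothetical names and data chosen for illustration, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TimedWord:
    start: float   # time index in seconds (assumed output of forced alignment)
    speaker: str   # channel label, e.g. "agent" or "customer"
    text: str      # the transcribed word


def merge_channels(channel_a: Iterable[TimedWord],
                   channel_b: Iterable[TimedWord]) -> List[TimedWord]:
    """Merge two per-channel transcriptions into one sequence sorted by time index."""
    return sorted(list(channel_a) + list(channel_b), key=lambda w: w.start)


# Illustrative data only: one word list per telephone channel.
agent = [
    TimedWord(0.0, "agent", "hello"),
    TimedWord(1.2, "agent", "how"),
    TimedWord(1.5, "agent", "are"),
    TimedWord(1.8, "agent", "you"),
]
customer = [
    TimedWord(0.6, "customer", "hi"),
    TimedWord(2.4, "customer", "fine"),
]

merged = merge_channels(agent, customer)
merged_text = [w.text for w in merged]
```

The merged word sequence, with speaker turns interleaved in time order, is what a language model would then be trained on in this reading of the abstract.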
Opening claim text (preview).
What is claimed is:
1. A method for estimating automatic speech recognition (ASR) accuracy, the method comprising: obtaining transcriptions of utterances in a conversation over two channels; sorting the transcriptions along a time axis using a forced alignment; training a language model with the sorted transcriptions; performing ASR for utterances in a conversation between a first user and a second user, the second user being a target of ASR accuracy estimation; determining whether an ASR result of the second user is consistent or inconsistent with an ASR result of the first user using the trained language model; listing word sequences that have at least one word from the first user followed by a word from the second user; and estimating the ASR result of the second user as poor responsive to the ASR result of the second user being as inconsistent with the ASR result of the first user.
2. The method of claim 1, wherein said obtaining, sorting, and training steps correspond to a training stage of the method, and said performing, determining, and estimating steps correspond to a call selection stage of the method.
3. The method of claim 1, wherein said obtaining step obtains the transcriptions of the utterances between a plurality of caller-callee pairs.
4. The method of claim 3, wherein a respective one of the transcriptions is obtained for each participant in each of the plurality of caller-callee pairs.
5. The method of claim 1, wherein said sorting step adds a time index to each of the words in each of the transcriptions.
6. The method of claim 5, wherein, for a given pair of participants in the conversation over the two channels, said sorting step merges the transcriptions for the given pair from the two channels by sorting the words in each of the transcriptions corresponding to the given pair according to the time index.
7. The method of claim 6, wherein said training step trains the language model using the merged transcriptions.
8. The method of claim 1, wherein said performing step performs ASR separately for the utterances of the first user and the utterances of the second user to obtain the ASR result of the first user and the ASR result of the second user, and merges the ASR result of the first user with the ASR result of the second user in a time-aligned manner.
9. The method of claim 1, wherein said determining step determines whether the ASR results of the first and second users are consistent or inconsistent by comparing an average of probabilities of the word from the second user across each of the word sequences against a threshold.
10. The method of claim 9, wherein the at least one word from the first user consists of two words, and the probabilities are based on tri-grams formed from the two words from the first user and the word from the second user that follows.
11. The method of claim 9, wherein the average of the probabilities is used as a consistency index.
12. The method of claim 1, wherein determining whether the ASR result of the second user is consistent with the ASR result of the first user includes extracting, from the ASR for each of the utterances in the conversation between the first user and the second user, a word sequence that has a word from the first user followed by a word from the second user and calculating a probability of the word from the second user as a consistency index using the language model.
13. The method of claim 12, further comprising: listing a plurality of word sequences from a conversation between the first user and the second user; and calculating an average probability of a last word from the second user in each of the plurality of word sequences as a consistency index using the language model.
14. The method of claim 1, wherein the method is performed by an automatic speech recognition system.
15. The method of claim 1, further comprising retraining the language model with any of the transcriptions from the second user being removed for the retraining to improve an accuracy of the language model, responsive to the ASR result of the second user being estimated as poor.
16. A non-transitory computer readable storage medium comprising a computer readable program for estimating automatic speech recognition (ASR) accuracy, wherein the computer readable program when executed on a computer causes the computer to perform the steps of: obtaining transcriptions of utterances in a conversation over two channels; sorting the transcriptions along a time axis using a forced alignment; training a language model with the sorted transcriptions; performing ASR for utterances in a conversation between a first user and a second user, the second user being a target of ASR accuracy estimation; determining whether an ASR result of the second user is consistent or inconsistent with an ASR result of the first user using the trained language model; listing word sequences that have at least one word from the first user followed by a word from the second user; and estimating the ASR result of the second user as poor responsive to the ASR result of the second user being as inconsistent with the ASR result of the first user.
17. The non-transitory computer readable storage medium of claim 16, wherein said sorting step adds a time index to each of the words in each of the transcriptions, and wherein, for a given pair of participants in the conversation over the two channels, said sorting step merges the transcriptions for the given pair from the two channels by sorting the words in each of the transcriptions corresponding to the given pair according to the time index.
18. The non-transitory computer readable storage medium of claim 16, wherein said determining step determines whether the ASR results of the first and second users are consistent or inconsistent by comparing an average of probabilities of the word from the second user across each of the word sequences against a threshold.
19. A system for estimating automatic speech recognition (ASR) accuracy, the system comprising: a processor, configured to: obtain transcriptions of utterances in a conversation over two channels; sort the transcriptions along a time axis using a forced alignment; train a language model with the sorted transcriptions; perform ASR for utterances in a conversation between a first user and a second user, the second user being a target of ASR accuracy estimation; determine whether an ASR result of the second user is consistent or inconsistent with an ASR result of the first user using the trained language model; list word sequences that have at least one word from the first user followed by a word from the second user; and estimate the ASR result of the second user as poor responsive to the ASR result of the second user being as inconsistent with the ASR result of the first user.
20. The system of claim 19, wherein said processor step determines whether the ASR results of the first and second users are consistent or inconsistent by comparing an average of probabilities of the word from the second user across each of the word sequences against a threshold.
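The consistency check in claims 9 to 11 (tri-grams formed from two words of the first user followed by one word of the second user, averaged and compared against a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claims do not specify the language model's form, so a count-based tri-gram model with add-one smoothing is swapped in here, and the function names, the toy training sequence, and the threshold value of 0.25 are all hypothetical.

```python
from collections import Counter


def train_trigram_model(sequences):
    """Count tri-grams and their two-word contexts over merged word sequences."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for seq in sequences:
        vocab.update(seq)
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            tri[(a, b, c)] += 1
            ctx[(a, b)] += 1
    return tri, ctx, vocab


def trigram_prob(model, a, b, c):
    """P(c | a, b) with add-one smoothing (an assumption, not from the claims)."""
    tri, ctx, vocab = model
    return (tri[(a, b, c)] + 1) / (ctx[(a, b)] + len(vocab))


def consistency_index(model, merged, first, second):
    """Average probability of each second-user word that follows two first-user words.

    `merged` is a time-ordered list of (speaker, word) pairs.
    Returns None when no qualifying word sequence exists.
    """
    probs = []
    for (s1, w1), (s2, w2), (s3, w3) in zip(merged, merged[1:], merged[2:]):
        if s1 == first and s2 == first and s3 == second:
            probs.append(trigram_prob(model, w1, w2, w3))
    return sum(probs) / len(probs) if probs else None


# Toy training data standing in for merged two-channel transcriptions.
model = train_trigram_model([["how", "are", "you", "fine", "thanks"]])

THRESHOLD = 0.25  # hypothetical value; the claims do not fix one

plausible = [("agent", "are"), ("agent", "you"), ("customer", "fine")]
garbled = [("agent", "are"), ("agent", "you"), ("customer", "wine")]

plausible_index = consistency_index(model, plausible, "agent", "customer")
garbled_index = consistency_index(model, garbled, "agent", "customer")

# A low index marks the second user's ASR result as likely poor.
plausible_is_poor = plausible_index < THRESHOLD
garbled_is_poor = garbled_index < THRESHOLD
```

In this toy setup the customer word that fits the agent's preceding words scores above the threshold, while the misrecognized word scores below it and would be flagged as a poor ASR result.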
Assessment or evaluation of speech recognition systems · CPC title
Training · CPC title
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title