Detecting customers with low speech recognition accuracy by investigating consistency of conversation in call-center

US10089978B2 · US · B2

Patent metadata
Publication number: US-10089978-B2
Application number: US-201715823074-A
Country: US
Kind code: B2
Filing date: Nov 27, 2017
Priority date: Jun 3, 2016
Publication date: Oct 2, 2018
Grant date: Oct 2, 2018

Abstract

Official abstract text for this publication.

Methods and a system are provided for estimating automatic speech recognition (ASR) accuracy. A method includes obtaining transcriptions of utterances in a conversation over two channels. The method further includes sorting the transcriptions along a time axis using a forced alignment. The method also includes training a language model with the sorted transcriptions. The method additionally includes performing ASR for utterances in a conversation between a first user and a second user. The second user is a target of ASR accuracy estimation. The method further includes determining whether an ASR result of the second user is consistent or inconsistent with an ASR result of the first user using the trained language model. The method also includes estimating the ASR result of the second user as poor responsive to the ASR result of the second user being as inconsistent with the ASR result of the first user.
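As an illustration of the sorting step described in the abstract, the following minimal Python sketch merges two per-channel word lists into one sequence ordered by time. The `TimedWord` type, the `merge_channels` function, the channel labels, and the timestamps are all hypothetical; the text shown on this page does not specify data structures, and a real system would take the start times from a forced aligner rather than hard-coding them.

```python
from typing import List, NamedTuple


class TimedWord(NamedTuple):
    start: float  # seconds from call start (assumed to come from forced alignment)
    speaker: str  # hypothetical channel label, e.g. "agent" or "customer"
    word: str


def merge_channels(channel_a: List[TimedWord], channel_b: List[TimedWord]) -> List[TimedWord]:
    """Merge two per-channel word lists into one list sorted by start time."""
    return sorted([*channel_a, *channel_b], key=lambda w: w.start)


# Hypothetical example data for one caller-callee pair.
agent = [
    TimedWord(0.0, "agent", "hello"),
    TimedWord(0.4, "agent", "there"),
    TimedWord(2.1, "agent", "great"),
]
customer = [
    TimedWord(1.0, "customer", "hi"),
    TimedWord(2.5, "customer", "thanks"),
]

merged = merge_channels(agent, customer)
print([w.word for w in merged])
```

The merged sequence interleaves the two speakers in the order the words were spoken, which is the form of input the abstract says the language model is trained on.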

Claims

Full claim text (claims 1 to 20).

What is claimed is:

1. A method for estimating automatic speech recognition (ASR) accuracy, the method comprising: obtaining transcriptions of utterances in a conversation over two channels; sorting the transcriptions along a time axis using a forced alignment; training a language model with the sorted transcriptions; performing ASR for utterances in a conversation between a first user and a second user, the second user being a target of ASR accuracy estimation; determining whether an ASR result of the second user is consistent or inconsistent with an ASR result of the first user using the trained language model; listing word sequences that have at least one word from the first user followed by a word from the second user; and estimating the ASR result of the second user as poor responsive to the ASR result of the second user being as inconsistent with the ASR result of the first user.
2. The method of claim 1, wherein said obtaining, sorting, and training steps correspond to a training stage of the method, and said performing, determining, and estimating steps correspond to a call selection stage of the method.
3. The method of claim 1, wherein said obtaining step obtains the transcriptions of the utterances between a plurality of caller-callee pairs.
4. The method of claim 3, wherein a respective one of the transcriptions is obtained for each participant in each of the plurality of caller-callee pairs.
5. The method of claim 1, wherein said sorting step adds a time index to each of the words in each of the transcriptions.
6. The method of claim 5, wherein, for a given pair of participants in the conversation over the two channels, said sorting step merges the transcriptions for the given pair from the two channels by sorting the words in each of the transcriptions corresponding to the given pair according to the time index.
7. The method of claim 6, wherein said training step trains the language model using the merged transcriptions.
8. The method of claim 1, wherein said performing step performs ASR separately for the utterances of the first user and the utterances of the second user to obtain the ASR result of the first user and the ASR result of the second user, and merges the ASR result of the first user with the ASR result of the second user in a time-aligned manner.
9. The method of claim 1, wherein said determining step determines whether the ASR results of the first and second users are consistent or inconsistent by comparing an average of probabilities of the word from the second user across each of the word sequences against a threshold.
10. The method of claim 9, wherein the at least one word from the first user consists of two words, and the probabilities are based on tri-grams formed from the two words from the first user and the word from the second user that follows.
11. The method of claim 9, wherein the average of the probabilities is used as a consistency index.
12. The method of claim 1, wherein determining whether the ASR result of the second user is consistent with the ASR result of the first user includes extracting, from the ASR for each of the utterances in the conversation between the first user and the second user, a word sequence that has a word from the first user followed by a word from the second user and calculating a probability of the word from the second user as a consistency index using the language model.
13. The method of claim 12, further comprising: listing a plurality of word sequences from a conversation between the first user and the second user; and calculating an average probability of a last word from the second user in each of the plurality of word sequences as a consistency index using the language model.
14. The method of claim 1, wherein the method is performed by an automatic speech recognition system.
15. The method of claim 1, further comprising retraining the language model with any of the transcriptions from the second user being removed for the retraining to improve an accuracy of the language model, responsive to the ASR result of the second user being estimated as poor.
16. A non-transitory computer readable storage medium comprising a computer readable program for estimating automatic speech recognition (ASR) accuracy, wherein the computer readable program when executed on a computer causes the computer to perform the steps of: obtaining transcriptions of utterances in a conversation over two channels; sorting the transcriptions along a time axis using a forced alignment; training a language model with the sorted transcriptions; performing ASR for utterances in a conversation between a first user and a second user, the second user being a target of ASR accuracy estimation; determining whether an ASR result of the second user is consistent or inconsistent with an ASR result of the first user using the trained language model; listing word sequences that have at least one word from the first user followed by a word from the second user; and estimating the ASR result of the second user as poor responsive to the ASR result of the second user being as inconsistent with the ASR result of the first user.
17. The non-transitory computer readable storage medium of claim 16, wherein said sorting step adds a time index to each of the words in each of the transcriptions, and wherein, for a given pair of participants in the conversation over the two channels, said sorting step merges the transcriptions for the given pair from the two channels by sorting the words in each of the transcriptions corresponding to the given pair according to the time index.
18. The non-transitory computer readable storage medium of claim 16, wherein said determining step determines whether the ASR results of the first and second users are consistent or inconsistent by comparing an average of probabilities of the word from the second user across each of the word sequences against a threshold.
19. A system for estimating automatic speech recognition (ASR) accuracy, the system comprising: a processor, configured to: obtain transcriptions of utterances in a conversation over two channels; sort the transcriptions along a time axis using a forced alignment; train a language model with the sorted transcriptions; perform ASR for utterances in a conversation between a first user and a second user, the second user being a target of ASR accuracy estimation; determine whether an ASR result of the second user is consistent or inconsistent with an ASR result of the first user using the trained language model; list word sequences that have at least one word from the first user followed by a word from the second user; and estimate the ASR result of the second user as poor responsive to the ASR result of the second user being as inconsistent with the ASR result of the first user.
20. The system of claim 19, wherein said processor step determines whether the ASR results of the first and second users are consistent or inconsistent by comparing an average of probabilities of the word from the second user across each of the word sequences against a threshold.
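The consistency check in claims 9 to 11 (average tri-gram probability of the second user's word, compared against a threshold) can be sketched in Python as follows. This is only an illustration under stated assumptions: the count-based tri-gram model, the add-one smoothing, the function names, the speaker labels, the toy training data, and the threshold of 0.2 are all hypothetical and are not taken from the patent.

```python
from collections import Counter


def train_trigram_counts(merged_conversations):
    """Count tri-grams and their two-word contexts over time-merged word lists."""
    tri, bi, vocab = Counter(), Counter(), set()
    for words in merged_conversations:
        vocab.update(words)
        for a, b, c in zip(words, words[1:], words[2:]):
            tri[(a, b, c)] += 1
            bi[(a, b)] += 1
    return tri, bi, len(vocab)


def trigram_prob(model, a, b, c):
    """Add-one smoothed P(c | a, b). The smoothing choice is an assumption."""
    tri, bi, vocab_size = model
    return (tri[(a, b, c)] + 1) / (bi[(a, b)] + vocab_size)


def consistency_index(model, merged, first="agent", second="customer"):
    """Average P(w3 | w1, w2) over tri-grams where the first user spoke
    w1 and w2 and the second user spoke w3. Returns None if none exist."""
    probs = [
        trigram_prob(model, w1, w2, w3)
        for (s1, w1), (s2, w2), (s3, w3) in zip(merged, merged[1:], merged[2:])
        if s1 == first and s2 == first and s3 == second
    ]
    return sum(probs) / len(probs) if probs else None


def is_poor(index, threshold=0.2):
    """Flag the second user's ASR result as poor when the index is below the
    (hypothetical) threshold."""
    return index is not None and index < threshold


# Toy training data: three identical time-merged conversations.
model = train_trigram_counts([["how", "are", "you", "fine", "thanks"]] * 3)

plausible = [("agent", "how"), ("agent", "are"), ("agent", "you"),
             ("customer", "fine"), ("customer", "thanks")]
garbled = [("agent", "how"), ("agent", "are"), ("agent", "you"),
           ("customer", "wine"), ("customer", "banks")]

print(consistency_index(model, plausible), is_poor(consistency_index(model, plausible)))
print(consistency_index(model, garbled), is_poor(consistency_index(model, garbled)))
```

With this toy model the plausible reply scores 0.5 and the garbled reply scores 0.125, so only the garbled one falls below the assumed threshold. A production system would use a properly trained and smoothed language model and a threshold tuned on held-out calls.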

Assignees

IBM

Inventors

Classifications

  • G10L15/01 (Primary)

    Assessment or evaluation of speech recognition systems · CPC title

  • G10L15/063 (Primary)

    Training · CPC title

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title

  • Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G10L15/01. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 2, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).