Targeted clarification questions in speech recognition with concept presence score and concept correctness score

US9953644B2 · US · B2

Patent metadata
Publication number: US-9953644-B2
Application number: US-201414557030-A
Country: US
Kind code: B2
Filing date: Dec 1, 2014
Priority date: Dec 1, 2014
Publication date: Apr 24, 2018
Grant date: Apr 24, 2018

Abstract

A system, method and computer-readable storage devices are disclosed for using targeted clarification (TC) questions in dialog systems in a multimodal virtual agent system (MVA) providing access to information about movies, restaurants, and musical events. In contrast with open-domain spoken systems, the MVA application covers a domain with a fixed set of concepts and uses a natural language understanding (NLU) component to mark concepts in automatically recognized speech. Instead of identifying an error segment, localized error detection (LED) identifies which of the concepts are likely to be present and correct using domain knowledge, automatic speech recognition (ASR), and NLU tags and scores. If at least one concept is identified as present but not correct, the TC component uses this information to generate a targeted clarification question. This approach computes probability distributions of concept presence and correctness for each user utterance, which can be applied to automatic learning of clarification policies.
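The abstract describes localized error detection (LED) that, for each concept in a fixed domain, estimates whether the concept is present in the utterance and whether its recognized value is correct, and then triggers a targeted clarification only for concepts judged present but not correct. The Python sketch below illustrates that decision rule under loudly stated assumptions: the concept names, feature names, weights, the logistic scoring model, and the 0.5 thresholds are all hypothetical and are not the model described in the patent.

```python
import math
from dataclasses import dataclass

# Hypothetical fixed concept inventory for a movies/restaurants/music domain.
DOMAIN_CONCEPTS = {"movie_title", "restaurant_name", "cuisine", "location", "date"}


def logistic(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class ConceptEstimate:
    concept: str
    p_present: float  # estimated probability that the concept occurs in the utterance
    p_correct: float  # estimated probability that its recognized value is correct


def score_concept(concept: str, features: dict, weights: dict) -> ConceptEstimate:
    """Turn ASR/NLU/context features into presence and correctness scores.

    Assumption: one linear model per score, passed through a logistic function.
    This model is illustrative only; the abstract names the kinds of inputs
    (domain knowledge, ASR and NLU tags and scores), not a specific model.
    """
    z_present = sum(weights["present"].get(name, 0.0) * value for name, value in features.items())
    z_correct = sum(weights["correct"].get(name, 0.0) * value for name, value in features.items())
    return ConceptEstimate(concept, logistic(z_present), logistic(z_correct))


def concepts_to_clarify(estimates, presence_threshold=0.5, correctness_threshold=0.5):
    """Keep concepts judged present but not correct, least confident first."""
    flagged = [
        e for e in estimates
        if e.p_present >= presence_threshold and e.p_correct < correctness_threshold
    ]
    return sorted(flagged, key=lambda e: e.p_correct)


# Illustrative weights and per-concept features (all values are made up).
WEIGHTS = {
    "present": {"bias": -1.0, "nlu_tag_score": 4.0},
    "correct": {"bias": -1.0, "asr_confidence": 3.0, "cooccurrence_prob": 2.0},
}
FEATURES = {
    "movie_title": {"bias": 1.0, "nlu_tag_score": 0.9, "asr_confidence": 0.2, "cooccurrence_prob": 0.1},
    "location": {"bias": 1.0, "nlu_tag_score": 0.8, "asr_confidence": 0.9, "cooccurrence_prob": 0.7},
    "date": {"bias": 1.0, "nlu_tag_score": 0.05, "asr_confidence": 0.5, "cooccurrence_prob": 0.5},
}

# Score only concepts that belong to the fixed domain, then pick clarification targets.
estimates = [score_concept(c, f, WEIGHTS) for c, f in FEATURES.items() if c in DOMAIN_CONCEPTS]
for e in concepts_to_clarify(estimates):
    print(f"clarify {e.concept}: present={e.p_present:.2f}, correct={e.p_correct:.2f}")
```

With these made-up numbers, `movie_title` is flagged (likely present, likely wrong), `location` is trusted, and `date` is treated as absent, so only the movie title would become the subject of a targeted question.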

Claims

We claim:

1. A method comprising: processing, via a speech recognizer, an utterance from a speaker to produce speech recognition output; identifying speech segments in the speech recognition output; generating two pairs of values for each speech segment including a first pair indicating a concept presence score for a corresponding speech segment and a second pair indicating a concept correctness score for the corresponding speech segment using a context that is unavailable to the speech recognizer throughout a dialog; generating, for a chosen speech segment from the speech segments and based on the concept presence score and the concept correctness score, a targeted clarification question associated with the utterance, wherein the chosen speech segment is a recognizable speech segment that has a high recognition certainty in which the context indicates that a word in the chosen speech segment is unsuitable for the context; and presenting the targeted clarification question to the speaker in response to the utterance.

2. The method of claim 1, wherein the context that is unavailable to the speech recognizer comprises one of dialog history, a concept co-occurrence probability, domain history, speech recognition confidence scores, contextual features of the utterance, and tagging scores.

3. The method of claim 1, wherein the concept presence score indicates a confidence that a concept type is present in a respective speech segment, and wherein the concept correctness score indicates a confidence that an identification of the concept type is correct.

4. The method of claim 1, wherein the targeted clarification question is generated based on a question template associated with the speech segments.

5. The method of claim 1, further comprising: identifying multiple speech segments below a certainty threshold; and generating the targeted clarification question based on respective concept presence scores and concept correctness scores for the multiple speech segments.

6. The method of claim 1, wherein the concept presence score and the concept correctness score are generated based on a domain of available concepts.

7. The method of claim 1, wherein the speech recognizer identifies that at least one of the speech segments has a concept presence score above a certainty threshold.

8. A system comprising: a processor; a speech recognizer; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: generating two pairs of values for each speech segment including a first pair indicating a concept presence score for a corresponding speech segment and a second pair indicating a concept correctness score for the corresponding speech segment using a context that is unavailable to the speech recognizer throughout a dialog; generating, for a chosen speech segment from the speech segments and based on the concept presence score and the concept correctness score, a targeted clarification question associated with an utterance, wherein the chosen speech segment is a recognizable speech segment that has a high recognition certainty in which the context indicates that a word in the chosen speech segment is unsuitable for the context; and presenting the targeted clarification question to a speaker in response to the utterance.

9. The system of claim 8, wherein the context that is unavailable to the speech recognizer comprises one of dialog history, a concept co-occurrence probability, a domain history, speech recognition confidence scores, contextual features of the utterance, and tagging scores.

10. The system of claim 8, wherein the concept presence score indicates a confidence that a concept type is present in a respective speech segment, and wherein the concept correctness score indicates a confidence that an identification of the concept type is correct.

11. The system of claim 8, wherein the targeted clarification question is generated based on a question template associated with the speech segments.

12. The system of claim 8, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising: identifying multiple speech segments below a certainty threshold; and generating the targeted clarification question based on respective concept presence scores and concept correctness scores for the multiple speech segments.

13. The system of claim 8, wherein the concept presence score and the concept correctness score are generated based on a domain of available concepts.

14. The system of claim 8, wherein the speech recognizer identifies that at least one of the speech segments has a concept presence score above a certainty threshold.

15. A non-transitory computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising: generating two pairs of values for each speech segment including a first pair indicating a concept presence score for a corresponding speech segment and a second pair indicating a concept correctness score for the corresponding speech segment using a context that is unavailable to the speech recognizer throughout a dialog; generating, for a chosen speech segment from the speech segments and based on the concept presence score and the concept correctness score, a targeted clarification question associated with an utterance, wherein the chosen speech segment is a recognizable speech segment that has a high recognition certainty in which the context indicates that a word in the chosen speech segment is unsuitable for the context; and presenting the targeted clarification question to a speaker in response to the utterance.

16. The non-transitory computer-readable storage device of claim 15, wherein the context that is unavailable to the speech recognizer comprises one of dialog history, a concept co-occurrence probability, domain history, speech recognition confidence scores, contextual features of the utterance, and tagging scores.

17. The non-transitory computer-readable storage device of claim 15, wherein the concept presence score indicates a confidence that a concept type is present in a respective speech segment, and wherein the concept correctness score indicates a confidence that an identification of the concept type is correct.

18. The non-transitory computer-readable storage device of claim 15, wherein the targeted clarification question is generated based on a question template associated with the speech segments.

19. The non-transitory computer-readable storage device of claim 15, having additional instructions stored which, when executed by the computing device, cause the computing device to perform operations comprising: identifying multiple speech segments below a certainty threshold; and generating the targeted clarification question based on respective concept presence scores and concept correctness scores for the multiple speech segments.

20. The non-transitory computer-readable storage device of claim 15, wherein the concept presence score and the concept correctness score are generated based on a domain of available concepts.
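Claims 1 and 4 describe choosing a segment whose concept is scored as present but likely incorrect (even when the recognizer was confident in the word) and generating the question from a template; claim 5 covers several doubtful segments at once. The Python sketch below shows one way such template-based question generation could look. The `Segment` type, the concept labels, the templates and their wording, and the 0.5 thresholds are hypothetical illustrations, not the patent's implementation; treating claim 5's "certainty threshold" as a correctness-score threshold is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    text: str              # recognized words for this segment
    concept: str           # concept type tagged by NLU (hypothetical labels)
    asr_confidence: float  # recognizer confidence; may be high even when the word is wrong
    p_present: float       # concept presence score
    p_correct: float       # concept correctness score (uses context the recognizer lacks)


# Hypothetical question templates keyed by the concept being clarified.
TEMPLATES = {
    "cuisine": "What type of restaurant in {location}?",
    "location": "Where do you want to find {cuisine} restaurants?",
}


def targeted_question(segments: List[Segment], presence_threshold: float = 0.5,
                      correctness_threshold: float = 0.5) -> Optional[str]:
    """Build a question that reuses trusted values and asks only about doubtful ones."""
    present = [s for s in segments if s.p_present >= presence_threshold]
    trusted = {s.concept: s.text for s in present if s.p_correct >= correctness_threshold}
    doubtful = [s for s in present if s.p_correct < correctness_threshold]

    if not doubtful:
        return None  # nothing to clarify; the request can be executed as understood
    if len(doubtful) > 1:
        # Several doubtful segments: ask about all of them in a single turn.
        names = " and ".join(s.concept for s in doubtful)
        return f"Sorry, which {names} did you mean?"

    target = doubtful[0]
    template = TEMPLATES.get(target.concept)
    if template is not None:
        try:
            return template.format(**trusted)
        except KeyError:
            pass  # a slot the template needs is not trusted; use the generic form
    return f"Which {target.concept} did you mean?"


# "tie" was recognized confidently but is not a plausible cuisine in this context.
segments = [
    Segment("tie", "cuisine", asr_confidence=0.91, p_present=0.90, p_correct=0.15),
    Segment("Manhattan", "location", asr_confidence=0.88, p_present=0.97, p_correct=0.90),
]
print(targeted_question(segments))
```

The template lets the system keep what it understood ("in Manhattan") and ask only about the suspect cuisine, rather than asking the user to repeat the whole request; the generic fallback covers concepts that have no template or whose template needs a slot that is not trusted.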

Assignees

AT&T Intellectual Property I, L.P.

Classifications

  • with voice recognition means

  • Assessment or evaluation of speech recognition systems

  • G10L15/22 (primary): Procedures used during a speech recognition process, e.g. man-machine dialogue

  • Parsing for meaning understanding

Frequently asked questions

Who is the assignee on this patent?
AT&T Intellectual Property I, L.P.
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Published on Apr 24, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).