Model based approach for on-screen item selection and disambiguation
US-9412363-B2 · Aug 9, 2016 · US
US9953644B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9953644-B2 |
| Application number | US-201414557030-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 1, 2014 |
| Priority date | Dec 1, 2014 |
| Publication date | Apr 24, 2018 |
| Grant date | Apr 24, 2018 |
Official abstract text for this publication.
A system, method and computer-readable storage devices are disclosed for using targeted clarification (TC) questions in dialog systems in a multimodal virtual agent system (MVA) providing access to information about movies, restaurants, and musical events. In contrast with open-domain spoken systems, the MVA application covers a domain with a fixed set of concepts and uses a natural language understanding (NLU) component to mark concepts in automatically recognized speech. Instead of identifying an error segment, localized error detection (LED) identifies which of the concepts are likely to be present and correct using domain knowledge, automatic speech recognition (ASR), and NLU tags and scores. If at least one concept is identified to be present but not correct, the TC component uses this information to generate a targeted clarification question. This approach computes probability distributions of concept presence and correctness for each user utterance, which can apply to automatic learning for clarification policies.
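The "present but not correct" decision described in the abstract can be illustrated with a small sketch. This is not the patented implementation: the `ConceptEstimate` type, the `pick_clarification_target` function, the 0.5 thresholds, and the rule of picking the least-correct candidate are all assumptions made for illustration, and the probabilities are taken as given rather than computed from ASR and NLU features.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConceptEstimate:
    """Hypothetical per-concept output of localized error detection."""
    concept: str      # concept type, e.g. "cuisine" or "location"
    text: str         # recognized words tagged with this concept
    p_present: float  # estimated probability the concept is in the utterance
    p_correct: float  # estimated probability the recognized value is right


def pick_clarification_target(
    estimates: List[ConceptEstimate],
    present_threshold: float = 0.5,  # assumed cut-off, not from the patent
    correct_threshold: float = 0.5,  # assumed cut-off, not from the patent
) -> Optional[ConceptEstimate]:
    """Return a concept judged present but not correct, or None."""
    candidates = [
        e for e in estimates
        if e.p_present >= present_threshold and e.p_correct < correct_threshold
    ]
    if not candidates:
        return None
    # Assumed tie-break: ask about the concept least likely to be correct.
    return min(candidates, key=lambda e: e.p_correct)


if __name__ == "__main__":
    estimates = [
        ConceptEstimate("cuisine", "tie food", p_present=0.9, p_correct=0.2),
        ConceptEstimate("location", "in Boston", p_present=0.95, p_correct=0.9),
    ]
    target = pick_clarification_target(estimates)
    print(target.concept if target else "no clarification needed")
```

In this sketch a concept that is probably absent never triggers a question, which mirrors the abstract's condition that a concept must be identified as present before its correctness matters.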
Opening claim text (preview).
We claim:
1. A method comprising: processing, via a speech recognizer, an utterance from a speaker to produce speech recognition output; identifying speech segments in the speech recognition output; generating two pairs of values for each speech segment including a first pair indicating a concept presence score for a corresponding speech segment and a second pair indicating a concept correctness score for the corresponding speech segment using a context that is unavailable to the speech recognizer throughout a dialog; generating, for a chosen speech segment from the speech segments and based on the concept presence score and the concept correctness score, a targeted clarification question associated with the utterance, wherein the chosen speech segment is a recognizable speech segment that has a high recognition certainty in which the context indicates that a word in the chosen speech segment is unsuitable for the context; and presenting the targeted clarification question to the speaker in response to the utterance.
2. The method of claim 1, wherein the context that is unavailable to the speech recognizer comprises one of dialog history, a concept co-occurrence probability, domain history, speech recognition confidence scores, contextual features of the utterance, and tagging scores.
3. The method of claim 1, wherein the concept presence score indicates a confidence that a concept type is present in a respective speech segment, and wherein the concept correctness score indicates a confidence that an identification of the concept type is correct.
4. The method of claim 1, wherein the targeted clarification question is generated based on a question template associated with the speech segments.
5. The method of claim 1, further comprising: identifying multiple speech segments below a certainty threshold; and generating the targeted clarification question based on respective concept presence scores and concept correctness scores for the multiple speech segments.
6. The method of claim 1, wherein the concept presence score and the concept correctness score are generated based on a domain of available concepts.
7. The method of claim 1, wherein the speech recognizer identifies that at least one of the speech segments has a concept presence score above a certainty threshold.
8. A system comprising: a processor; a speech recognizer; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: generating two pairs of values for each speech segment including a first pair indicating a concept presence score for a corresponding speech segment and a second pair indicating a concept correctness score for the corresponding speech segment using a context that is unavailable to the speech recognizer throughout a dialog; generating, for a chosen speech segment from the speech segments and based on the concept presence score and the concept correctness score, a targeted clarification question associated with an utterance, wherein the chosen speech segment is a recognizable speech segment that has a high recognition certainty in which the context indicates that a word in the chosen speech segment is unsuitable for the context; and presenting the targeted clarification question to a speaker in response to the utterance.
9. The system of claim 8, wherein the context that is unavailable to the speech recognizer comprises one of dialog history, a concept co-occurrence probability, a domain history, speech recognition confidence scores, contextual features of the utterance, and tagging scores.
10. The system of claim 8, wherein the concept presence score indicates a confidence that a concept type is present in a respective speech segment, and wherein the concept correctness score indicates a confidence that an identification of the concept type is correct.
11. The system of claim 8, wherein the targeted clarification question is generated based on a question template associated with the speech segments.
12. The system of claim 8, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising: identifying multiple speech segments below a certainty threshold; and generating the targeted clarification question based on respective concept presence scores and concept correctness scores for the multiple speech segments.
13. The system of claim 8, wherein the concept presence score and the concept correctness score are generated based on a domain of available concepts.
14. The system of claim 8, wherein the speech recognizer identifies that at least one of the speech segments has a concept presence score above a certainty threshold.
15. A non-transitory computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising: generating two pairs of values for each speech segment including a first pair indicating a concept presence score for a corresponding speech segment and a second pair indicating a concept correctness score for the corresponding speech segment using a context that is unavailable to the speech recognizer throughout a dialog; generating, for a chosen speech segment from the speech segments and based on the concept presence score and the concept correctness score, a targeted clarification question associated with an utterance, wherein the chosen speech segment is a recognizable speech segment that has a high recognition certainty in which the context indicates that a word in the chosen speech segment is unsuitable for the context; and presenting the targeted clarification question to a speaker in response to the utterance.
16. The non-transitory computer-readable storage device of claim 15, wherein the context that is unavailable to the speech recognizer comprises one of dialog history, a concept co-occurrence probability, domain history, speech recognition confidence scores, contextual features of the utterance, and tagging scores.
17. The non-transitory computer-readable storage device of claim 15, wherein the concept presence score indicates a confidence that a concept type is present in a respective speech segment, and wherein the concept correctness score indicates a confidence that an identification of the concept type is correct.
18. The non-transitory computer-readable storage device of claim 15, wherein the targeted clarification question is generated based on a question template associated with the speech segments.
19. The non-transitory computer-readable storage device of claim 15, having additional instructions stored which, when executed by the computing device, cause the computing device to perform operations comprising: identifying multiple speech segments below a certainty threshold; and generating the targeted clarification question based on respective concept presence scores and concept correctness scores for the multiple speech segments.
20. The non-transitory computer-readable storage device of claim 15, wherein the concept presence score and the concept correctness score are generated based on a domain of available concepts.
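As a reading aid for the claims, the following sketch shows one way the "two pairs of values" per speech segment and the question templates of claims 4, 11, and 18 might fit together. Everything here is hypothetical: the claims do not say what each pair contains, so the `(positive, negative)` probability layout is an assumption, as are the `SegmentScores` type, the `TEMPLATES` table and its wording, the 0.5 threshold, and the combined question used when several segments are uncertain.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class SegmentScores(NamedTuple):
    """Hypothetical container for the two score pairs of one speech segment."""
    text: str
    concept: str
    presence: Tuple[float, float]     # assumed layout: (P(present), P(absent))
    correctness: Tuple[float, float]  # assumed layout: (P(correct), P(incorrect))


# Illustrative question templates keyed by concept type (wording is invented).
TEMPLATES: Dict[str, str] = {
    "cuisine": "What type of food are you looking for?",
    "location": "Where are you looking?",
}
DEFAULT_TEMPLATE = "Could you repeat that?"


def clarification_question(
    segments: List[SegmentScores], threshold: float = 0.5
) -> Optional[str]:
    """Build one question covering every present-but-uncertain segment."""
    uncertain = [
        s for s in segments
        if s.presence[0] >= threshold and s.correctness[0] < threshold
    ]
    if not uncertain:
        return None
    if len(uncertain) == 1:
        return TEMPLATES.get(uncertain[0].concept, DEFAULT_TEMPLATE)
    # Several uncertain segments: fold their concept types into one question.
    names = " and ".join(s.concept for s in uncertain)
    return f"Could you repeat the {names}?"


if __name__ == "__main__":
    segments = [
        SegmentScores("tie food", "cuisine", (0.9, 0.1), (0.2, 0.8)),
        SegmentScores("in Boston", "location", (0.95, 0.05), (0.9, 0.1)),
    ]
    print(clarification_question(segments))
```

A single combined question for multiple uncertain segments is only one possible policy; a system could equally ask about one segment per turn.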
with voice recognition means · CPC title
Assessment or evaluation of speech recognition systems · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Parsing for meaning understanding · CPC title