Prosodic and lexical addressee detection
US-9761247-B2 · Sep 12, 2017 · US
US10529321B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10529321-B2 |
| Application number | US-201715670704-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 7, 2017 |
| Priority date | Jan 31, 2013 |
| Publication date | Jan 7, 2020 |
| Grant date | Jan 7, 2020 |
Official abstract text for this publication.
Prosodic features are used for discriminating computer-directed speech from human-directed speech. Statistics and models describing energy/intensity patterns over time, speech/pause distributions, pitch patterns, vocal effort features, and speech segment duration patterns may be used for prosodic modeling. The prosodic features for at least a portion of an utterance are monitored over a period of time to determine a shape associated with the utterance. A score may be determined to assist in classifying the current utterance as human directed or computer directed without relying on knowledge of preceding utterances or utterances following the current utterance. Outside data may be used for training lexical addressee detection systems for the H-H-C scenario. H-C training data can be obtained from a single-user H-C collection, and H-H speech can be modeled using general conversational speech. H-C and H-H language models may also be adapted using interpolation with small amounts of matched H-H-C data.
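The last sentence of the abstract describes adapting a language model by interpolating it with a small amount of matched data. A minimal sketch of that idea follows, assuming add-one smoothed unigram models and a grid search that picks the interpolation weight with the lowest held-out perplexity. The toy sentences, the function names (`unigram_lm`, `interpolate`, `tune_weight`), and the perplexity criterion are illustrative assumptions, not details taken from the patent.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for words outside the vocabulary

def unigram_lm(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def interpolate(lm_in, lm_out, lam):
    """Linear interpolation: lam * in-domain + (1 - lam) * out-of-domain."""
    return {w: lam * lm_in[w] + (1.0 - lam) * lm_out[w] for w in lm_in}

def perplexity(lm, sentences):
    """Per-word perplexity of the model on the given sentences."""
    words = [w for s in sentences for w in s.split()]
    log_prob = sum(math.log(lm.get(w, lm[UNK])) for w in words)
    return math.exp(-log_prob / len(words))

def tune_weight(lm_in, lm_out, heldout, steps=10):
    """Grid-search the weight that gives the lowest held-out perplexity."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda lam: perplexity(interpolate(lm_in, lm_out, lam), heldout))

# Toy stand-ins: a larger out-of-domain computer-directed set and a small matched set.
out_of_domain = ["play some music", "show me the weather", "find a restaurant"]
in_domain = ["play the next song", "go back"]
heldout = ["play the song"]

vocab = {UNK} | {w for s in out_of_domain + in_domain for w in s.split()}
lm_out = unigram_lm(out_of_domain, vocab)
lm_in = unigram_lm(in_domain, vocab)
best_lam = tune_weight(lm_in, lm_out, heldout)
print("interpolation weight:", best_lam)
```

The same procedure could be run once for the human-directed model and once for the computer-directed model. A real system would more likely use higher-order n-gram models and a larger held-out set.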
Opening claim text (preview).
What is claimed is:
1. A conversational understanding system comprising: a processor; and memory storing computer-executable instructions that, when executed, causes the processor to: receive an utterance from a user; generate a detection score for the utterance based on processing results from a plurality of language models trained using training data other than the received utterance, the processing results comprising: a human model processing result for the utterance from a language model trained for human-directed utterances; and a computer model processing result for the utterance from a language model trained for computer-directed utterances; determine an intended addressee of the received utterance based on the generated detection score, wherein the intended addressee is one of a human and a computer; in response to determining that the intended addressee is the computer, generate a response for the received utterance; and output the response to the user.
2. The system of claim 1, wherein the memory stores computer-executable instructions that, when executed, causes the processor to use language model interpolation to generate the detection score based on a weighting of the human model processing result and the computer model processing result.
3. The system of claim 2, wherein the weighting comprises weightings for each of: an in-domain part of the language model trained for human-directed utterances; an out-of-domain part of the language model trained for human-directed utterances; an in-domain part of the language model trained for computer-directed utterances; and an out-of-domain part of the language model trained for computer-directed utterances.
4. The system of claim 2, wherein the memory stores computer-executable instructions that, when executed, causes the processor to maximize at least one of a model perplexity and a classification accuracy to determine the weighting.
5. The system of claim 1, wherein the memory stores computer-executable instructions that, when executed, causes the processor to use a combination of in-domain training data and out-of-domain training data to train the language model for human-directed utterances.
6. The system of claim 1, wherein the memory stores computer-executable instructions that, when executed, causes the processor to use a combination of in-domain training data and out-of-domain training data to train the language model for computer-directed utterances.
7. The system of claim 1, wherein the memory stores computer-executable instructions that, when executed, causes the processor to evaluate the generated detection score based on a threshold when determining the intended addressee of the received utterance.
8. A computer-implemented method for addressee detection, the method comprising: receiving an utterance from a user; generating a detection score for the utterance based on a plurality of language models comprising a language model trained for human-directed utterances and a language model trained for computer-directed utterances, wherein each language model of the plurality of language models is trained using a set of training data, the set of training data comprising data other than the received utterance; determining an intended addressee of the received utterance based on the generated detection score, wherein the intended addressee is one of a human and a computer; in response to determining that the intended addressee is the computer, generating a response for the received utterance; and outputting the response to the user.
9. The computer-implemented method of claim 8, further comprising generating the detection score based on: a human model processing result for the utterance from a language model trained for human-directed utterances; and a computer model processing result for the utterance from a language model trained for computer-directed utterances.
10. The computer-implemented method of claim 9, further comprising using language model interpolation to generate the detection score based on a weighting of the human model processing result and the computer model processing result.
11. The computer-implemented method of claim 8, further comprising using a combination of in-domain training data and out-of-domain training data to train the language model for human-directed utterances.
12. The computer-implemented method of claim 8, further comprising using a combination of in-domain training data and out-of-domain training data to train the language model for computer-directed utterances.
13. The computer-implemented method of claim 8, further comprising evaluating the generated detection score based on a threshold to determine the intended addressee of the received utterance.
14. A computer-implemented method for addressee detection, the method comprising: receiving an utterance from a user; generating a detection score for the utterance based on processing results from a plurality of language models trained using training data other than the received utterance, the processing results comprising: a human model processing result for the utterance from a language model trained for human-directed utterances; and a computer model processing result for the utterance from a language model trained for computer-directed utterances; determining an intended addressee of the received utterance based on the generated detection score, wherein the intended addressee is one of a human and a computer; in response to determining that the intended addressee is the computer, generating a response for the received utterance; and outputting the response to the user.
15. The computer-implemented method of claim 14, further comprising using language model interpolation to generate the detection score based on a weighting of the human model processing result and the computer model processing result.
16. The computer-implemented method of claim 15, wherein the weighting comprises weightings for each of: an in-domain part of the language model trained for human-directed utterances; an out-of-domain part of the language model trained for human-directed utterances; an in-domain part of the language model trained for computer-directed utterances; and an out-of-domain part of the language model trained for computer-directed utterances.
17. The computer-implemented method of claim 15, further comprising maximizing at least one of model perplexity and classification accuracy to determine the weighting.
18. The computer-implemented method of claim 14, further comprising using a combination of in-domain training data and out-of-domain training data to train the language model for human-directed utterances.
19. The computer-implemented method of claim 14, further comprising using a combination of in-domain training data and out-of-domain training data to train the language model for computer-directed utterances.
20. The computer-implemented method of claim 14, further comprising evaluating the generated detection score based on a threshold to determine the intended addressee of the received utterance.
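The flow recited in the independent claims (score an utterance with a human-directed and a computer-directed language model, compare the score to a threshold, and respond only when the computer is the addressee) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the score is taken to be a length-normalized log-likelihood ratio between two add-one smoothed unigram models, and the training utterances, the `threshold` default, and every function name are hypothetical.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for unseen words

def train_unigram(utterances, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a shared vocabulary."""
    counts = Counter(w for u in utterances for w in u.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def log_likelihood(lm, utterance):
    """Log-probability of the utterance under one model (its processing result)."""
    return sum(math.log(lm.get(w, lm[UNK])) for w in utterance.lower().split())

def detection_score(utterance, lm_computer, lm_human):
    """Assumed score: per-word log-likelihood ratio, computer model vs. human model."""
    n = max(len(utterance.split()), 1)
    return (log_likelihood(lm_computer, utterance) - log_likelihood(lm_human, utterance)) / n

def intended_addressee(utterance, lm_computer, lm_human, threshold=0.0):
    """Compare the detection score against a threshold to pick the addressee."""
    score = detection_score(utterance, lm_computer, lm_human)
    return "computer" if score > threshold else "human"

def handle(utterance, lm_computer, lm_human):
    """Generate a response only when the computer is the intended addressee."""
    if intended_addressee(utterance, lm_computer, lm_human) == "computer":
        return "OK: " + utterance  # stand-in for a real response generator
    return None

# Toy training data, none of which is the utterance being classified.
human_directed = ["what do you think we should watch",
                  "i do not know maybe a comedy",
                  "do you want to order pizza"]
computer_directed = ["play the next song",
                     "show me comedy movies",
                     "go back to the main menu"]

vocab = {UNK} | {w for u in human_directed + computer_directed for w in u.split()}
lm_human = train_unigram(human_directed, vocab)
lm_computer = train_unigram(computer_directed, vocab)

print(intended_addressee("play comedy movies", lm_computer, lm_human))
print(intended_addressee("do you want pizza", lm_computer, lm_human))
```

Dividing by the utterance length keeps one threshold usable for short and long utterances. The threshold itself would normally be tuned on labeled data rather than fixed at zero.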
using prosody or stress · CPC title
for comparison or discrimination · CPC title
Pitch determination of speech signals · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
the extracted parameters being spectral information of each sub-band · CPC title