Prosodic and lexical addressee detection

US10529321B2 · US · B2

Patent metadata
Publication number: US-10529321-B2
Application number: US-201715670704-A
Country: US
Kind code: B2
Filing date: Aug 7, 2017
Priority date: Jan 31, 2013
Publication date: Jan 7, 2020
Grant date: Jan 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Prosodic features are used for discriminating computer-directed speech from human-directed speech. Statistics and models describing energy/intensity patterns over time, speech/pause distributions, pitch patterns, vocal effort features, and speech segment duration patterns may be used for prosodic modeling. The prosodic features for at least a portion of an utterance are monitored over a period of time to determine a shape associated with the utterance. A score may be determined to assist in classifying the current utterance as human directed or computer directed without relying on knowledge of preceding utterances or utterances following the current utterance. Outside data may be used for training lexical addressee detection systems for the H-H-C scenario. H-C training data can be obtained from a single-user H-C collection, and H-H speech can be modeled using general conversational speech. H-C and H-H language models may also be adapted using interpolation with small amounts of matched H-H-C data.
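The kinds of statistics the abstract lists (energy patterns over time, speech/pause distributions, segment durations) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name `prosodic_summary`, the per-frame energy input, the 10 ms frame length, and the fixed silence threshold are not taken from the patent, and a least-squares energy slope is used only as a crude stand-in for the "shape" of the utterance.

```python
from statistics import mean, pstdev


def prosodic_summary(frame_energy, frame_sec=0.01, silence_threshold=0.1):
    """Summarize an utterance's energy contour (hypothetical helper).

    frame_energy: per-frame energy values, one per frame_sec seconds.
    silence_threshold: illustrative cut-off between speech and pause frames.
    """
    if not frame_energy:
        raise ValueError("need at least one frame")

    is_speech = [e >= silence_threshold for e in frame_energy]

    # Collect run lengths of consecutive speech frames and pause frames.
    # Leading and trailing silence are counted as pauses in this sketch.
    speech_runs, pause_runs = [], []
    run_len, run_state = 0, is_speech[0]
    for state in is_speech:
        if state == run_state:
            run_len += 1
        else:
            (speech_runs if run_state else pause_runs).append(run_len)
            run_len, run_state = 1, state
    (speech_runs if run_state else pause_runs).append(run_len)

    # Least-squares slope of energy against frame index.
    n = len(frame_energy)
    t_mean = (n - 1) / 2
    e_mean = mean(frame_energy)
    denom = sum((i - t_mean) ** 2 for i in range(n))
    numer = sum((i - t_mean) * (e - e_mean) for i, e in enumerate(frame_energy))
    slope = numer / denom if denom else 0.0

    return {
        "mean_energy": e_mean,
        "energy_std": pstdev(frame_energy),
        "speech_fraction": sum(is_speech) / n,
        "pause_count": len(pause_runs),
        "mean_speech_segment_sec": mean(speech_runs) * frame_sec if speech_runs else 0.0,
        "energy_slope_per_frame": slope,
    }


summary = prosodic_summary([0.0, 0.5, 0.5, 0.0, 0.0, 0.5, 0.5, 0.5, 0.0, 0.0])
print(summary)
```

A real system would likely add pitch and vocal-effort features and feed the resulting vector to a trained classifier; this sketch only shows how per-utterance statistics can be derived without looking at neighboring utterances.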

First claim

Opening claim text (preview).

What is claimed is:
1. A conversational understanding system comprising: a processor; and memory storing computer-executable instructions that, when executed, causes the processor to: receive an utterance from a user; generate a detection score for the utterance based on processing results from a plurality of language models trained using training data other than the received utterance, the processing results comprising: a human model processing result for the utterance from a language model trained for human-directed utterances; and a computer model processing result for the utterance from a language model trained for computer-directed utterances; determine an intended addressee of the received utterance based on the generated detection score, wherein the intended addressee is one of a human and a computer; in response to determining that the intended addressee is the computer, generate a response for the received utterance; and output the response to the user.
2. The system of claim 1, wherein the memory stores computer-executable instructions that, when executed, causes the processor to use language model interpolation to generate the detection score based on a weighting of the human model processing result and the computer model processing result.
3. The system of claim 2, wherein the weighting comprises weightings for each of: an in-domain part of the language model trained for human-directed utterances; an out-of-domain part of the language model trained for human-directed utterances; an in-domain part of the language model trained for computer-directed utterances; and an out-of-domain part of the language model trained for computer-directed utterances.
4. The system of claim 2, wherein the memory stores computer-executable instructions that, when executed, causes the processor to maximize at least one of a model perplexity and a classification accuracy to determine the weighting.
5. The system of claim 1, wherein the memory stores computer-executable instructions that, when executed, causes the processor to use a combination of in-domain training data and out-of-domain training data to train the language model for human-directed utterances.
6. The system of claim 1, wherein the memory stores computer-executable instructions that, when executed, causes the processor to use a combination of in-domain training data and out-of-domain training data to train the language model for computer-directed utterances.
7. The system of claim 1, wherein the memory stores computer-executable instructions that, when executed, causes the processor to evaluate the generated detection score based on a threshold when determining the intended addressee of the received utterance.
8. A computer-implemented method for addressee detection, the method comprising: receiving an utterance from a user; generating a detection score for the utterance based on a plurality of language models comprising a language model trained for human-directed utterances and a language model trained for computer-directed utterances, wherein each language model of the plurality of language models is trained using a set of training data, the set of training data comprising data other than the received utterance; determining an intended addressee of the received utterance based on the generated detection score, wherein the intended addressee is one of a human and a computer; in response to determining that the intended addressee is the computer, generating a response for the received utterance; and outputting the response to the user.
9. The computer-implemented method of claim 8, further comprising generating the detection score based on: a human model processing result for the utterance from a language model trained for human-directed utterances; and a computer model processing result for the utterance from a language model trained for computer-directed utterances.
10. The computer-implemented method of claim 9, further comprising using language model interpolation to generate the detection score based on a weighting of the human model processing result and the computer model processing result.
11. The computer-implemented method of claim 8, further comprising using a combination of in-domain training data and out-of-domain training data to train the language model for human-directed utterances.
12. The computer-implemented method of claim 8, further comprising using a combination of in-domain training data and out-of-domain training data to train the language model for computer-directed utterances.
13. The computer-implemented method of claim 8, further comprising evaluating the generated detection score based on a threshold to determine the intended addressee of the received utterance.
14. A computer-implemented method for addressee detection, the method comprising: receiving an utterance from a user; generating a detection score for the utterance based on processing results from a plurality of language models trained using training data other than the received utterance, the processing results comprising: a human model processing result for the utterance from a language model trained for human-directed utterances; and a computer model processing result for the utterance from a language model trained for computer-directed utterances; determining an intended addressee of the received utterance based on the generated detection score, wherein the intended addressee is one of a human and a computer; in response to determining that the intended addressee is the computer, generating a response for the received utterance; and outputting the response to the user.
15. The computer-implemented method of claim 14, further comprising using language model interpolation to generate the detection score based on a weighting of the human model processing result and the computer model processing result.
16. The computer-implemented method of claim 15, wherein the weighting comprises weightings for each of: an in-domain part of the language model trained for human-directed utterances; an out-of-domain part of the language model trained for human-directed utterances; an in-domain part of the language model trained for computer-directed utterances; and an out-of-domain part of the language model trained for computer-directed utterances.
17. The computer-implemented method of claim 15, further comprising maximizing at least one of model perplexity and classification accuracy to determine the weighting.
18. The computer-implemented method of claim 14, further comprising using a combination of in-domain training data and out-of-domain training data to train the language model for human-directed utterances.
19. The computer-implemented method of claim 14, further comprising using a combination of in-domain training data and out-of-domain training data to train the language model for computer-directed utterances.
20. The computer-implemented method of claim 14, further comprising evaluating the generated detection score based on a threshold to determine the intended addressee of the received utterance.
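The claimed flow (score an utterance with a human-directed and a computer-directed language model, interpolate in-domain and out-of-domain parts, then compare the score to a threshold) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the add-one-smoothed unigram model, the per-word log-likelihood-ratio form of the score, the toy training sentences, the fixed weights `lam_c` and `lam_h`, and all names (`UnigramLM`, `detection_score`, `addressee`) are hypothetical.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one-smoothed unigram model, a toy stand-in for a real n-gram LM."""

    def __init__(self, sentences, vocab):
        self.counts = Counter(w for s in sentences for w in s.lower().split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen words

    def prob(self, word):
        return (self.counts[word] + 1) / (self.total + self.vocab_size)


def interpolated_logprob(words, in_lm, out_lm, lam):
    """log P(words) under lam * in-domain + (1 - lam) * out-of-domain."""
    return sum(
        math.log(lam * in_lm.prob(w) + (1 - lam) * out_lm.prob(w)) for w in words
    )


def detection_score(utterance, computer_lms, human_lms, lam_c=0.5, lam_h=0.5):
    """Per-word log-likelihood ratio; positive favors computer-directed."""
    words = utterance.lower().split()
    if not words:
        return 0.0
    computer_lp = interpolated_logprob(words, *computer_lms, lam_c)
    human_lp = interpolated_logprob(words, *human_lms, lam_h)
    return (computer_lp - human_lp) / len(words)


def addressee(utterance, computer_lms, human_lms, threshold=0.0):
    score = detection_score(utterance, computer_lms, human_lms)
    return "computer" if score > threshold else "human"


# Toy training data (invented for this sketch).
hc_in = ["play the next song", "show me the weather"]
hc_out = ["search for pizza nearby", "set a timer"]
hh_in = ["do you think it will work", "i guess we should ask it"]
hh_out = ["yeah i know what you mean", "well i think so"]

vocab = {w for s in hc_in + hc_out + hh_in + hh_out for w in s.split()}
computer_lms = (UnigramLM(hc_in, vocab), UnigramLM(hc_out, vocab))
human_lms = (UnigramLM(hh_in, vocab), UnigramLM(hh_out, vocab))

print(addressee("play the weather", computer_lms, human_lms))
print(addressee("i think so", computer_lms, human_lms))
```

In this sketch the interpolation weights are fixed at 0.5. The claims describe choosing the weighting by optimizing model perplexity or classification accuracy, which could be approximated by a grid search over `lam_c` and `lam_h` on held-out labeled utterances.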

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • using prosody or stress

  • for comparison or discrimination

  • Pitch determination of speech signals

  • Lexical analysis, e.g. tokenisation or collocates

  • the extracted parameters being spectral information of each sub-band

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10529321B2 cover?
It covers addressee detection: deciding whether an utterance is directed at a computer or at another human. The abstract describes prosodic modeling based on energy/intensity patterns, speech/pause distributions, pitch patterns, vocal effort features, and speech segment durations. The claims describe a detection score computed from language models trained for human-directed and computer-directed utterances.
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/1807. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 7, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).