Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification G10L15/1807. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 7, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Prosodic and lexical addressee detection

US10529321B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10529321-B2
Application number	US-201715670704-A
Country	US
Kind code	B2
Filing date	Aug 7, 2017
Priority date	Jan 31, 2013
Publication date	Jan 7, 2020
Grant date	Jan 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Prosodic features are used for discriminating computer-directed speech from human-directed speech. Statistics and models describing energy/intensity patterns over time, speech/pause distributions, pitch patterns, vocal effort features, and speech segment duration patterns may be used for prosodic modeling. The prosodic features for at least a portion of an utterance are monitored over a period of time to determine a shape associated with the utterance. A score may be determined to assist in classifying the current utterance as human directed or computer directed without relying on knowledge of preceding utterances or utterances following the current utterance. Outside data may be used for training lexical addressee detection systems for the H-H-C scenario. H-C training data can be obtained from a single-user H-C collection and that H-H speech can be modeled using general conversational speech. H-C and H-H language models may also be adapted using interpolation with small amounts of matched H-H-C data.
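
The adaptation by interpolation mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes linear interpolation of two add-one smoothed unigram models, with the weight picked by a grid search for the lowest perplexity on a small matched held-out set. The abstract does not specify the model order, the smoothing, or the search procedure, so all of these are assumptions, and the function names and toy sentences are hypothetical.

```python
import math
from collections import Counter


def unigram_model(sentences, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split() if w in vocab)
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_in, p_out, lam):
    """Linear interpolation: lam * in-domain + (1 - lam) * out-of-domain."""
    return {w: lam * p_in[w] + (1 - lam) * p_out[w] for w in p_in}


def perplexity(model, sentences):
    """Per-word perplexity of the sentences under a unigram model."""
    words = [w for s in sentences for w in s.split() if w in model]
    log_prob = sum(math.log(model[w]) for w in words)
    return math.exp(-log_prob / len(words))


# Toy data, invented for illustration only.
out_of_domain = ["so what do you think about dinner", "i think we should go now"]
in_domain = ["what do you want to watch", "i do not know you pick"]
held_out = ["what do you think we should watch"]

vocab = {w for s in out_of_domain + in_domain + held_out for w in s.split()}
p_out = unigram_model(out_of_domain, vocab)
p_in = unigram_model(in_domain, vocab)

# Assumed selection rule: keep the weight with the lowest held-out perplexity.
grid = [i / 10 for i in range(11)]
best_lam = min(grid, key=lambda lam: perplexity(interpolate(p_in, p_out, lam), held_out))
adapted = interpolate(p_in, p_out, best_lam)
```

Because the grid includes 0 and 1, the adapted model is never worse on the held-out set than either source model alone, which is the usual motivation for interpolating a small matched corpus with a larger mismatched one.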

First claim

Opening claim text (preview).

What is claimed is:
1. A conversational understanding system comprising: a processor; and memory storing computer-executable instructions that, when executed, causes the processor to: receive an utterance from a user; generate a detection score for the utterance based on processing results from a plurality of language models trained using training data other than the received utterance, the processing results comprising: a human model processing result for the utterance from a language model trained for human-directed utterances; and a computer model processing result for the utterance from a language model trained for computer-directed utterances; determine an intended addressee of the received utterance based on the generated detection score, wherein the intended addressee is one of a human and a computer; in response to determining that the intended addressee is the computer, generate a response for the received utterance; and output the response to the user.
2. The system of claim 1, wherein the memory stores computer-executable instructions that, when executed, causes the processor to use language model interpolation to generate the detection score based on a weighting of the human model processing result and the computer model processing result.
3. The system of claim 2, wherein the weighting comprises weightings for each of: an in-domain part of the language model trained for human-directed utterances; an out-of-domain part of the language model trained for human-directed utterances; an in-domain part of the language model trained for computer-directed utterances; and an out-of-domain part of the language model trained for computer-directed utterances.
4. The system of claim 2, wherein the memory stores computer-executable instructions that, when executed, causes the processor to maximize at least one of a model perplexity and a classification accuracy to determine the weighting.
5. The system of claim 1, wherein the memory stores computer-executable instructions that, when executed, causes the processor to use a combination of in-domain training data and out-of-domain training data to train the language model for human-directed utterances.
6. The system of claim 1, wherein the memory stores computer-executable instructions that, when executed, causes the processor to use a combination of in-domain training data and out-of-domain training data to train the language model for computer-directed utterances.
7. The system of claim 1, wherein the memory stores computer-executable instructions that, when executed, causes the processor to evaluate the generated detection score based on a threshold when determining the intended addressee of the received utterance.
8. A computer-implemented method for addressee detection, the method comprising: receiving an utterance from a user; generating a detection score for the utterance based on a plurality of language models comprising a language model trained for human-directed utterances and a language model trained for computer-directed utterances, wherein each language model of the plurality of language models is trained using a set of training data, the set of training data comprising data other than the received utterance; determining an intended addressee of the received utterance based on the generated detection score, wherein the intended addressee is one of a human and a computer; in response to determining that the intended addressee is the computer, generating a response for the received utterance; and outputting the response to the user.
9. The computer-implemented method of claim 8, further comprising generating the detection score based on: a human model processing result for the utterance from a language model trained for human-directed utterances; and a computer model processing result for the utterance from a language model trained for computer-directed utterances.
10. The computer-implemented method of claim 9, further comprising using language model interpolation to generate the detection score based on a weighting of the human model processing result and the computer model processing result.
11. The computer-implemented method of claim 8, further comprising using a combination of in-domain training data and out-of-domain training data to train the language model for human-directed utterances.
12. The computer-implemented method of claim 8, further comprising using a combination of in-domain training data and out-of-domain training data to train the language model for computer-directed utterances.
13. The computer-implemented method of claim 8, further comprising evaluating the generated detection score based on a threshold to determine the intended addressee of the received utterance.
14. A computer-implemented method for addressee detection, the method comprising: receiving an utterance from a user; generating a detection score for the utterance based on processing results from a plurality of language models trained using training data other than the received utterance, the processing results comprising: a human model processing result for the utterance from a language model trained for human-directed utterances; and a computer model processing result for the utterance from a language model trained for computer-directed utterances; determining an intended addressee of the received utterance based on the generated detection score, wherein the intended addressee is one of a human and a computer; in response to determining that the intended addressee is the computer, generating a response for the received utterance; and outputting the response to the user.
15. The computer-implemented method of claim 14, further comprising using language model interpolation to generate the detection score based on a weighting of the human model processing result and the computer model processing result.
16. The computer-implemented method of claim 15, wherein the weighting comprises weightings for each of: an in-domain part of the language model trained for human-directed utterances; an out-of-domain part of the language model trained for human-directed utterances; an in-domain part of the language model trained for computer-directed utterances; and an out-of-domain part of the language model trained for computer-directed utterances.
17. The computer-implemented method of claim 15, further comprising maximizing at least one of model perplexity and classification accuracy to determine the weighting.
18. The computer-implemented method of claim 14, further comprising using a combination of in-domain training data and out-of-domain training data to train the language model for human-directed utterances.
19. The computer-implemented method of claim 14, further comprising using a combination of in-domain training data and out-of-domain training data to train the language model for computer-directed utterances.
20. The computer-implemented method of claim 14, further comprising evaluating the generated detection score based on a threshold to determine the intended addressee of the received utterance.
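
The flow recited in claims 1, 7, 8, and 14 (score an utterance against a human-directed and a computer-directed language model, compare the score to a threshold, and respond only when the addressee is the computer) can be sketched as follows. This is one possible reading, not the patented implementation: the claims do not fix the score formula, so the length-normalized log-likelihood ratio, the add-one smoothed unigram models, the `<unk>` token, the threshold of 0.0, and every function name and toy sentence here are assumptions made for illustration.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab):
    """Add-one smoothed unigram model with one extra slot for unknown words."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values()) + len(vocab) + 1
    probs = {w: (counts[w] + 1) / total for w in vocab}
    probs["<unk>"] = 1 / total
    return probs


def log_prob(model, utterance):
    """Log-probability of the utterance under a unigram model."""
    return sum(math.log(model.get(w, model["<unk>"])) for w in utterance.split())


def detection_score(utterance, computer_lm, human_lm):
    """Assumed score: per-word log-likelihood ratio, computer vs. human model."""
    n = max(len(utterance.split()), 1)
    return (log_prob(computer_lm, utterance) - log_prob(human_lm, utterance)) / n


def addressee(utterance, computer_lm, human_lm, threshold=0.0):
    """Compare the score to a threshold (0.0 is an arbitrary illustrative value)."""
    score = detection_score(utterance, computer_lm, human_lm)
    return "computer" if score > threshold else "human"


def respond(utterance, computer_lm, human_lm):
    """Produce a placeholder response only for computer-directed utterances."""
    if addressee(utterance, computer_lm, human_lm) == "computer":
        return f"(system handles: {utterance})"
    return None


# Toy training data, invented for illustration; neither set contains the test utterances' source.
computer_directed = ["play the next song", "search for pizza nearby", "go back to the menu"]
human_directed = ["do you want pizza tonight", "i think that one looks good", "what should we try next"]

vocab = {w for s in computer_directed + human_directed for w in s.split()}
computer_lm = train_unigram(computer_directed, vocab)
human_lm = train_unigram(human_directed, vocab)
```

With these toy models, `respond("play the next song", computer_lm, human_lm)` returns a placeholder response, while `respond("do you want pizza tonight", computer_lm, human_lm)` returns `None` because the score falls below the threshold.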

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

G10L15/1807 · Primary
using prosody or stress · CPC title
G10L25/51
for comparison or discrimination · CPC title
G10L25/90
Pitch determination of speech signals · CPC title
G06F40/284
Lexical analysis, e.g. tokenisation or collocates · CPC title
G10L25/18
the extracted parameters being spectral information of each sub-band · CPC title

Patent family

Related publications grouped by family.

View patent family 51223888
