What technology area does this patent fall under?

Primary CPC classification G10L15/222. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 11, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

System and method for advanced turn-taking for interactive spoken dialog systems

US10152971B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10152971-B2
Application number	US-201615190325-A
Country	US
Kind code	B2
Filing date	Jun 23, 2016
Priority date	Sep 1, 2011
Publication date	Dec 11, 2018
Grant date	Dec 11, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for advanced turn-taking in an interactive spoken dialog system. A system configured according to this disclosure can incrementally process speech prior to completion of the speech utterance, and can communicate partial speech recognition results upon finding particular conditions. A first condition which, if found, allows the system to communicate partial speech recognition results, is that the most recent word found in the partial results is statistically likely to be the termination of the utterance, also known as a terminal node. A second condition is the determination that all search paths within a speech lattice converge to a common node, also known as a pinch node, before branching out again. Upon finding either condition, the system can communicate the partial speech recognition results. Stability and correctness probabilities can also determine which partial results are communicated.
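The pinch-node condition in the abstract can be illustrated with a small sketch. It assumes the speech lattice is a directed acyclic graph given as a list of `(source, destination)` edges with one start node and one end node. The function name `pinch_nodes` and the integer node labels are hypothetical and are not taken from the patent. The sketch counts paths: a node lies on every start-to-end path exactly when the number of paths into it multiplied by the number of paths out of it equals the total number of paths.

```python
from collections import defaultdict, deque


def pinch_nodes(edges, start, end):
    """Return interior nodes that every start-to-end path passes through.

    Assumes `edges` describes a directed acyclic graph (a toy lattice).
    """
    succ = defaultdict(list)
    indegree = defaultdict(int)
    nodes = {start, end}
    for src, dst in edges:
        succ[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    # Kahn's algorithm gives a topological order of the lattice nodes.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in succ[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # forward[n]: number of paths from start to n.
    forward = defaultdict(int)
    forward[start] = 1
    for node in order:
        for nxt in succ[node]:
            forward[nxt] += forward[node]

    # backward[n]: number of paths from n to end.
    backward = defaultdict(int)
    backward[end] = 1
    for node in reversed(order):
        for nxt in succ[node]:
            backward[node] += backward[nxt]

    total = forward[end]
    return [
        n for n in order
        if n not in (start, end)
        and total > 0
        and forward[n] * backward[n] == total
    ]


# Toy lattice: two competing hypotheses merge at node 3, then branch again.
toy_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 6), (5, 6)]
print(pinch_nodes(toy_edges, 0, 6))
```

In this toy lattice all four paths pass through node 3, so it is the only pinch node reported. A real recognizer would apply such a check to a lattice that is still growing, which this sketch does not model.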

First claim

Opening claim text (preview).

We claim:
1. A method comprising: receiving, via an interactive turn-taking spoken dialog system speech from a user as part of a conversation with the interactive turn-taking spoken dialog system, wherein the speech comprises at least two words; performing a partial speech recognition prior to completion of the speech by: identifying a starting point associated with the speech, the starting point being at a first time; identifying content of the speech received between the first time and a second time, to yield identified content; identifying an intermediary end point associated with the speech corresponding to the second time, wherein the intermediary end point is a pinch node in a content lattice; and returning, via the interactive turn-taking spoken dialog system, a partial recognition of the identified content based on the starting point and the intermediary end point to yield partially recognized speech, wherein the partial recognition is based on a stability of the identified content between the starting point and the intermediary end point, the stability being based on a stability probability determined using a machine learning algorithm on a corpus of speech utterances; and presenting, via the interactive turn-taking spoken dialog system, the user with a response to the partially recognized speech.
2. The method of claim 1, wherein the starting point is one of a beginning of the speech and a previously marked pinch node.
3. The method of claim 1, training the machine learning algorithm to perform stability determinations by: performing recognition on the corpus of speech utterances to yield a respective partial result for each speech utterance in the corpus; and determining, using the machine learning algorithm, whether the respective partial result is one of stable or unstable.
4. The method of claim 3, wherein the respective partial result comprises one or more recognition features comprising at least one of a path cost, a lattice structure, or a type of result.
5. The method of claim 4, wherein the machine learning algorithm is a logistic regression.
6. The method of claim 1, wherein the partial recognition is based on a path having a highest probability through a speech component lattice.
7. A system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: receiving speech from a user as part of a conversation with an interactive turn-taking spoken dialog system, wherein the speech comprises at least two words; performing a partial speech recognition prior to completion of the speech by: identifying a starting point associated with the speech, the starting point being at a first time; identifying content of the speech received between the first time and a second time, to yield identified content; identifying an intermediary end point associated with the speech corresponding to the second time, wherein the intermediary end point is a pinch node in a content lattice; and returning a partial recognition of the identified content based on the starting point and the intermediary end point to yield partially recognized speech, wherein the partial recognition is based on a stability of the identified content between the starting point and the intermediary end point, the stability being based on a stability probability determined using a machine learning algorithm on a corpus of speech utterances; and presenting the user with a response to the partially recognized speech.
8. The system of claim 7, wherein the starting point is one of a beginning of the speech and a previously marked pinch node.
9. The system of claim 7, training the machine learning algorithm to perform stability determinations by: performing recognition on the corpus of speech utterances to yield a respective partial result for each speech utterance in the corpus; and determining, using the machine learning algorithm, whether the respective partial result is one of stable or unstable.
10. The system of claim 9, wherein the respective partial result comprises one or more recognition features comprising at least one of a path cost, a lattice structure, or a type of result.
11. The system of claim 10, wherein the machine learning algorithm is a logistic regression.
12. The system of claim 7, wherein the partial recognition is based on a path having a highest probability through a speech component lattice.
13. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising: receiving speech from a user as part of a conversation with an interactive turn-taking spoken dialog system, wherein the speech comprises at least two words; performing a partial speech recognition prior to completion of the speech by: identifying a starting point associated with the speech, the starting point being at a first time; identifying content of the speech received between the first time and a second time, to yield identified content; identifying an intermediary end point associated with the speech corresponding to the second time, wherein the intermediary end point is a pinch node in a content lattice; and returning a partial recognition of the identified content based on the starting point and the intermediary end point to yield partially recognized speech, wherein the partial recognition is based on a stability of the identified content between the starting point and the intermediary end point, the stability being based on a stability probability determined using a machine learning algorithm on a corpus of speech utterances; and presenting the user with a response to the partially recognized speech.
14. The computer-readable storage device of claim 13, wherein the starting point is one of a beginning of the speech and a previously marked pinch node.
15. The computer-readable storage device of claim 13, training the machine learning algorithm to perform stability determinations by: performing recognition on the corpus of speech utterances to yield a respective partial result for each speech utterance in the corpus; and determining, using the machine learning algorithm, whether the respective partial result is one of stable or unstable.
16. The computer-readable storage device of claim 15, wherein the respective partial result comprises one or more recognition features comprising at least one of a path cost, a lattice structure, or a type of result.
17. The computer-readable storage device of claim 16, wherein the machine learning algorithm is a logistic regression.
18. The computer-readable storage device of claim 13, wherein the partial recognition is based on a path having a highest probability through a speech component lattice.
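The claims describe a stability probability produced by a machine learning algorithm, with logistic regression named in claims 5, 11, and 17 and recognition features such as path cost, lattice structure, and type of result. The following is a minimal sketch of that idea, not the patented implementation. The three-element feature encoding, the toy training rows, the learning rate, and the function names `train_logistic` and `predict` are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Stability probability for one partial result (bias is weights[0])."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            err = predict(weights, features) - label
            grad[0] += err
            for i, x in enumerate(features):
                grad[i + 1] += err * x
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical features per partial result (all assumed, all scaled to 0..1):
#   [normalized path cost, share of competing lattice paths, 1.0 if pinch-node result]
# Hypothetical labels: 1 = stable, 0 = unstable.
rows = [
    [0.1, 0.1, 1.0], [0.2, 0.0, 1.0], [0.3, 0.2, 0.0],
    [0.8, 0.7, 0.0], [0.9, 0.5, 0.0], [0.7, 0.9, 1.0],
]
labels = [1, 1, 1, 0, 0, 0]

weights = train_logistic(rows, labels)
stable_prob = predict(weights, [0.1, 0.0, 1.0])
unstable_prob = predict(weights, [0.9, 0.8, 0.0])
print(round(stable_prob, 3), round(unstable_prob, 3))
```

In a dialog system, a partial result might be released only when its predicted stability exceeds some threshold. The claims do not specify such a threshold, so any particular value would be a design choice.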

Assignees

Nuance Communications Inc

Inventors

Classifications

G10L15/222 · Primary
Barge in, i.e. overridable guidance for interrupting prompts · CPC title
G10L15/083
Recognition networks (G10L15/142, G10L15/16 take precedence) · CPC title
G10L15/04 · Primary
Segmentation; Word boundary detection · CPC title
G10L15/063
Training · CPC title
G10L15/05
Word boundary detection · CPC title

Patent family

Related publications grouped by family.

View patent family 47753830

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10152971B2 cover?: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for advanced turn-taking in an interactive spoken dialog system. A system configured according to this disclosure can incrementally process speech prior to completion of the speech utterance, and can communicate partial speech recognition results upon finding particular conditions. A first condition which,…
Who is the assignee on this patent?: Nuance Communications Inc