System and method for advanced turn-taking for interactive spoken dialog systems

US9378738B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9378738-B2
Application number: US-201414565516-A
Country: US
Kind code: B2
Filing date: Dec 10, 2014
Priority date: Sep 1, 2011
Publication date: Jun 28, 2016
Grant date: Jun 28, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for advanced turn-taking in an interactive spoken dialog system. A system configured according to this disclosure can incrementally process speech prior to completion of the speech utterance, and can communicate partial speech recognition results upon finding particular conditions. A first condition which, if found, allows the system to communicate partial speech recognition results, is that the most recent word found in the partial results is statistically likely to be the termination of the utterance, also known as a terminal node. A second condition is the determination that all search paths within a speech lattice converge to a common node, also known as a pinch node, before branching out again. Upon finding either condition, the system can communicate the partial speech recognition results. Stability and correctness probabilities can also determine which partial results are communicated.
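The pinch-node condition in the abstract can be illustrated with a small sketch. This is not the patent's implementation. It assumes the lattice is a directed acyclic graph given as hypothetical `(source, destination, word)` edges, that integer node ids are already in topological order, and that a pinch node is any intermediate node that every start-to-end path passes through. The function name `find_pinch_nodes` and the toy lattice are illustrative only.

```python
from collections import defaultdict


def find_pinch_nodes(edges, start, end):
    """Return intermediate nodes that every start-to-end path passes through.

    Assumption: node ids are integers in topological order, so every
    edge goes from a lower id to a higher id.
    """
    succ = defaultdict(list)
    nodes = {start, end}
    for src, dst, _word in edges:
        succ[src].append(dst)
        nodes.update((src, dst))
    order = sorted(nodes)

    # Count distinct paths from the start node to each node.
    from_start = defaultdict(int)
    from_start[start] = 1
    for n in order:
        for m in succ[n]:
            from_start[m] += from_start[n]

    # Count distinct paths from each node to the end node.
    to_end = defaultdict(int)
    to_end[end] = 1
    for n in reversed(order):
        for m in succ[n]:
            to_end[n] += to_end[m]

    total = from_start[end]
    # A node lies on every path exactly when the number of paths
    # through it equals the total number of start-to-end paths.
    return [
        n for n in order
        if n not in (start, end)
        and total > 0
        and from_start[n] * to_end[n] == total
    ]


# Hypothetical lattice: two competing first words, one shared second word,
# then two competing third words.
LATTICE = [
    (0, 1, "i"), (0, 2, "eye"),
    (1, 3, "want"), (2, 3, "want"),
    (3, 4, "to"), (3, 5, "two"),
    (4, 6, "go"), (5, 6, "go"),
]

print(find_pinch_nodes(LATTICE, 0, 6))  # prints [3]
```

In this toy lattice all four paths converge on node 3 before branching again, so under the stated assumptions node 3 is the point at which a partial result could be communicated.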

First claim

Opening claim text (preview).

We claim:

1. A method for determining turn order between a user and an interactive turn-taking spoken dialog system based on a result, the method comprising: receiving speech; and while continuing to receive the speech: identifying a starting point associated with the speech; identifying content of the speech received so far, to yield identified content; predicting a stability of the identified content; and identifying an end point associated with the speech, wherein the end point is a pinch node in a content lattice; and returning, via a processor, a result based on the stability between the starting point and the end point.
2. The method of claim 1, wherein the starting point is one of a beginning of the speech and a previously marked pinch node.
3. The method of claim 1, wherein the stability of the identified content of the identified content is determined using stability probability.
4. The method of claim 3, wherein the stability probability is determined using a machine learning algorithm on a corpus of speech utterances.
5. The method of claim 4, wherein the machine learning algorithm is a logistic regression.
6. The method of claim 1, wherein the result comprises a path having a highest probability through a speech component lattice.
7. The method of claim 1, wherein the result comprises partial speech recognition.

8. A system for determining turn order between a user and an interactive turn-taking spoken dialog system based on a result, the system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: receiving speech; and while continuing to receive the speech: identifying a starting point associated with the speech; identifying content of the speech received so far, to yield identified content; predicting a stability of the identified content; and identifying an end point associated with the speech, wherein the end point is a pinch node in a content lattice; and returning, a result based on the stability between the starting point and the end point.
9. The system of claim 8, wherein the starting point is one of a beginning of the speech and a previously marked pinch node.
10. The system of claim 8, wherein the stability of the identified content of the identified content is determined using stability probability.
11. The system of claim 10, wherein the stability probability is determined using a machine learning algorithm on a corpus of speech utterances.
12. The system of claim 11, wherein the machine learning algorithm is a logistic regression.
13. The system of claim 8, wherein the result comprises a path having a highest probability through a speech component lattice.
14. The system of claim 8, wherein the result comprises partial speech recognition.

15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising: receiving speech; and while continuing to receive the speech: identifying a starting point associated with the speech; identifying content of the speech received so far, to yield identified content; predicting a stability of the identified content; and identifying an end point associated with the speech, wherein the end point is a pinch node in a content lattice; and returning a result based on the stability between the starting point and the end point.
16. The computer-readable storage device of claim 15, wherein the starting point is one of a beginning of the speech and a previously marked pinch node.
17. The computer-readable storage device of claim 15, wherein the stability of the identified content of the identified content is determined using stability probability.
18. The computer-readable storage device of claim 17, wherein the stability probability is determined using a machine learning algorithm on a corpus of speech utterances.
19. The computer-readable storage device of claim 18, wherein the machine learning algorithm is a logistic regression.
20. The computer-readable storage device of claim 15, wherein the result comprises a path having a highest probability through a speech component lattice.
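Claims 3 to 5 describe a stability probability learned by logistic regression on a corpus of utterances. The sketch below shows one way such a model could be fitted and queried; it is an illustration, not the patented method. The two features (words in the partial result, seconds since the partial last changed), the toy training data, the learning rate, and all function names are assumptions that do not appear in the patent text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.1, epochs=5000):
    """Fit logistic-regression weights (bias first) by batch gradient descent."""
    weights = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(samples, labels):
            z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
            err = sigmoid(z) - y  # derivative of log loss w.r.t. z
            grad[0] += err
            for i, xi in enumerate(x):
                grad[i + 1] += err * xi
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad)]
    return weights


def stability_probability(weights, x):
    """Estimated probability that a partial result will not change later."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(z)


# Hypothetical corpus: [words_in_partial, seconds_since_last_change];
# label 1 means the partial survived unchanged into the final result.
SAMPLES = [[1, 0.1], [2, 0.2], [2, 0.1], [3, 0.9], [4, 1.0], [3, 0.8]]
LABELS = [0, 0, 0, 1, 1, 1]

WEIGHTS = train_logistic(SAMPLES, LABELS)
print(stability_probability(WEIGHTS, [4, 1.0]))  # long, settled partial
print(stability_probability(WEIGHTS, [1, 0.1]))  # short, fresh partial
```

A dialog system built along these lines might return a partial result only when this probability exceeds a chosen threshold; the threshold value would be a design decision that the claims do not specify.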

Assignees

Inventors

Classifications

  • G10L15/04 · Primary

    Segmentation; Word boundary detection · CPC title

  • G10L15/222 · Primary

    Barge in, i.e. overridable guidance for interrupting prompts · CPC title

  • Training · CPC title

  • Word boundary detection · CPC title

  • Recognition networks (G10L15/142, G10L15/16 take precedence) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9378738B2 cover?
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for advanced turn-taking in an interactive spoken dialog system. A system configured according to this disclosure can incrementally process speech prior to completion of the speech utterance, and can communicate partial speech recognition results upon finding particular conditions. A first condition which,…
Who is the assignee on this patent?
AT&T Intellectual Property I, L.P.
What technology area does this patent fall under?
Primary CPC classification G10L15/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 28, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).