Hierarchical attention for spoken dialogue state tracking

US11017767B2 · US · B2

Patent metadata
Field                Value
Publication number   US-11017767-B2
Application number   US-201715473409-A
Country              US
Kind code            B2
Filing date          Mar 29, 2017
Priority date        Mar 29, 2016
Publication date     May 25, 2021
Grant date           May 25, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are systems and methods for providing hierarchical state tracking in a spoken dialogue system. A sequence of turns is received by a spoken dialogue system. Each turn includes a user utterance and a machine act. At each turn, a value pointer and a turn pointer are provided for that turn. The value pointer represents a probability distribution over the one or more words in the user utterance that indicates whether each word in the user utterance is a slot value for a slot. The turn pointer identifies which turn in a set of turns includes a currently-relevant slot value for the slot, where the set of turns includes a current turn for which the turn pointer is being provided, and all turns that precede the current turn.
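The two pointers described in the abstract can be illustrated with a small Python sketch. This is not the patented implementation: the function names, the example dialogue, the "food" slot, and the hand-picked scores are all assumptions standing in for the outputs of a trained network. It only shows how a per-word distribution (value pointer) and a choice among turns (turn pointer) might combine to yield a slot value.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def value_pointer(word_scores):
    """Distribution over the words of one user utterance (hypothetical helper)."""
    return softmax(word_scores)


def turn_pointer(turn_scores):
    """Index of the turn, current or earlier, judged to hold the relevant value."""
    probs = softmax(turn_scores)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical two-turn dialogue for an assumed "food" slot.
turns = [
    ["i", "want", "italian", "food"],
    ["somewhere", "in", "the", "north"],
]
# Made-up per-word scores; a real system would compute these with a network.
word_scores = [
    [0.1, 0.2, 3.0, 0.5],
    [0.0, 0.1, 0.1, 0.3],
]
# Made-up scores over turns 0..1, evaluated at the current turn (turn 1).
turn_scores_at_turn_1 = [2.0, 0.2]

value_dists = [value_pointer(s) for s in word_scores]
t = turn_pointer(turn_scores_at_turn_1)
best_word = max(range(len(value_dists[t])), key=value_dists[t].__getitem__)
food_value = turns[t][best_word]
print(t, food_value)  # -> 0 italian
```

In this toy case the current utterance says nothing about food, so the turn pointer refers back to turn 0, and the value pointer for that turn selects "italian".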

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method of state tracking in a spoken dialogue system, the method comprising: receiving a sequence of turns, each turn comprising: a numerical identifier; a user utterance comprising one or more words received at the spoken dialogue system; and a machine act comprising one or more words produced by the spoken dialogue system; providing, by the spoken dialogue system using a hierarchical pointer network that relates a slot value in one turn to a slot value in another turn, a first value pointer, wherein the first value pointer indicates a first slot value for a first slot based on a first user utterance in a first turn of the received sequence of turns; providing, by the spoken dialogue system, a first turn pointer for the first turn of the received sequence of turns, wherein the first turn is a current turn and includes a first numerical identifier, wherein the first turn pointer includes a second numerical identifier of a second turn in the sequence of turns, wherein the second turn is a prior turn of the sequence of turns, wherein the second turn is distinct from the first turn and includes at least one of a second user utterance or a second machine act having a second slot value for a second slot, and wherein the second slot value matches the first slot value; determining a first dialogue state for the first turn based at least on a combination of the first value pointer and the first turn pointer, wherein the first dialogue state is determined based on a predicted context output of the hierarchical pointer network; and determining a first machine act of the first turn to be performed by the spoken dialogue system based on the determined first dialogue state.

2. The computer-implemented method of claim 1, wherein the first value pointer comprises a probability distribution over the one or more words in the first user utterance or over the one or more words in a knowledge database.

3. The computer-implemented method of claim 2, wherein providing the first value pointer comprises producing the probability distribution over the one or more words in the first user utterance to indicate whether each word in the first user utterance is the first slot value for the first slot.

4. The computer-implemented method of claim 3, wherein the probability distribution is a first probability distribution and the operation of providing the first value pointer further comprises: determining a second probability distribution that a user affirmed the first slot value mentioned in the first machine act of the first turn; and when the user affirmed the first slot value, causing the first value pointer to point to the word cited in the first machine act.

5. The computer-implemented method of claim 2, wherein providing the first value pointer comprises producing the probability distribution by comparing the one or more words in the first user utterance to corresponding one or more words in the knowledge database.

6. The computer-implemented method of claim 5, wherein the probability distribution is a first probability distribution, and wherein providing the first value pointer further comprises: for a respective turn in the sequence of turns, determining a second probability distribution that a user affirmed the first slot value mentioned in the first machine act of the first turn; and when the user affirmed the first slot value, causing the first value pointer to point to a word cited in the first machine act.

7. The computer-implemented method of claim 5, wherein providing the first value pointer comprises processing each word in the first user utterance, and wherein the hierarchical pointer network is configured as a recurrent neural network.

8. The computer-implemented method of claim 7, wherein the recurrent neural network comprises a bi-directional neural network.

9. The computer-implemented method of claim 2, wherein the probability distribution is a first probability distribution, and wherein determining the first dialogue state for the first slot comprises determining, for each slot, a second probability distribution over all possible slot values for every slot.

10. The computer-implemented method of claim 9, further comprising: determining the first machine act to be performed by the spoken dialogue system based on the second probability distribution over all possible slot values for every slot; and causing the spoken dialogue system to perform the first machine act.

11. The computer-implemented method of claim 10, wherein the first machine act comprises: asking a confirming question; asking for more information; or sending a message.

12. A system, comprising: at least one processing unit; and at least one memory storing computer executable instructions that, when executed by the at least one processing unit, cause the system to: receive a sequence of turns, each turn comprising: a numerical identifier; a user utterance comprising one or more words received at a spoken dialogue system; and a machine act comprising one or more words produced by the spoken dialogue system; provide, by the spoken dialogue system using a hierarchical pointer network that relates a slot value in one turn to a slot value in another turn, a first value pointer, wherein the first value pointer indicates a first slot value for a first slot based on a first user utterance in a first turn of the received sequence of turns; provide, by the spoken dialogue system, a first turn pointer for the first turn of the received sequence of turns, wherein the first turn is a current turn and includes a first numerical identifier, wherein the first turn pointer includes a second numerical identifier of a second turn in the sequence of turns, wherein the second turn is a prior turn of the sequence of turns, wherein the second turn is distinct from the first turn and includes at least one of a second user utterance or a second machine act having a second slot value for a second slot, the second turn associated with a designator identifying an utterance type or a machine act type, respectively, and wherein the second slot value matches the first slot value; determine a first dialogue state for the first turn based at least on a combination of the first value pointer and the first turn pointer, wherein the first dialogue state is determined based on a predicted context output of the hierarchical pointer network; and determine a first machine act of the first turn to be performed by the spoken dialogue system based on the determined first dialogue state.

13. The system of claim 12, further comprising instructions for accessing a knowledge database.

14. The system of claim 13, wherein the first value pointer comprises a probability distribution over the one or more words in the first user utterance or over the one or more words in a knowledge database.

15. The system of claim 14, wherein the instructions for providing the first value pointer comprise instructions for: producing the probability distribution by comparing the one or more words in the first user utterance to corresponding one or more words in the knowledge database; or producing the probability distribution over the one or more words in the first user utterance to indicate whether each word in the first user utterance is the first slot value for the first slot.

16. The system of claim 15, wherein the probability distribution is a first probability distribution and the instructions for providing the first value pointer further comprises instructions for: determining a se
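Two of the dependent claims lend themselves to a short illustration: the affirmation step (claims 4 and 6), where the value pointer is redirected to a word in the machine act when the user affirms it, and the choice of machine act from a distribution over slot values (claims 9 to 11). The Python sketch below is only one possible reading. The function names, thresholds, and act labels are hypothetical and do not come from the patent, which does not specify how these probabilities are thresholded.

```python
def resolve_value(utterance_words, value_dist, machine_act_value, p_affirm,
                  affirm_threshold=0.5):
    """Pick the slot value for one turn.

    If the user likely affirmed the value offered in the machine act,
    point at that value; otherwise take the most probable utterance word.
    The 0.5 threshold is an assumption made for this sketch.
    """
    if machine_act_value is not None and p_affirm >= affirm_threshold:
        return machine_act_value
    best = max(range(len(value_dist)), key=value_dist.__getitem__)
    return utterance_words[best]


def choose_machine_act(slot_value_probs, confirm_threshold=0.5,
                       act_threshold=0.9):
    """Map a distribution over slot values to one of three act types.

    The act labels mirror the options listed in claim 11; both
    thresholds are hypothetical.
    """
    value, prob = max(slot_value_probs.items(), key=lambda kv: kv[1])
    if prob >= act_threshold:
        return ("send_message", value)
    if prob >= confirm_threshold:
        return ("ask_confirming_question", value)
    return ("ask_for_more_information", None)


# The system offered "italian" and the user replied "yes please".
print(resolve_value(["yes", "please"], [0.6, 0.4], "italian", p_affirm=0.9))
# A moderately confident belief leads to a confirming question.
print(choose_machine_act({"italian": 0.6, "thai": 0.4}))
```

A threshold-based policy is used here purely for brevity; a deployed system could just as well learn the mapping from dialogue state to machine act.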

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Probabilistic grammars, e.g. word n-grams · CPC title

  • Execution procedure of a spoken command · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11017767B2 cover?
Described herein are systems and methods for providing hierarchical state tracking in a spoken dialogue system. A sequence of turns is received by a spoken dialogue system. Each turn includes a user utterance and a machine act. At each turn, a value pointer and a turn pointer are provided for that turn. The value pointer represents a probability distribution over the one or more words in the us…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date May 25, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).