Joint language understanding and dialogue management using binary classification based on forward and backward recurrent neural network

US10268679B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10268679-B2
Application number: US-201615368380-A
Country: US
Kind code: B2
Filing date: Dec 2, 2016
Priority date: Dec 2, 2016
Publication date: Apr 23, 2019
Grant date: Apr 23, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processing unit can operate an end-to-end recurrent neural network (RNN) with limited contextual dialog memory that can be jointly trained by supervised signals: user slot tagging, intent prediction, and/or system action prediction. The end-to-end RNN, or joint model, has shown advantages over separate models for natural language understanding (NLU) and dialog management and can capture expressive feature representations beyond conventional aggregation of slot tags and intents, to mitigate effects of noisy output from NLU. The joint model can apply a supervised signal from system actions to refine the NLU model. By back-propagating errors associated with system action prediction to the NLU model, the joint model can use machine learning to predict user intent by a binary classification obtained by both forward and backward output, perform slot tagging, and make system action predictions based on user input, e.g., utterances across a number of domains.
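The joint training described in the abstract can be illustrated with a small sketch of a combined objective. This is not the patent's implementation: the function names, the unweighted sum of the three terms, and the choice of cross-entropy losses are all assumptions made for illustration. The point is only that one scalar loss covers slot tagging, intent prediction, and system action prediction, so an error in the system action term contributes to the same objective that the NLU parameters are trained on.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Sum of per-label binary cross-entropy terms (multi-label targets)."""
    return -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(probs, targets)
    )


def categorical_cross_entropy(dist, gold_index, eps=1e-12):
    """Negative log-likelihood of the gold class under a probability distribution."""
    return -math.log(max(dist[gold_index], eps))


def joint_loss(slot_dists, slot_gold, intent_probs, intent_gold,
               action_probs, action_gold):
    """Hypothetical joint objective: an unweighted sum of three supervised signals.

    slot_dists   -- one distribution over slot tags per input word
    slot_gold    -- gold slot tag index per input word
    intent_probs -- per-intent sigmoid outputs (binary classifiers)
    action_probs -- per-action sigmoid outputs (binary classifiers)
    """
    slot_term = sum(
        categorical_cross_entropy(dist, gold)
        for dist, gold in zip(slot_dists, slot_gold)
    )
    intent_term = binary_cross_entropy(intent_probs, intent_gold)
    action_term = binary_cross_entropy(action_probs, action_gold)
    return slot_term + intent_term + action_term
```

In a real system the three terms would typically be computed by an automatic-differentiation framework so that gradients of the action term flow back through the shared layers; the sketch only shows how the signals combine into one value.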

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: one or more processing unit(s); one or more computer-readable media coupled to one or more of the processing unit(s), the one or more computer-readable media having thereon one or more modules of computer-executable instructions to configure a computer to perform operations of an end-to-end recurrent neural network (RNN) model, the operations comprising: sharing output of bi-directional long short-term memory including slot tags and intent predictions from a natural language understanding (NLU) component of the end-to-end RNN model with a dialogue management (DM) component of the end-to-end RNN model; receiving as inputs at the DM component hidden outputs including slot tags and intent predictions from the NLU component; receiving contextual history at the NLU component from the DM component; refining the NLU component based at least on the contextual history; and generating a system action prediction at an output layer of the end-to-end RNN model utilizing at least one one-to-many binary classification based on the inputs, wherein the binary classification is obtained using an activation function operating on a combination of forward hidden output as well as backward output.

2. A system as claim 1 recites, wherein the end-to-end RNN model includes a plurality of bi-directional long short-term memory (LSTM) cells.

3. A system as claim 2 recites, wherein the bi-directional LSTM cells are configured to combine forward hidden output and backward output from an input vector and a reverse sequence of the input vector.

4. A system as claim 3 recites, the one or more computer-readable media having thereon one or more modules of computer-executable instructions to configure the computer to perform operations further comprising incorporating bi-directional weight matrices when combining the forward hidden output and the backward output from the input vector and the reverse sequence of the input vector.

5. A system as claim 2 recites, wherein a bi-directional LSTM cell of the bi-directional LSTM cells is configured to compute a sequence of hidden vectors and output a sequence of vectors according to calculations including a softmax for a vector representation of an input.

6. A system as claim 1 recites, the one or more computer-readable media having thereon one or more modules of computer-executable instructions to configure the computer to perform operations further comprising jointly training the end-to-end RNN model with a supervised signal of system action prediction.

7. A system as claim 1 recites, wherein the contextual history includes errors propagated from the DM component to the NLU component.

8. A system as claim 1 recites, the one or more computer-readable media having thereon one or more modules of computer-executable instructions to configure the computer to perform operations further comprising applying an element-wise sigmoid function.

9. A method comprising: jointly training on multi-domain human-human dialogues: a natural language understanding (NLU) layer; and a dialogue manager (DM) layer; and jointly modeling NLU and dialogue management in an end-to-end recurrent neural network (RNN) based at least on output of the NLU layer and output of the DM layer serving as input to the other of the NLU layer and the DM layer, wherein jointly modeling includes: receiving as inputs at the DM layer, a hidden output from the NLU layer including slot tags and intent predictions, and generating a system action prediction at an output layer of the end-to-end RNN utilizing at least one one-to-many binary classification based on the inputs, wherein the binary classification is obtained using an activation function operating on a combination of forward hidden output as well as backward output.

10. A method as claim 9 recites, wherein the NLU component receives as input a sequence of word vectors, and the NLU layer at least one of: estimates conditional probability to minimize distance between possible outputs for slot tagging; or performs classification for intent prediction.

11. A method as claim 9 recites, further comprising activating neurons of binary classifiers in an output layer of the end-to-end RNN using a sigmoid function.

12. A method as claim 9 recites, further comprising jointly training the end-to-end RNN model with a supervised signal of system action prediction.

13. A method as claim 9 recites, wherein the NLU layer is configured to receive utterances as input.

14. A method as claim 9 recites, further comprising combining forward hidden output and backward output from an input vector and a reverse sequence of the input vector.

15. A system comprising: one or more processing unit(s); one or more computer-readable media coupled to one or more of the processing unit(s), the one or more computer-readable media including: an end-to end recurrent neural network (RNN) architecture operating as an aggregated model with limited contextual dialogue memory, the aggregated model limiting contextual dialogue memory by aggregating: a natural language understanding (NLU) part; and a dialogue management (DM) part, wherein the DM part receives as inputs, hidden outputs from the NLU part, and a system action prediction at an output layer of the end-to-end RNN model utilizes at least one one-to-many binary classification based on the inputs, wherein the binary classification is obtained using an activation function operating on a combination of forward hidden output as well as backward output.

16. A system as claim 15 recites, further comprising a training module to train the aggregated model with a supervised signal of system action prediction.

17. A system as claim 15 recites, wherein the end-to end RNN architecture includes a plurality of bi-directional long short-term memory (LSTM) cells configured to compute a sequence of hidden vectors and output a sequence of vectors according to calculations including a softmax for a vector representation of a current input.

18. A system as claim 15 recites, wherein the end-to end RNN architecture includes a plurality of bi-directional long short-term memory (LSTM) cells configured to combine forward hidden output and backward output from an input vector and a reverse sequence of the input vector.
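The output layer recited in the independent claims (a one-to-many binary classification produced by an activation function over a combination of forward and backward outputs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `predict_actions`, `w_forward`, and `w_backward` are hypothetical, the combination is assumed to be a sum of two linear projections (in the spirit of the "bi-directional weight matrices" of claim 4), the activation is assumed to be the element-wise sigmoid of claims 8 and 11, and the 0.5 decision threshold is an arbitrary choice.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def predict_actions(h_forward, h_backward, w_forward, w_backward, threshold=0.5):
    """Score every candidate system action with an independent binary classifier.

    h_forward / h_backward -- hidden outputs of the forward and backward passes
                              (assumed here to be fixed-size summary vectors)
    w_forward / w_backward -- hypothetical projection matrices, one row per action
    """
    z_forward = matvec(w_forward, h_forward)
    z_backward = matvec(w_backward, h_backward)
    # Combine both directions, then apply the sigmoid element-wise.
    probs = [sigmoid(f + b) for f, b in zip(z_forward, z_backward)]
    # Each action is accepted or rejected on its own, so several can fire at once.
    labels = [p >= threshold for p in probs]
    return probs, labels
```

Because each output unit has its own sigmoid rather than sharing a softmax, the scores do not have to sum to one, which is what allows a single turn to map to several system actions.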

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • G06F40/35 (Primary)

    Discourse or dialogue representation · CPC title

  • using neural networks · CPC title

  • G10L15/063 (Primary)

    Training · CPC title

  • using natural language modelling · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/35. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 23, 2019 (B2). Legal status and post-grant events are not shown on this page.