Efficient dialogue policy learning

US10204097B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10204097-B2
Application number: US-201715619314-A
Country: US
Kind code: B2
Filing date: Jun 9, 2017
Priority date: Aug 16, 2016
Publication date: Feb 12, 2019
Grant date: Feb 12, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Efficient exploration of natural language conversations associated with dialogue policy learning may be performed using probabilistic distributions. Exploration may comprise identifying key terms associated with the received natural language input utilizing the structured representation. Identifying key terms may include converting raw text of the received natural language input into a structured representation. Exploration may also comprise mapping at least one of the key terms to an action to be performed by the computer system in response to receiving natural language input associated with the at least one key term. Mapping may then be performed using a probabilistic distribution. The action may then be performed by the computer system. A replay buffer may also be utilized by the computer system to track what has occurred in previous conversations. The replay buffer may then be pre-filled with one or more successful dialogues to jumpstart exploration.
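The replay buffer and its pre-filling with successful dialogues can be illustrated with a minimal sketch. This is not the patented implementation: the Transition fields, the ReplayBuffer class, the spike method name, and the two-turn example dialogue are all hypothetical and chosen only for illustration.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Transition(NamedTuple):
    """One dialogue turn as seen by the learner (hypothetical field layout)."""
    state: str
    action: str
    reward: float
    next_state: str
    done: bool


class ReplayBuffer:
    """Fixed-capacity store of transitions from previous conversations."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently drops the oldest item once full.
        self._items: Deque[Transition] = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._items.append(transition)

    def spike(self, successful_dialogues: List[List[Transition]]) -> None:
        """Pre-fill the buffer with every turn of each successful dialogue."""
        for dialogue in successful_dialogues:
            for transition in dialogue:
                self.add(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        """Draw a training batch without replacement."""
        items = list(self._items)
        return rng.sample(items, min(batch_size, len(items)))

    def __len__(self) -> int:
        return len(self._items)


# Hypothetical hand-written successful dialogue used to seed the buffer.
demo = [
    Transition("greeting", "request(date)", 0.0, "date_known", False),
    Transition("date_known", "inform(ticket)", 1.0, "booked", True),
]

buffer = ReplayBuffer(capacity=1000)
buffer.spike([demo])
batch = buffer.sample(2, random.Random(0))
```

Seeding the buffer this way means the first training batches already contain at least some rewarded turns, which is the "jumpstart" the abstract refers to.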

First claim

Opening claim text (preview).

What is claimed:

1. A computer system comprising: one or more processors; and one or more computer-readable storage media having stored thereon computer-executable instructions that are executable by the one or more processors to cause the computer system to perform efficient exploration of natural language conversations associated with dialogue policy learning of the computer system, the computer-executable instructions including instructions that are executable to cause the computer system to perform at least the following: in response to receiving natural language input, perform at least the following: identifying key terms associated with the received natural language input, wherein identifying the key terms includes converting raw text of the received natural language input into a structured representation; performing exploration of a natural language conversation associated with the received natural language input, the exploration comprising at least the following: based on the received natural language input, determining a plurality of potential actions that are to be performed by the computer system in response to the received natural language input by performing Thompson sampling using Monte Carlo samples that are associated with the received natural language input; mapping at least one of the key terms to an action selected from among the plurality of potential actions to be performed by the computer system in response to receiving the natural language input associated with the at least one key term, wherein the mapping is performed using a probabilistic distribution; and performing the action.

2. The computer system of claim 1, wherein exploration is performed by Thompson sampling using Monte Carlo samples from a Bayes-by-Back Propagation Q Network (BBQN).

3. The computer system of claim 1, wherein key terms comprise at least one of an act or a key=value pair.

4. The computer system of claim 1, wherein the probabilistic distribution is dynamically learned, such that identified key terms of received natural language input are more accurately mapped to actions to be performed by the system.

5. The computer system of claim 4, wherein the probabilistic distribution is dynamically learned using periodically created target networks.

6. The computer system of claim 1, wherein exploration is performed in an offline environment, such that natural language input is received from a simulated user.

7. The computer system of claim 1, wherein exploration is performed in an online environment, such that natural language input is received from an end user.

8. The computer system of claim 1, wherein a replay buffer is utilized by the computer system to track what has occurred in previous conversations.

9. The computer system of claim 8, wherein replay buffer spiking that comprises pre-filling the replay buffer with one or more successful dialogues is performed.

10. A method, implemented at a computer system that includes one or more processors, for performing efficient exploration of natural language conversations associated with dialogue policy learning, the method comprising: in response to receiving natural language input, performing at least the following: identifying key terms associated with the received natural language input, wherein identifying the key terms includes converting raw text of the received natural language input into a structured representation; performing exploration of a natural language conversation associated with the received natural language input, the exploration being performed using Thompson sampling from a Bayes-by-Back Propagation Q Network (BBQN), the exploration comprising at least the following: mapping at least one of the key terms to an action to be performed by the computer system in response to receiving the natural language input associated with the at least one key term, wherein the mapping is performed using a probabilistic distribution; and performing the action.

11. The method of claim 10, wherein the exploration is performed by Thompson sampling using Monte Carlo samples from the BBQN.

12. The method of claim 10, wherein key terms comprise at least one of an act or a key=value pair.

13. The method of claim 10, wherein the probabilistic distribution is dynamically learned, such that identified key terms of received natural language input are more accurately mapped to actions to be performed by the system.

14. The method of claim 13, wherein the probabilistic distribution is dynamically learned using periodically created target networks.

15. The method of claim 10, wherein exploration is performed in an offline environment, such that natural language input is received from a simulated user.

16. The method of claim 10, wherein exploration is performed in an online environment, such that natural language input is received from an end user.

17. The method of claim 10, wherein a replay buffer is utilized by the computer system to track what has occurred in previous conversations.

18. The method of claim 17, wherein replay buffer spiking that comprises pre-filling the replay buffer with one or more successful dialogues is performed.

19. A computer system comprising: one or more processors; and one or more hardware storage devices having stored thereon computer-executable instructions that are executable by the one or more processors to perform efficient exploration of natural language conversations associated with dialogue policy learning, the computer-executable instructions including instructions that are executable to cause the computer system to perform at least the following: in response to receiving natural language input, perform at least the following: identifying key terms associated with the received natural language input, wherein identifying the key terms includes converting raw text of the received natural language input into a structured representation; performing exploration of a natural language conversation associated with the received natural language input, wherein the exploration is performed by Thompson sampling using Monte Carlo samples from a Bayes-by-back Propagation Q Network (BBQN), the exploration comprising at least the following: exploration comprising at least the following: mapping at least one of the key terms to an action to be performed by the computer system in response to receiving natural language input associated with the at least one key term, wherein mapping is performed using a probabilistic distribution; and performing the action.
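The action-selection step recited in the claims (Thompson sampling using Monte Carlo samples from a Q network with a distribution over its weights) might look roughly like the sketch below. It makes strong simplifying assumptions that are not in the claims: a linear Q function stands in for the BBQN, the weight posterior is an independent Gaussian per weight with made-up means and standard deviations, training by Bayes-by-Backprop is not shown, and the action names and feature vector are hypothetical.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def sample_weights(mu: Matrix, sigma: Matrix, rng: random.Random) -> Matrix:
    """Draw one Monte Carlo sample of the weights from N(mu, sigma^2)."""
    return [
        [rng.gauss(m, s) for m, s in zip(mu_row, sigma_row)]
        for mu_row, sigma_row in zip(mu, sigma)
    ]


def q_values(weights: Matrix, features: Sequence[float]) -> List[float]:
    """Linear Q function: one row of weights per action (assumed stand-in)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def thompson_select(
    mu: Matrix, sigma: Matrix, features: Sequence[float], rng: random.Random
) -> int:
    """Sample weights once, then act greedily under the sampled Q-values."""
    q = q_values(sample_weights(mu, sigma, rng), features)
    return max(range(len(q)), key=q.__getitem__)


# Hypothetical system actions and posterior parameters.
actions = ["request(date)", "inform(ticket)"]
mu = [[1.0, 0.0], [0.0, 1.0]]      # posterior means, one row per action
sigma = [[1.0, 1.0], [1.0, 1.0]]   # posterior standard deviations
features = [1.0, 0.5]              # encoded dialogue state (assumed)

chosen = actions[thompson_select(mu, sigma, features, random.Random(0))]
```

Because the weights are resampled at each decision, actions whose value is still uncertain get tried more often than under purely greedy selection, and the choice approaches the greedy one as the standard deviations shrink toward zero.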

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Combinations of networks · CPC title

  • Computing arrangements based on biological models · CPC title

  • G06N3/006 · Primary

    based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title

  • Discourse or dialogue representation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10204097B2 cover?
Efficient exploration of natural language conversations associated with dialogue policy learning may be performed using probabilistic distributions. Exploration may comprise identifying key terms associated with the received natural language input utilizing the structured representation. Identifying key terms may include converting raw text of the received natural language input into a structur…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/006. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 12, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).