Learning of policy for selection of associative topic in dialog system

US11574550B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11574550-B2
Application number: US-201715800465-A
Country: US
Kind code: B2
Filing date: Nov 1, 2017
Priority date: Mar 6, 2017
Publication date: Feb 7, 2023
Grant date: Feb 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method for learning a policy for selection of an associative topic, which can be used in a dialog system, is described. The method includes obtaining a policy base that indicates a topic transition from a source topic to a destination topic and a short-term reward for the topic transition, by analyzing data from a corpus. The short-term reward may be defined as probability of associating a positive response. The method also includes calculating an expected long-term reward for the topic transition using the short-term reward for the topic transition with taking into account a discounted reward for a subsequent topic transition. The method further includes generating a policy using the policy base and the expected long-term reward for the topic transition. The policy indicates selection of the destination topic for the source topic as an associative topic for a current topic.
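The long-term reward step described in the abstract can be illustrated with a small sketch. This is not the patent's implementation: it assumes a toy set of topic transitions with made-up short-term rewards, treats a destination topic with no outgoing transitions as having zero future reward, and solves the recursion by simple fixed-point iteration. The names `long_term_rewards` and `R_toy` are hypothetical.

```python
def long_term_rewards(R, gamma=0.9, tol=1e-9, max_iter=10_000):
    """Iterate Q(t, t') = R(t, t') + gamma * max_{t''} Q(t', t'') to a fixed point.

    R maps (source_topic, destination_topic) -> short-term reward.
    Assumption: a destination with no outgoing transitions contributes 0.
    """
    # Group the available destinations by source topic.
    successors = {}
    for src, dst in R:
        successors.setdefault(src, []).append(dst)

    Q = {edge: 0.0 for edge in R}
    for _ in range(max_iter):
        new_Q = {}
        delta = 0.0
        for (src, dst), reward in R.items():
            # Best long-term reward among transitions leaving the destination.
            best_next = max(
                (Q[(dst, nxt)] for nxt in successors.get(dst, [])),
                default=0.0,
            )
            new_Q[(src, dst)] = reward + gamma * best_next
            delta = max(delta, abs(new_Q[(src, dst)] - Q[(src, dst)]))
        Q = new_Q
        if delta < tol:
            break
    return Q


# Hypothetical rewards: "weather" looks better immediately,
# but "food" leads on to a highly rewarded follow-up topic.
R_toy = {
    ("travel", "food"): 0.2,
    ("travel", "weather"): 0.5,
    ("food", "music"): 0.9,
    ("weather", "music"): 0.1,
}
Q_toy = long_term_rewards(R_toy, gamma=0.5)
```

With these toy numbers, the transition from "travel" to "food" ends up with a higher long-term value than the transition to "weather", even though its short-term reward is lower, which is the effect the discounted term is meant to capture.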

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for selection of an associative topic in a conversation, the method comprising: learning a policy for the selection of an associative topic using a machine learning process, the machine learning process including online learning and offline learning, comprising: in response to receiving a request by at least one user, analyzing data from a corpus stored on a memory associated with a hardware processor to obtain a policy base indicating a topic transition in the conversation from a source topic to a destination topic and a short-term reward for the topic transition, the short-term reward being defined as a determined probability of appearances of one or more types of positive expressions associated with the topic transition in the conversation based on an appearance count of any of the one or more types of positive expressions having a dependency to the destination topic in the conversation; calculating, using the hardware processor, an expected long-term reward for the topic transition in the conversation using the short-term reward for the topic transition with taking into account a discounted reward for a subsequent topic transition in the conversation as follows: $Q(t, t') = R(t, t') + \gamma \max_{t''} Q(t', t'')$ (2), where γ denotes a discount factor (γ<1) for evaluating a discounted value of the expected long-term reward for the subsequent topic transition, where Q represents the expected long-term reward function, R represents the immediate reward function, Q(t′, t″) represents a discounted life-long reward from a selection of a subsequent topic feature, t represents an initial topic feature, t′ represents a destination topic feature, and t″ represents the subsequent topic feature: generating a policy using the policy base and the expected long-term reward for the topic transition in the conversation; and implementing a dialogue between a remote computing device with at least one user based on the policy using an interface of an associated device to obtain a user-provided reward during the conversation, the online learning and offline learning being implemented on separate computing systems including the remote developer computing device and the associated device, respectively.

2. The method of claim 1, wherein the calculating comprises: evaluating a maximum long-term reward received from available subsequent topic transitions to calculate the discounted reward.

3. The method of claim 1, wherein the calculating comprises: solving, by a dynamic programming or Monte Carlo method, an equation representing the expected long-term reward for the topic transition.

4. The method of claim 1, wherein the policy base includes occurrence probability of the topic transition in the corpus, the generating comprising: converting from the expected long-term reward to probability by using a softmax function; and merging the occurrence probability of the policy base and the probability converted from the expected long-term reward.

5. The method of claim 1, wherein the method further comprises: by using the policy and the expected long-term reward for the topic transition as initial states, personalizing the policy for a specific user based on temporal difference learning with user environment.

6. The method of claim 1, wherein the method further comprises: selecting the destination topic as the associative topic for the current topic using the policy; observing a positive or negative actual response from user environment to obtain a user-provided reward; and updating the expected long-term reward and the policy by using the user-provided reward.

7. The method of claim 6, wherein the updating comprises: estimating a temporal difference error defined by the user-provided reward, a current version of the expected long-term reward and a discounted long-term reward received from selection of a subsequent topic; and adjusting the expected long-term reward by the temporal difference error with a learning rate.

8. The method of claim 1, wherein the obtaining comprises: counting an appearance of one or more positive expressions having dependency to the destination topic in the corpus; and estimating probability of appearance of any one of the one or more positive expressions using a count of the appearance, the probability of the appearance being used as the short-term reward for the topic transition to the destination topic.

9. The method of claim 1, wherein the obtaining comprises: counting an appearance of one or more positive expressions having dependency to the destination topic in the corpus; estimating probability of appearance of any one of the one or more positive expressions using a count of the appearance; and weighting the probability of the appearance by distance between the source topic and the destination topic, the probability weighted by the distance being used as the short-term reward for the topic transition from the source topic to the destination topic.

10. The method of claim 1, wherein the obtaining comprises: counting an appearance of the destination topic around the source topic in the corpus; and estimating occurrence probability of the topic transition using a count of the appearance, the policy base including the occurrence probability of the topic transition.

11. The method of claim 1, wherei
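Claims 4 and 7 describe two further steps: converting long-term rewards to probabilities with a softmax and merging them with corpus occurrence probabilities, and adjusting a long-term reward by a temporal difference error. The sketch below is one possible reading, not the patented implementation. The claim text does not say how the two distributions are merged, so the linear mix with weight `alpha` is an assumption, as are the function names and default parameter values.

```python
import math


def softmax(values):
    """Convert a list of scores into probabilities (numerically stable form)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def merge_policy(occurrence_prob, q_values, alpha=0.5):
    """Merge corpus occurrence probabilities with softmax(Q) for one source topic.

    Both arguments are dicts keyed by destination topic.
    Assumption: the merge is a linear mix with weight alpha (0 <= alpha < 1),
    renormalized so the result sums to 1.
    """
    topics = sorted(q_values)
    q_probs = softmax([q_values[t] for t in topics])
    mixed = {
        t: alpha * occurrence_prob.get(t, 0.0) + (1.0 - alpha) * p
        for t, p in zip(topics, q_probs)
    }
    total = sum(mixed.values())
    return {t: p / total for t, p in mixed.items()}


def td_update(q_current, user_reward, q_next, gamma=0.9, learning_rate=0.1):
    """Adjust one long-term reward by the temporal difference error.

    q_next is the long-term reward of the subsequent topic selection;
    the caller decides how that subsequent topic is chosen.
    """
    td_error = user_reward + gamma * q_next - q_current
    return q_current + learning_rate * td_error


# Hypothetical values for a single source topic with two candidate destinations.
policy = merge_policy(
    occurrence_prob={"food": 0.5, "weather": 0.5},
    q_values={"food": 0.65, "weather": 0.55},
)
updated_q = td_update(q_current=0.5, user_reward=1.0, q_next=0.0)
```

Passing `q_next` in explicitly keeps the update agnostic about whether the subsequent topic is the greedy choice or the one actually selected in the dialogue, which the claim preview leaves open.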

Assignees

Inventors

Classifications

  • G09B5/06 (Primary)

    CPC title: with both visual and audible presentation of the material to be studied

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11574550B2 cover?
A computer-implemented method for learning a policy for selection of an associative topic, which can be used in a dialog system, is described. The method includes obtaining a policy base that indicates a topic transition from a source topic to a destination topic and a short-term reward for the topic transition, by analyzing data from a corpus. The short-term reward may be defined as probabilit…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G09B5/06. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 7, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).