Metric for automatic assessment of conversational responses

US9967211B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9967211-B2
Application number: US-201514726569-A
Country: US
Kind code: B2
Filing date: May 31, 2015
Priority date: May 31, 2015
Publication date: May 8, 2018
Grant date: May 8, 2018

Abstract

Official abstract text for this publication.

Examples are generally directed towards automatic assessment of machine generated conversational responses. Context-message-response n-tuples are extracted from at least one source of conversational data to generate a set of multi-reference responses. A response in the set of multi-reference responses includes its context-message data pair and rating. The rating indicates a quality of the response relative to the context-message data pair. A response assessment engine generates a metric score for a machine-generated response based on an assessment metric and the set of multi-reference responses. The metric score indicates a quality of the machine-generated conversational response relative to a user-generated message and a context of the user-generated message. A response generation system of a computing device, such as a digital assistant, is optimized and adjusted based on the metric score to improve the accuracy, quality, and relevance of responses output to the user.
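The abstract describes each reference response as carrying its context-message pair and a rating. A minimal sketch of how such a set might be represented follows. The names `RatedTriple` and `build_multi_reference_sets` are hypothetical, and grouping by exact context and message match is a simplifying assumption rather than the selection procedure the patent discloses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RatedTriple:
    """One context-message-response n-tuple plus its quality rating (hypothetical type)."""
    context: str    # conversational turns preceding the message
    message: str    # human-generated message being answered
    response: str   # reference response to that message
    rating: float   # quality of the response relative to (context, message)


def build_multi_reference_sets(triples):
    """Group rated responses by their (context, message) pair.

    Assumption: references are matched on identical context and message text.
    """
    grouped = defaultdict(list)
    for t in triples:
        grouped[(t.context, t.message)].append((t.response, t.rating))
    return dict(grouped)


triples = [
    RatedTriple("how are you", "fine, and you?", "doing well, thanks", 0.8),
    RatedTriple("how are you", "fine, and you?", "what time is it", -0.6),
    RatedTriple("nice weather", "yes, very sunny", "perfect for a walk", 0.9),
]
reference_sets = build_multi_reference_sets(triples)
```

Each key in `reference_sets` then maps to several rated references for the same message, which is what makes the set multi-reference.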

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for automatic assessment of machine generated responses, said method comprising: extracting candidate context-message-response n-tuples, by an extraction component of a computing device, from at least one source of conversational data; forming a set of multi-reference responses selected from the candidate context-message-response n-tuples extracted by the extraction component; calculating an assessment metric for the machine generated response, by at least one processor, based on the set of multi-reference responses; and generating a metric score for the machine generated response based on the assessment metric, by the at least one processor, the metric score indicating a quality of the machine-generated response relative to the set of multi-reference responses.

2. The computer-implemented method of claim 1, wherein extracting candidate context-message-response n-tuples from at least one source of conversational data and forming a set of multi-reference responses further comprises: extracting candidate context-message-response n tuples from the at least one source of conversational data, wherein individual candidate context-message-response n-tuples comprise a human-generated message, a conversational context, and a reference response corresponding to the human-generated message.

3. The computer-implemented method of claim 2, further comprising: selecting a response from the extracted candidate context-message-response n tuples based on a context of a message associated with the response to form a reference response in the set of multi-reference responses, wherein a message associated with the reference response corresponds to the selected human-generated message.

4. The computer-implemented method of claim 2, further comprising: selecting a response from the extracted candidate context-message-response n-tuples based on conversational context of the response to form a reference response in the set of multi-reference responses, wherein the conversational context associated with the reference response corresponds to the conversational context of the machine-generated response.

5. The computer-implemented method of claim 4, wherein a conversational context of a message comprises linguistic context data and non-linguistic context data, wherein the linguistic context data comprises message-response data pairs preceding the selected message and the selected machine-generated response in a conversation.

6. The computer-implemented method of claim 2, further comprising: extracting the candidate context-message-response n-tuples from the at least one source of conversational data via a network connection, wherein the at least one source of conversational data is at least one of a social media source, wherein the social media source provides conversational data in at least one format, wherein a format of conversational data comprises a text format, an audio format, or a visual format.

7. The computer-implemented method of claim 1, wherein a rating of individual multi-reference responses in the set of multi-reference responses is a human-generated rating, and further comprising: accessing the rating of the individual multi-reference responses in the set of multi-reference responses, wherein the rating indicates a quality of the individual multi-references responses relative to a reference multi-reference response.

8. The computer-implemented method of claim 1, further comprising: determining a rating for individual multi-reference responses in the set of multi-reference responses is a rating on a scale other than a negative one to positive one scale, normalizing the rating to form a normalized rating within a range from negative one to positive one.

9. The computer-implemented method of claim 1, wherein the set of multi-reference responses is a test set of multi-reference responses, and further comprising: training the response assessment engine based on a training set of multi-reference context-response-message n-tuples extracted from the at least one source of conversational data, wherein training the response assessment engine further comprises calculating the assessment metric based on the training set of multi-reference context-message-response n-tuples to train a set of weights associated with the response assessment engine.

10. The computer-implemented method of claim 1, wherein the metric score is a score within a scale from zero to one, and wherein generating the metric score further comprises: calculating an amount of word sequence overlap between the machine-generated response and a reference response in the set of multi-reference responses, wherein an overlap of zero indicates no words in common between the machine-generated response and the reference response, and wherein an overlap of one indicates the machine-generated response is identical to the reference response.

11. The computer-implemented method of claim 10, further comprising: on determining an overlap between the machine-generated response and the references response, determining a rating of the reference response; increasing the metric score on determining the rating of the reference response is a positive rating; and decreasing the metric score on determining the rating of the reference response is a negative rating.

12. A system for automatic assessment of machine generated responses, said system comprising: at least one processor; and a memory storage device associated with the at least one processor, the memory storage device comprising a memory area storing a response assessment engine, wherein the at least one processor executes the response assessment engine to: calculate an assessment metric for at least one machine-generated response, based on a set of multi-reference responses, a set of ratings and contextual data being associated with the set of multi-reference responses; generate at least one metric score indicating a quality of the at least one machine-generated response relative to at least one multi-reference response from the set of multi-reference responses; and update a set of parameters associated with the response generation system based on the at least one metric score.

13. The system of claim 12, wherein the metric score is a score within a scale from zero to one, and wherein the at least one processor further executes the response assessment engine to: calculate an amount of word sequence overlap between the machine-generated response and a reference response in the set of multi-reference responses, wherein an overlap of zero indicates no words in common between the machine-generated response and the reference response, and wherein an overlap of one indicates the machine-generated response is identical to the reference response.

14. The system of claim 12, wherein the at least one processor further executes the response assessment engine to: identify an amount of overlap between the machine-generated response and a reference response; increase a metric score of the machine-generated response on determining a rating of the reference response is a positive rating; and decrease the metric score of the machine-generated response on determining the rating of the reference response is a negative rating.

15. The system of claim 12, wherein the at least one processor further executes the response assessment engine to: generate a first metric score associated with a first machine-generated response; update the set of parameters in response to the first machine-generated response to form a modified set of parameters; generate a second metr
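Claims 8, 10, and 11 describe three steps: normalizing ratings to the range from negative one to positive one, measuring word sequence overlap on a zero-to-one scale, and raising or lowering the score according to each reference's rating. A rough sketch of how these steps could fit together follows. The function names, the Dice-style n-gram overlap, and the additive clamp-to-[0, 1] combination are all illustrative assumptions, not the formula the patent discloses.

```python
from collections import Counter


def ngram_counts(tokens, max_n=2):
    """Multiset of all n-grams of length 1..max_n in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def overlap(candidate, reference, max_n=2):
    """Dice-style n-gram overlap (an assumed measure): 0.0 when no words
    are shared, 1.0 when the two token sequences are identical."""
    cand = ngram_counts(candidate.lower().split(), max_n)
    ref = ngram_counts(reference.lower().split(), max_n)
    total = sum(cand.values()) + sum(ref.values())
    if total == 0:
        return 0.0
    shared = sum((cand & ref).values())  # multiset intersection
    return 2.0 * shared / total


def normalize_rating(rating, low, high):
    """Linearly map a rating on [low, high] onto [-1, 1] (claim 8)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return 2.0 * (rating - low) / (high - low) - 1.0


def metric_score(candidate, rated_references, max_n=2):
    """Add overlap * rating for every reference, then clamp to [0, 1].

    Positive ratings raise the score and negative ratings lower it (claim 11);
    the additive combination and the clamp are assumptions of this sketch.
    """
    score = 0.0
    for reference, rating in rated_references:
        score += overlap(candidate, reference, max_n) * rating
    return min(1.0, max(0.0, score))


# Ratings collected on a hypothetical 1-to-5 scale, normalized before use.
references = [
    ("i am fine", normalize_rating(5, 1, 5)),
    ("go away", normalize_rating(1, 1, 5)),
]
good = metric_score("I am fine", references)
bad = metric_score("go away", references)
```

Under these assumptions, a candidate that matches a highly rated reference scores at the top of the scale, while one that matches a poorly rated reference is pushed to the bottom.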

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • H04L51/02 (Primary)

    using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages · CPC title

  • G06F40/56 (Primary)

    Natural language generation · CPC title

  • Multimedia information · CPC title

  • Physics · mapped topic

  • Electricity · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification H04L51/02. Mapped technology areas include Electricity.
When was this patent published?
Publication date: May 8, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).