Methods and systems to train classification models to classify conversations

US10409913B2 · US · B2

Patent metadata
Publication number: US-10409913-B2
Application number: US-201514872258-A
Country: US
Kind code: B2
Filing date: Oct 1, 2015
Priority date: Oct 1, 2015
Publication date: Sep 10, 2019
Grant date: Sep 10, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for training a conversation-classification model are disclosed. A first set of conversations in a source domain and a second set of conversations in a target domain are received. Each of the first set of conversations has an associated predetermined tag. One or more features are extracted from the first set of conversations and from the second set of conversations. Based on the similarity of content in the first set of conversations and the second set of conversations, a first weight is assigned to each conversation of the first set of conversations. Further, a second weight is assigned to the one or more features of the first set of conversations based on the similarity of the one or more features of the first set of conversations and of the second set of conversations. A conversation-classification model is trained based on the first weight and the second weight.
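The two-weight scheme in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the diagonal-Gaussian density estimate, the mean-gap feature similarity, the logistic-regression learner, and every function name below are hypothetical choices made only to show how a per-conversation weight and a per-feature weight could both enter training.

```python
import numpy as np

def gaussian_log_density(X, mean, var):
    """Log density of each row of X under a diagonal Gaussian (assumed model)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)

def instance_weights(X_src, X_tgt, eps=1e-6):
    """First weight: estimated P(x | target) / P(x | source) per source conversation."""
    mu_s, var_s = X_src.mean(axis=0), X_src.var(axis=0) + eps
    mu_t, var_t = X_tgt.mean(axis=0), X_tgt.var(axis=0) + eps
    log_ratio = (gaussian_log_density(X_src, mu_t, var_t)
                 - gaussian_log_density(X_src, mu_s, var_s))
    w = np.exp(np.clip(log_ratio, -20, 20))  # clip to avoid overflow
    return w / w.mean()                      # normalise so the mean weight is 1

def feature_weights(X_src, X_tgt):
    """Second weight: features whose mean values agree across domains score near 1."""
    gap = np.abs(X_src.mean(axis=0) - X_tgt.mean(axis=0))
    return 1.0 / (1.0 + gap)

def train_weighted_logreg(X, y, w_inst, w_feat, lr=0.1, steps=500):
    """Logistic regression on feature-scaled inputs with per-instance loss weights."""
    Xs = X * w_feat
    theta, bias = np.zeros(Xs.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xs @ theta + bias)))
        err = w_inst * (p - y)               # instance weight scales each gradient term
        theta -= lr * (Xs.T @ err) / len(y)
        bias -= lr * err.mean()
    return theta, bias

def predict(X, w_feat, theta, bias):
    """Binary tag prediction for feature rows from either domain."""
    return (((X * w_feat) @ theta + bias) > 0).astype(int)

# Synthetic stand-ins for extracted features: tagged source, untagged target.
rng = np.random.default_rng(0)
X_source = rng.normal(0.0, 1.0, size=(200, 3))
y_source = (X_source[:, 0] > 0).astype(float)
X_target = rng.normal(0.5, 1.0, size=(200, 3))

w_inst = instance_weights(X_source, X_target)
w_feat = feature_weights(X_source, X_target)
theta, bias = train_weighted_logreg(X_source, y_source, w_inst, w_feat)
target_tags = predict(X_target, w_feat, theta, bias)
```

Source conversations that look like target conversations contribute more to the loss, and features that behave differently across the two domains are scaled down before training. Any other density estimator or weighted learner could be substituted for the ones assumed here.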

First claim

Opening claim text (preview).

What is claimed is:

1. A method for training a conversation classification model, the method comprising:

receiving, by a transceiver, a first set of conversations corresponding to a source domain and a second set of conversations corresponding to a target domain, wherein each conversation in the first set of conversations has one or more predetermined tags, wherein at least one of the one or more predetermined tags corresponds to a status of the first set of conversations, wherein the source domain corresponds to a first technical or business field for which the one or more predetermined tags are associated and the target domain correspond to a second technical or business field, different from the first technical or business field, for which tags are not associated, and wherein each conversation in the first set of conversations and each conversation in the second set of conversations comprises an audio conversation;

generating, by one or more processors, a transcript for each conversation in the first set of conversations and a transcript for each conversation in the second set of conversations based on a speech-to-text conversion technique;

extracting, by the one or more processors, one or more features from the transcript of each of the first set of conversations and the second set of conversations;

assigning, by the one or more processors, a first weight to each conversation in the first set of conversations based on at least a similarity between content of the first set of conversations and content of the second set of conversations, wherein the similarity of the content is determined based on the one or more features extracted from the transcripts of the first set of conversations and the second set of conversations, and based on a ratio defined as:

$$\sum_{i=1} \left( \frac{P_i(x \mid d = \text{target})}{P_i(x \mid d = \text{source})} \right)$$

where: $P_i(x \mid d = \text{target})$ corresponds to a probability that a conversation x corresponds to a target domain, and $P_i(x \mid d = \text{source})$ corresponds to a probability that a conversation x corresponds to a source domain, wherein a Euclidian distance is determined between the value of the one or more features extracted from the transcript of the first set of conversations and the value of the one or more features extracted from the transcript of the second set of conversations, and wherein a similarity is identified between the one or more features extracted from the transcript of the first set of conversations and the one or more features extracted from the transcript of the second set of conversations, based on the determined Euclidian distance;

assigning, by the one or more processors, a second weight to each of the one or more features associated with the first set of conversations based on a similarity between the one or more features extracted from the transcript of the first set of conversations and the one or more features extracted from the transcript of the second set of conversations;

training, by the one or more processors, the conversation classification model based on at least the first weight and the second weight, wherein the conversation classification model is capable of assigning the one or more predetermined tags to the second set of conversations;

applying new conversations in the second set of conversations corresponding to the target domain to the conversation classification model; and

assigning, automatically, the at least one of the one or more predetermined tags to the new conversations based on a result of the application of the new conversations to the conversation classification model.

2. The method of claim 1, wherein the first set of conversations and the second set of conversations comprises text conversations.

3. The method of claim 1, further comprising identifying, by the one or more processors, one or more conversations from the first set of conversations based on the determined similarity between the first set of conversations and the second set of conversations, wherein a value of the first weight assigned to the one or more conversations is higher in comparison to the first weight assigned to other conversations in the first set of conversations.

4. The method of claim 1, wherein the one or more features comprise at least a count of n-gram words in a conversation, a position of a segment in a thread, a position of a segment in a message, a sender of a message, an email of said sender, a count of letters in uppercase, a count of punctuations in the conversation, a measure of positive sentiment, and a measure of a negative sentiment.

5. The method of claim 1, wherein the second weight is assigned to the one or more features such that a feature of the first set of conversations similar to a feature of the second set of conversations is assigned a higher value in comparison to other features in the one or more features of the first set of conversations.

6. The method of claim 1, wherein the at least one of the one or more predetermined tags corresponds to at least one of an open category, a solved category, a closed category or a change channel category.

7. The method of claim 1, wherein each conversation of the second set of conversations of the target domain is not assigned the one or more predetermined tags.

8. The method of claim 1, further comprising transmitting, by the one or more processors, a notification to a first user in the conversation, through a user-computing device, based on the at least one of the one or more predetermined tags to the new conversations in the second set of conversations, wherein the notification corresponds to a recommendation of an action to be performed by the first user.

9. A system for training a conversation classification model, said system comprising: a transceiver configured to: receive a first set of conversations corresponding to a source domain and a second set of conversations corresponding to a target domain, wherein each conversation in the first set of conversations has one or more predetermined tags, wherein at least one of the one or more predetermined tags corresponds to a status of the first set of conversations, wherein the source domain corresponds to a first technical or business field for which the one or more predetermined tags are associated and the target domain correspond to a second technical or business field, different from the first technical or
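
A few of the features listed in claim 4, and the Euclidean-distance comparison recited in claim 1, can be illustrated with the standard library alone. The sketch below is hypothetical: the whitespace tokenizer, the fixed `vocabulary` argument, the restriction to unigrams and bigrams, and the `1 / (1 + distance)` mapping from distance to similarity are all assumptions introduced for illustration and are not taken from the patent.

```python
import math
import string
from collections import Counter

def extract_features(transcript, vocabulary):
    """Map a transcript to [n-gram counts..., uppercase count, punctuation count]."""
    tokens = [t.strip(string.punctuation) for t in transcript.lower().split()]
    tokens = [t for t in tokens if t]
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    counts = Counter(tokens + bigrams)
    ngram_part = [float(counts[term]) for term in vocabulary]
    uppercase = float(sum(ch.isupper() for ch in transcript))
    punctuation = float(sum(ch in string.punctuation for ch in transcript))
    return ngram_part + [uppercase, punctuation]

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(a, b):
    """Assumed mapping: distance 0 gives similarity 1, larger distances approach 0."""
    return 1.0 / (1.0 + euclidean_distance(a, b))

# Hypothetical vocabulary and transcripts from two different domains.
vocabulary = ["order", "refund", "please help"]
source_vec = extract_features("My order is LATE! Please help.", vocabulary)
target_vec = extract_features("I want a refund. Please help!", vocabulary)
score = similarity(source_vec, target_vec)
```

A score like this could feed either weight: averaged over target conversations it ranks source conversations, and computed per feature column it ranks features.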

Assignees

Conduent Business Services LLC

Inventors

Classifications

  • G10L15/26 · Primary

    Speech to text systems (G10L15/08 takes precedence) · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • Clustering; Classification · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Conduent Business Services LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 10, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.