Methods and systems to train classification models to classify conversations

US2017098443A1 · US · A1

Patent metadata
Publication number: US-2017098443-A1
Application number: US-201514872258-A
Country: US
Kind code: A1
Filing date: Oct 1, 2015
Priority date: Oct 1, 2015
Publication date: Apr 6, 2017
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for training a conversation-classification model are disclosed. A first set of conversations in a source domain and a second set of conversations in a target domain are received. Each of the first set of conversations has an associated predetermined tag. One or more features are extracted from the first set of conversations and from the second set of conversations. Based on the similarity of content in the first set of conversations and the second set of conversations, a first weight is assigned to each conversation of the first set of conversations. Further, a second weight is assigned to the one or more features of the first set of conversations based on the similarity of the one or more features of the first set of conversations and of the second set of conversations. A conversation-classification model is trained based on the first weight and the second weight.
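
The two weights described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how similarity is measured or which classifier is trained, so everything below is an assumption made for illustration: bag-of-words counts stand in for the extracted features, the first weight is the cosine similarity between a source conversation and the mean target vector, the second weight is a ratio of relative feature frequencies, and the model is a weighted per-tag centroid. All function names and the sample conversations are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercased unigram counts; a stand-in for the extracted features."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    """Average feature vector of a non-empty list of conversations."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()}

def instance_weights(source_vecs, target_vecs):
    """First weight, one per source conversation (assumed: cosine to target mean)."""
    centroid = mean_vector(target_vecs)
    return [cosine(v, centroid) for v in source_vecs]

def feature_weights(source_vecs, target_vecs):
    """Second weight, one per source feature (assumed: relative-frequency ratio)."""
    src, tgt = mean_vector(source_vecs), mean_vector(target_vecs)
    s_total, t_total = sum(src.values()), sum(tgt.values())
    weights = {}
    for feat, s_val in src.items():
        p_s = s_val / s_total
        p_t = tgt.get(feat, 0.0) / t_total
        weights[feat] = min(p_s, p_t) / max(p_s, p_t)
    return weights

def train(source_vecs, tags, inst_w, feat_w):
    """Build one weighted centroid per tag from the tagged source conversations."""
    model = {}
    for vec, tag, w in zip(source_vecs, tags, inst_w):
        centroid = model.setdefault(tag, Counter())
        for feat, count in vec.items():
            centroid[feat] += w * feat_w.get(feat, 0.0) * count
    return model

def predict(model, vec):
    """Assign the tag whose weighted centroid is most similar to the conversation."""
    return max(model, key=lambda tag: cosine(vec, model[tag]))

# Hypothetical data: tagged source conversations, untagged target conversations.
source = [("my issue is fixed thanks", "solved"),
          ("the problem is still not fixed", "open"),
          ("thanks that solved it", "solved"),
          ("still waiting for a reply", "open")]
target = ["great it is fixed thanks", "still not working"]

source_vecs = [bag_of_words(text) for text, _ in source]
target_vecs = [bag_of_words(text) for text in target]
tags = [tag for _, tag in source]

inst_w = instance_weights(source_vecs, target_vecs)
feat_w = feature_weights(source_vecs, target_vecs)
model = train(source_vecs, tags, inst_w, feat_w)
predicted = [predict(model, v) for v in target_vecs]
```

In this sketch, source conversations that resemble the target domain contribute more to each centroid, and features that never occur in the target domain receive a weight of zero, so they do not influence the tag assigned to a target conversation.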

First claim

Opening claim text (preview).

What is claimed is:

1. A method for training a conversation classification model, said method comprising: receiving, by a transceiver, a first set of conversations corresponding to a source domain and a second set of conversations corresponding to a target domain, wherein each conversation in said first set of conversations has an associated predetermined tag, and wherein each conversation in the first set of conversations and the second set of conversations corresponds to an audio conversation; generating, by one or more processors, a transcript for each conversation in the first set of conversations and the second set of conversations based on a speech to text conversion technique; extracting, by the one or more processors, one or more features from the transcript of each of said first set of conversations and said second set of conversations; assigning, by the one or more processors, a first weight to each conversation in said first set of conversations based on at least a similarity between content of said first set of conversations and content of said second set of conversations, wherein the similarity of the content is determined based on the one or more features; assigning, by the one or more processors, a second weight to each of said one or more features associated said first set of conversations based on a similarity between said one or more features extracted from the transcript of said first set of conversations and said one or more features extracted from the transcript of said second set of conversations; and training, by the one or more processors, said conversation classification model based on at least said first weight and said second weight, wherein said conversation classification model is capable of assigning said predetermined tag to said second set of conversations.

2. The method of claim 1, wherein the first set of conversations and the second set of conversations corresponds to text conversations.

3. The method of claim 1, further comprising identifying, by said one or more processors, one or more conversations from said first set of conversations based on said determined similarity between said first set of conversations and said second set of conversations, wherein value of said first weight assigned to said one or more conversations is higher in comparison to said first weight assigned to other conversations in said first set of conversations.

4. The method of claim 1, wherein said one or more features comprise at least a count of n-gram words in a conversation, a position of a segment in a thread, a position of a segment in a message, a sender of a message, an email of said sender, a count of letters in uppercase, a count of punctuations in said conversation, a measure of positive sentiment, and a measure of a negative sentiment.

5. The method of claim 1, wherein said second weight is assigned to said one or more features such that a feature of said first set of conversations similar to a feature of said second set of conversations is assigned a higher value in comparison to other features in said one or more features of said first set of conversations.

6. The method of claim 1, wherein said predetermined tag corresponds to at least one of an open category, a solved category, a closed category or a change channel category.

7. The method of claim 1, wherein each conversation of said second set of conversations of said target domain does not have said associated predetermined tag.

8. The method of claim 1 further comprising transmitting, by the one or more processors, a notification to a first user in the conversation, through a user-computing device, based on the predetermined tag assigned to the second set of conversations, wherein the notification may correspond to a recommendation of an action to be performed by the first user.

9. A system for training a conversation classification model, said system comprising: a transceiver configured to receive a first set of conversations corresponding to a source domain and a second set of conversations corresponding to a target domain, wherein each conversation in said first set of conversations has an associated predetermined tag, and wherein each conversation in the first set of conversations and the second set of conversations corresponds to an audio conversation; and one or more processors configured to: generate, by one or more processors, a transcript for each conversation in the first set of conversations and the second set of conversations based on a speech to text conversion technique; extract one or more features from the transcript of each of said first set of conversations and said second set of conversations, assign a first weight to each conversation in said first set of conversations based on at least a similarity between content of said first set of conversations and content of said second set of conversations, wherein the similarity of the content is determined based on the one or more features, assign a second weight to each of said one or more features associated said first set of conversations based on a similarity between said one or more features extracted from the transcript of said first set of conversations and said one or more features extracted from the transcript of said second set of conversations, and train said conversation classification model based on at least said first weight and said second weight, wherein said conversation classification model is capable of assigning said predetermined tag to said second set of conversations.

10. The system of claim 9, wherein the first set of conversations and the second set of conversations corresponds to text conversations.

11. The system of claim 10, wherein said one or more processors are further configured to identify one or more conversations from said first set of conversations based on said determined similarity between said first set of conversations and said second set of conversations, wherein value of said first weight assigned to said one or more conversations is higher in comparison to said first weight assigned to other conversations in said first set of conversations.

12. The system of claim 9, wherein said one or more features comprise at least a count of n-gram words in a conversation, a position of a segment in a thread, a position of a segment in a message, a sender of a message, an email of said sender, a count of letters in uppercase, a count of punctuations in said conversation, a measure of positive sentiment, and a measure of a negative sentiment.

13. The system of claim 9, wherein said second weight is assigned to said one or more features such that a feature of said first set of conversations similar to a feature of said second set of conversations is assigned a higher value in comparison to other features in said one or more features of said first set of conversations.

14. The system of claim 9, wherein said predetermined tag corresponds to at least one of an open category, a solved category, a closed category or a change channel category.

15. The system of claim 9, wherein each conversation of said second set of conversations of said target domain does not have said associated predetermined tag.

16. A computer program product for use with a computing device, the computer program product comprising a non-transitory computer readable medium, the non-transitory computer readable medium stores a computer program code for training a conversation classification model, the computer program code is executable by one or more processors in the computing device to: receive a first
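
Some of the features listed in claims 4 and 12 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the claims do not define how sentiment is measured, so the two small word lists are hypothetical, and the function names are invented here. The positional and sender features are omitted because they depend on thread and message metadata that a bare transcript string does not carry.

```python
import string
from collections import Counter

# Hypothetical toy lexicons; the claims do not specify a sentiment measure.
POSITIVE = {"thanks", "great", "solved", "fixed"}
NEGATIVE = {"broken", "angry", "still", "not"}

def ngram_counts(tokens, n):
    """Count contiguous word n-grams of length n."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def extract_features(text, max_n=2):
    """Return a flat feature dict for one conversation transcript."""
    # Strip surrounding punctuation from each token and lowercase it.
    tokens = [t.strip(string.punctuation).lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    size = max(len(tokens), 1)  # avoid division by zero on empty input
    features = {
        "uppercase_letters": sum(ch.isupper() for ch in text),
        "punctuation_marks": sum(ch in string.punctuation for ch in text),
        # Assumed sentiment measure: share of tokens found in each lexicon.
        "positive_sentiment": sum(t in POSITIVE for t in tokens) / size,
        "negative_sentiment": sum(t in NEGATIVE for t in tokens) / size,
    }
    for n in range(1, max_n + 1):
        for gram, count in ngram_counts(tokens, n).items():
            features[f"ngram:{gram}"] = count
    return features

features = extract_features("Thanks, it is FIXED now!")
```

Prefixing n-gram keys with `ngram:` keeps them from colliding with the scalar feature names, so the whole dictionary can serve as a single sparse feature vector.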

Assignees

  • Xerox Corp

Inventors

Classifications

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • G10L15/26 · Primary

    Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Clustering; Classification · CPC title

  • G10L15/10 · Primary

    using distance or distortion measures between unknown speech and reference templates · CPC title

  • Probabilistic grammars, e.g. word n-grams · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017098443A1 cover?
Methods and systems for training a conversation-classification model are disclosed. A first set of conversations in a source domain and a second set of conversations in a target domain are received. Each of the first set of conversations has an associated predetermined tag. One or more features are extracted from the first set of conversations and from the second set of conversations. Based on t…
Who is the assignee on this patent?
Xerox Corp
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 6, 2017 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).