Systems and methods for distributing intent models
US-10452782-B1 · Oct 22, 2019 · US
US11756553B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11756553-B2 |
| Application number | US-202017023535-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 17, 2020 |
| Priority date | Sep 17, 2020 |
| Publication date | Sep 12, 2023 |
| Grant date | Sep 12, 2023 |
Official abstract text for this publication.
In an approach for training data enhancement for an interactive response system, a processor retrieves a set of training data including a set of intents, a set of entities, and a set of utterances that map to each intent. A processor determines iteratively a root verb among the set of utterances for each intent. A processor determines a set of new intents based on analysis of the determined root verb by performing a pairwise iteration and similarity score over the set of intents. A processor determines iteratively one or more new entities for each new intent. A processor generates a set of new training data based on the set of new intents and entities.
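The steps in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes utterances arrive already part-of-speech tagged as `(token, tag)` pairs, uses a toy `naive_stem` in place of a real stemmer, defines the frequency score as the share of utterances containing a stem, scores root-verb similarity by exact match, and keeps the first intent's name when merging. All function names, tag labels, and threshold values are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input shape: each utterance is a list of (token, POS tag) pairs.
# A real system would obtain the tags from a part-of-speech tagger.
TaggedUtterance = list[tuple[str, str]]


def naive_stem(word: str) -> str:
    """Toy suffix stripper standing in for a real stemmer (an assumption)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def root_verb(utterances: list[TaggedUtterance], threshold: float = 0.5):
    """Return the most frequent verb stem if its share of utterances
    exceeds the threshold, otherwise None."""
    if not utterances:
        return None
    counts = Counter()
    for utterance in utterances:
        # Count each stem at most once per utterance.
        counts.update({naive_stem(tok) for tok, pos in utterance if pos == "VERB"})
    if not counts:
        return None
    stem, hits = counts.most_common(1)[0]
    return stem if hits / len(utterances) > threshold else None


def merge_intents(training, similarity=lambda a, b: 1.0 if a == b else 0.0,
                  threshold: float = 0.9):
    """Pairwise pass over intents: fold the second intent of a pair into the
    first when their root verbs score above the threshold."""
    verbs = {name: root_verb(utts) for name, utts in training.items()}
    merged = {name: list(utts) for name, utts in training.items()}
    for a, b in combinations(list(training), 2):
        if a not in merged or b not in merged:
            continue  # one side was already folded into another intent
        if verbs[a] is None or verbs[b] is None:
            continue
        if similarity(verbs[a], verbs[b]) > threshold:
            merged[a].extend(merged.pop(b))
    return merged


def entities(utterances: list[TaggedUtterance], threshold: float = 0.5):
    """Nouns whose share of utterances exceeds the threshold. Pronouns carry
    a different tag here, so filtering on NOUN leaves them out."""
    if not utterances:
        return []
    counts = Counter()
    for utterance in utterances:
        counts.update({tok.lower() for tok, pos in utterance if pos == "NOUN"})
    return sorted(n for n, c in counts.items() if c / len(utterances) > threshold)
```

Calling `merge_intents` on a dictionary that maps intent names to tagged utterances returns a smaller dictionary, and `entities` can then be run on each merged intent's utterances. The `similarity` parameter is left pluggable so that an embedding-based score could replace the exact-match default.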
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: updating, by one or more processors, a training for a chatbot with new training data derived from a set of training data, wherein updating comprises: retrieving, by one or more processors, the set of training data from the chatbot, wherein the set of training data includes a set of intents, a set of entities, and a set of utterances that map to each intent; determining iteratively, by one or more processors, a root verb among the set of utterances for each intent, comprising: responsive to the root verb failing to exceed a verb frequency threshold, identifying, by one or more computer processors, one or more embedding vectors for the root verb based on a computed cosine similarity between the root verb and one or more previous root verbs, wherein the computed cosine similarity a measure of similarity between two or more embedding vectors of an inner product space; and determining, by one or more computer processors, a second root verb associated with a highest sum of embedding vectors; determining, by one or more processors, a set of new intents based on analysis of the second root verb by performing a pairwise iteration and similarity score over the set of intents; determining iteratively, by one or more processors, one or more new entities for each new intent; and generating, by one or more processors, the new training data based on the set of new intents and new entities.
2. The computer-implemented method of claim 1, wherein determining the root verb comprises: performing part of speech tagging of each utterance; stemming every verb in each utterance; and declaring the root verb based on a frequency score over a pre-defined threshold.
3. The computer-implemented method of claim 1, wherein the similarity score is based on computing similarity between each intent's root verb.
4. The computer-implemented method of claim 1, wherein generating the set of new intents comprises: forming a new intent by determining the similarity score among two or more existing intents exceeds a pre-defined threshold; associating utterances mapped to the two or more existing intents with the similarity score to the new intent; removing the two or more existing intents with the similarity score from the set of new intents; and responsive to completion of the pairwise iteration, keeping a remaining intents to the set of new intents.
5. The computer-implemented method of claim 1, wherein determining the one or more new entities for each new intent comprises: removing pronouns from the set of utterances for each new intent; building a frequency score for each noun based on how often the respective noun appears in the set of utterances for each new intent; and determining a noun with a frequency score above a pre-defined threshold as an entity for the new intent.
6. The computer-implemented method of claim 1, wherein determining the one or more new entities for each new intent comprises adding a common characteristic that an appearance rate is above a pre-defined threshold as an entity for the new intent.
7. The computer-implemented method of claim 1, further comprising: performing, by one or more processors, a test of the set of training data; performing, by one or more processors, a test of the set of new training data; and comparing, by one or more processors, the testing result.
8. A computer program product comprising: one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising: program instructions to update training for a chatbot with new training data derived from a set of training data, wherein updating comprises: program instructions to retrieve the set of training data from the chatbot, wherein the set of training data includes a set of intents, a set of entities, and a set of utterances that map to each intent; program instructions to retrieve a set of training data including a set of intents, a set of entities, and a set of utterances that map to each intent; program instructions to determine iteratively a root verb among the set of utterances for each intent, wherein the program instructions comprise: program instructions to, responsive to the root verb failing to exceed a verb frequency threshold, identify one or more embedding vectors for the root verb based on a computed cosine similarity between the root verb and one or more previous root verbs, wherein the computed cosine similarity a measure of similarity between two or more embedding vectors of an inner product space; and program instructions to determine a second root verb associated with a highest sum of embedding vectors; program instructions to determine a set of new intents based on analysis of the second root verb by performing a pairwise iteration and similarity score over the set of intents; program instructions to determine iteratively one or more new entities for each new intent; and program instructions to generate a set of new training data based on the set of new intents and new entities.
9. The computer program product of claim 8, wherein program instructions to determine iteratively a root verb comprise: program instructions to perform part of speech tagging of each utterance; program instructions to stem every verb in each utterance; and program instructions to declare the root verb based on a frequency score over a pre-defined threshold.
10. The computer program product of claim 8, wherein the similarity score is based on computing similarity between each intent's root verb.
11. The computer program product of claim 8, wherein program instructions to generate the set of new intents comprise: program instructions to form a new intent by determining the similarity score among two or more existing intents exceeds a pre-defined threshold; program instructions to associate utterances mapped to the two or more existing intents with the similarity score to the new intent; program instructions to remove the two or more existing intents with the similarity score from the set of new intents; and program instructions, responsive to completion of the pairwise iteration, to keep a remaining intents to the set of new intents.
12. The computer program product of claim 8, wherein program instructions to determine the one or more new entities for each new intent comprise: program instructions to remove pronouns from the set of utterances for each new intent; program instructions to build a frequency score for each noun based on how often the respective noun appears in the set of utterances for each new intent; and program instructions to determine a noun with a frequency score above a pre-defined threshold as an entity for the new intent.
13. The computer program product of claim 8, wherein program instructions to determine the one or more new entities for each new intent comprise program instructions to add a common characteristic that an appearance rate is above a pre-defined threshold as an entity for the new intent.
14. The computer program product of claim 8, further comprising: program instructions, stored on the one or more computer-readable storage media, to perform a test of the set of training data; program instructions, stored on the one or more computer-readable storage media, to perform a test of the set of new training data; and program instructions, stored on the one or more computer-readable storage media, to compare the testing result.
15. A computer system comprising: one or more computer pro
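Claim 1 recites a fallback for when no verb clears the frequency threshold: embedding vectors are compared by cosine similarity against previous root verbs, and a second root verb is chosen. The claim language is terse, so the sketch below shows only one possible reading, in which each candidate verb is scored by the sum of its cosine similarities to the previous root verbs and the highest-scoring candidate is selected. The function names and the `embeddings` lookup table are hypothetical, and a real system would load vectors from a trained embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def fallback_root_verb(candidates, previous_root_verbs, embeddings):
    """Pick the candidate verb whose summed cosine similarity to the previous
    root verbs is highest. Verbs without an embedding are skipped.
    Returns None when no candidate can be scored."""
    best_verb, best_score = None, -math.inf
    for verb in candidates:
        if verb not in embeddings:
            continue
        score = sum(
            cosine_similarity(embeddings[verb], embeddings[prev])
            for prev in previous_root_verbs
            if prev in embeddings
        )
        if score > best_score:
            best_verb, best_score = verb, score
    return best_verb
```

With toy vectors in which "schedule" lies close to "book" and "reserve" while "weather" is orthogonal to both, `fallback_root_verb(["weather", "schedule"], ["book", "reserve"], embeddings)` selects "schedule".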