Real-time supervised machine learning by models configured to classify offensiveness of computer-generated natural-language text
US-2020125928-A1 · Apr 23, 2020 · US
US11664010B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11664010-B2 |
| Application number | US-202017088071-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 3, 2020 |
| Priority date | Nov 3, 2020 |
| Publication date | May 30, 2023 |
| Grant date | May 30, 2023 |
Official abstract text for this publication.
Systems and methods for generating a natural language domain corpus to train a machine learning natural language understanding process. A base utterance expressing an intent and an intent profile indicating at least one of categories, keywords, concepts, sentiment, entities, or emotion of the intent are received. Machine translation translates the base utterance into a plurality of foreign language utterances and back into respective utterances in the target natural language to create a normalized utterance set. Analysis of each utterance in the normalized utterance set determines respective meta information for each such utterance. Comparison of the meta information to the intent profile determines a highest ranking matching utterance within the normalized utterance set. A set of natural language data to train a machine learning natural language understanding process is created based on further natural language translations of the highest ranking matching utterance.
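The round-trip translation step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: `round_trip`, `toy_translate`, and the `PARAPHRASES` table are hypothetical names, and the toy translator only imitates the paraphrasing effect that a real machine translation service might produce when text goes out to a pivot language and back.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical translator signature: (text, source_lang, dest_lang) -> text.
Translator = Callable[[str, str, str], str]

# Assumption: each pivot language "rewords" one phrase on the way back.
PARAPHRASES: Dict[str, Tuple[str, str]] = {
    "fr": ("want to", "would like to"),
    "de": ("book", "reserve"),
    "es": ("want to", "would like to"),
}


def toy_translate(text: str, src: str, dst: str) -> str:
    """Stand-in for a real MT service; it does not translate anything."""
    if dst != "en":
        # Outbound leg: tag the text with the pivot language.
        return f"<{dst}>{text}"
    # Return leg: strip the tag and apply that language's rewording.
    prefix = f"<{src}>"
    body = text[len(prefix):] if text.startswith(prefix) else text
    old, new = PARAPHRASES.get(src, ("", ""))
    return body.replace(old, new) if old else body


def round_trip(utterance: str, pivots: Iterable[str],
               translate: Translator, target: str = "en") -> List[str]:
    """Translate into each pivot language and back, dropping duplicates."""
    seen: Set[str] = set()
    normalized: List[str] = []
    for lang in pivots:
        foreign = translate(utterance, target, lang)
        back = translate(foreign, lang, target)
        key = back.strip().lower()
        if key not in seen:  # redundant utterances are removed
            seen.add(key)
            normalized.append(back)
    return normalized


variants = round_trip("I want to book a flight", ["fr", "de", "es"], toy_translate)
print(variants)
# ['I would like to book a flight', 'I want to reserve a flight']
```

Passing the translator in as a function keeps the sketch independent of any particular translation service; the same `round_trip` call could be repeated on the highest ranking utterance with a different number of pivot languages to produce the larger data set.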
Opening claim text (preview).
What is claimed is:

1. A method for generating a set of natural language data to train a machine learning natural language understanding process, the method comprising:
receiving a base natural language utterance in a target natural language, the base natural language utterance expressing an intent;
receiving, in conjunction with receiving the base natural language utterance, an intent profile comprising intent parameters indicating at least one of categories, keywords, concepts, sentiment, entities, or emotion associated with the intent;
translating, by machine language translation, the base natural language utterance into a plurality of foreign language utterances with each respective foreign language utterance in the plurality of foreign language utterances being translated into a different respective foreign language within a first number of foreign languages;
translating, by machine language translation, each respective foreign language utterance in the plurality of foreign language utterances into a respective normalized target language utterance in the target natural language to create a normalized utterance set;
analyzing, by an automated natural language understanding process, each respective normalized target language utterance to determine respective meta information indicating respective intent parameters for each normalized target language utterance;
determining a highest ranking matching utterance from among each respective normalized target language utterance based on comparing each respective meta information for each normalized target language utterance to the intent profile, wherein the comparing comprises determining the highest ranking matching utterance that has a less than exact match between the each respective meta information and the intent profile;
creating a set of natural language data in the target natural language comprising a plurality of utterances obtained based on a plurality of further natural language translations of the highest ranking matching utterance, the creating the set of natural language data comprising:
translating, by machine language translation, the highest ranking matching utterance into a second plurality of foreign language utterances with each respective foreign language utterance in the second plurality of foreign language utterances being translated into a different respective foreign language within a second number of foreign languages where the second number is different than the first number; and
translating, by machine language translation, each respective foreign language utterance in the second plurality of foreign language utterances into a respective natural language utterance in the set of natural language data;
creating a training corpus comprising the set of natural language data; and
training a machine learning natural language understanding process to perform natural language classification in the target natural language with at least part of the set of natural language data.

2. The method of claim 1, further comprising removing redundant utterances from the normalized utterance set.

3. The method of claim 1, further comprising removing redundant utterances from the set of natural language data.

4. The method of claim 1, wherein:
the intent profile comprises a respective intent profile confidence level for at least one intent parameter in the intent parameters,
the meta information comprises at least one respective determined confidence level for at least one respective determined intent parameter within the respective intent parameters for each normalized language utterance, where the at least one respective determined intent parameter corresponds to the at least one intent parameter in the intent parameters, and
the determining the highest ranking utterance is further based on the at least one respective determined confidence level satisfying the respective intent profile confidence level.

5. The method of claim 1, further comprising:
selecting a testing set of data from within the set of natural language data; and
refining the machine learning natural language understanding process based on processing the testing set of data with the machine learning natural language understanding process.

6. The method of claim 1, wherein the base natural language utterance and the intent profile are received from an operator via an operator interface.

7. An apparatus for generating a set of natural language data to train a machine learning natural language understanding process, the apparatus comprising:
a processor;
a memory communicatively coupled to the processor;
an operator interface, coupled to the processor and the memory, that when operating:
receives a base natural language utterance in a target natural language, the base natural language utterance expressing an intent; and
receives, in conjunction with receiving the base natural language utterance, an intent profile comprising intent parameters indicating at least one of categories, keywords, concepts, sentiment, entities, or emotion associated with the intent;
a target language to intermediate language machine translation bank that, when operating, translates, by machine language translation, the base natural language utterance into a plurality of foreign language utterances with each respective foreign language utterance in the plurality of foreign language utterances being translated into a different respective foreign language, within a first number of foreign languages;
an intermediate language to target language machine translation bank that, when operating, translates, by machine language translation, each respective foreign language utterance in the plurality of foreign language utterances into a respective normalized target language utterance in the target natural language to create a normalized utterance set;
a cognitive enrichment service that, when operating, analyzes, by an automated natural language understanding process, each respective normalized target language utterance to determine respective meta information indicating intent parameters for each normalized target language utterance;
a ranking processor that, when operating, determines a highest ranking matching utterance from among each respective normalized target language utterance based on comparing each respective meta information for each normalized target language utterance to the intent profile, wherein the comparing comprises determining the highest ranking matching utterance that has a less than exact match between the each respective meta information and the intent profile;
a natural language data set creation processor that, when operating:
creates a set of natural language data in the target natural language comprising a plurality of utterances based on further natural language translations of the highest ranking matching utterance by at least:
translating, by machine language translation, the highest ranking matching utterance into a second plurality of foreign language utterances with each respective foreign language utterance in the second plurality of foreign language utterances being translated into a different respective foreign language within a second number of foreign languages, where the second number is different than the first number; and
translating, by machine language translation, each respective foreign language utterance in the second plurality of foreign language utterances into a respective natural language utterance in the set of natural language data;
creating a training corpus comprising the set of natural language data; and
a natural language classifier machine learning based model creator that, when operating, trains a machine learning natural language understanding process to perform natural language classification in the tar
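The ranking step of claims 1 and 4 compares each utterance's meta information with the intent profile and keeps the best candidate even when the match is less than exact. A minimal sketch of one way to do this follows. The scoring rule (the fraction of profile entries detected at or above their required confidence), the data layout, and the names `match_score` and `best_match` are assumptions made for illustration; the claims do not prescribe a formula, and the confidence values below are invented.

```python
from typing import Dict, Tuple

# Assumed layout: parameter name -> {expected value: minimum confidence}.
Profile = Dict[str, Dict[str, float]]
# Assumed layout: parameter name -> {detected value: confidence}.
Meta = Dict[str, Dict[str, float]]


def match_score(meta: Meta, profile: Profile) -> float:
    """Fraction of profile entries detected at or above the required confidence."""
    wanted = [(param, value, floor)
              for param, values in profile.items()
              for value, floor in values.items()]
    if not wanted:
        return 0.0
    hits = 0
    for param, value, floor in wanted:
        detected = meta.get(param, {})
        if value in detected and detected[value] >= floor:
            hits += 1
    return hits / len(wanted)


def best_match(candidates: Dict[str, Meta], profile: Profile) -> Tuple[str, float]:
    """Return the highest scoring utterance; a score below 1.0 is acceptable."""
    scored = ((utt, match_score(meta, profile)) for utt, meta in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Illustrative values only; a real NLU service would supply the meta information.
profile: Profile = {
    "keywords": {"flight": 0.5, "book": 0.5},
    "sentiment": {"neutral": 0.6},
}
candidates: Dict[str, Meta] = {
    "I would like to book a flight": {
        "keywords": {"flight": 0.9, "book": 0.8},
        "sentiment": {"neutral": 0.55},
    },
    "I want to reserve a flight": {
        "keywords": {"flight": 0.9, "reserve": 0.7},
        "sentiment": {"neutral": 0.4},
    },
}

utterance, score = best_match(candidates, profile)
print(utterance, round(score, 2))
# I would like to book a flight 0.67
```

Here the winning utterance satisfies two of the three profile entries, so it is selected with a less than exact match. On ties, `max` keeps the first candidate in insertion order; a production ranking would likely need an explicit tie-breaking rule.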
Statistical methods, e.g. probability models · CPC title
Training · CPC title
using statistical methods · CPC title
Semantic analysis · CPC title
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title