Natural language domain corpus data set creation based on enhanced root utterances

US11664010B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11664010-B2
Application number  US-202017088071-A
Country             US
Kind code           B2
Filing date         Nov 3, 2020
Priority date       Nov 3, 2020
Publication date    May 30, 2023
Grant date          May 30, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for generating a natural language domain corpus to train a machine learning natural language understanding process. A base utterance expressing an intent and an intent profile indicating at least one of categories, keywords, concepts, sentiment, entities, or emotion of the intent are received. Machine translation translates the base utterance into a plurality of foreign language utterances and back into respective utterances in the target natural language to create a normalized utterance set. Analysis of each utterance in the normalized utterance set determines respective meta information for each such utterance. Comparison of the meta information to the intent profile determines a highest ranking matching utterance within the normalized utterance set. A set of natural language data to train a machine learning natural language understanding process is created based on further natural language translations of the highest ranking matching utterance.
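
The round-trip translation step described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `toy_translate` is a hypothetical stand-in for a machine translation service (it is not a real API), the paraphrase table is made-up data, and English is assumed as the target language. The `dedupe` helper corresponds to the optional removal of redundant utterances in dependent claims 2 and 3.

```python
from typing import Callable, Dict, List

# Hypothetical signature for a machine translation call:
# translate(text, source_language, destination_language) -> translated text.
Translator = Callable[[str, str, str], str]

TARGET = "en"  # assumed target natural language

# Toy paraphrases that each language's round trip introduces (made-up data).
_ROUND_TRIP_EFFECTS: Dict[str, Dict[str, str]] = {
    "es": {"pay my bill": "settle my invoice"},
    "fr": {"i want to": "i would like to"},
    "de": {},  # this round trip returns the text unchanged
}


def toy_translate(text: str, src: str, dst: str) -> str:
    """Stand-in for a real MT service; not a real translation API."""
    if src == TARGET:
        # Outbound: mark the text as being "in" the foreign language.
        return f"[{dst}] {text}"
    # Inbound: strip the marker and apply that language's paraphrases.
    body = text.split("] ", 1)[1]
    for old, new in _ROUND_TRIP_EFFECTS.get(src, {}).items():
        body = body.replace(old, new)
    return body


def round_trip(utterance: str, languages: List[str],
               translate: Translator = toy_translate) -> List[str]:
    """Translate into each foreign language and back into the target language."""
    normalized = []
    for lang in languages:
        foreign = translate(utterance, TARGET, lang)
        normalized.append(translate(foreign, lang, TARGET))
    return normalized


def dedupe(utterances: List[str]) -> List[str]:
    """Remove redundant utterances while keeping first-seen order."""
    return list(dict.fromkeys(utterances))


normalized_set = dedupe(round_trip("i want to pay my bill", ["es", "fr", "de"]))
print(normalized_set)
```

In a real system the `translate` argument would wrap an actual translation service. The same `round_trip` function could then be reused for the second pass over the highest ranking matching utterance, called with a different number of languages.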

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating a set of natural language data to train a machine learning natural language understanding process, the method comprising:
  receiving a base natural language utterance in a target natural language, the base natural language utterance expressing an intent;
  receiving, in conjunction with receiving the base natural language utterance, an intent profile comprising intent parameters indicating at least one of categories, keywords, concepts, sentiment, entities, or emotion associated with the intent;
  translating, by machine language translation, the base natural language utterance into a plurality of foreign language utterances with each respective foreign language utterance in the plurality of foreign language utterances being translated into a different respective foreign language within a first number of foreign languages;
  translating, by machine language translation, each respective foreign language utterance in the plurality of foreign language utterances into a respective normalized target language utterance in the target natural language to create a normalized utterance set;
  analyzing, by an automated natural language understanding process, each respective normalized target language utterance to determine respective meta information indicating respective intent parameters for each normalized target language utterance;
  determining a highest ranking matching utterance from among each respective normalized target language utterance based on comparing each respective meta information for each normalized target language utterance to the intent profile, wherein the comparing comprises determining the highest ranking matching utterance that has a less than exact match between the each respective meta information and the intent profile;
  creating a set of natural language data in the target natural language comprising a plurality of utterances obtained based on a plurality of further natural language translations of the highest ranking matching utterance, the creating the set of natural language data comprising:
    translating, by machine language translation, the highest ranking matching utterance into a second plurality of foreign language utterances with each respective foreign language utterance in the second plurality of foreign language utterances being translated into a different respective foreign language within a second number of foreign languages where the second number is different than the first number; and
    translating, by machine language translation, each respective foreign language utterance in the second plurality of foreign language utterances into a respective natural language utterance in the set of natural language data;
  creating a training corpus comprising the set of natural language data; and
  training a machine learning natural language understanding process to perform natural language classification in the target natural language with at least part of the set of natural language data.

2. The method of claim 1, further comprising removing redundant utterances from the normalized utterance set.

3. The method of claim 1, further comprising removing redundant utterances from the set of natural language data.

4. The method of claim 1, wherein:
  the intent profile comprises a respective intent profile confidence level for at least one intent parameter in the intent parameters,
  the meta information comprises at least one respective determined confidence level for at least one respective determined intent parameter within the respective intent parameters for each normalized language utterance, where the at least one respective determined intent parameter corresponds to the at least one intent parameter in the intent parameters, and
  the determining the highest ranking utterance is further based on the at least one respective determined confidence level satisfying the respective intent profile confidence level.

5. The method of claim 1, further comprising:
  selecting a testing set of data from within the set of natural language data; and
  refining the machine learning natural language understanding process based on processing the testing set of data with the machine learning natural language understanding process.

6. The method of claim 1, wherein the base natural language utterance and the intent profile are received from an operator via an operator interface.

7. An apparatus for generating a set of natural language data to train a machine learning natural language understanding process, the apparatus comprising:
  a processor;
  a memory communicatively coupled to the processor;
  an operator interface, coupled to the processor and the memory, that when operating:
    receives a base natural language utterance in a target natural language, the base natural language utterance expressing an intent; and
    receives, in conjunction with receiving the base natural language utterance, an intent profile comprising intent parameters indicating at least one of categories, keywords, concepts, sentiment, entities, or emotion associated with the intent;
  a target language to intermediate language machine translation bank that, when operating, translates, by machine language translation, the base natural language utterance into a plurality of foreign language utterances with each respective foreign language utterance in the plurality of foreign language utterances being translated into a different respective foreign language, within a first number of foreign languages;
  an intermediate language to target language machine translation bank that, when operating, translates, by machine language translation, each respective foreign language utterance in the plurality of foreign language utterances into a respective normalized target language utterance in the target natural language to create a normalized utterance set;
  a cognitive enrichment service that, when operating, analyzes, by an automated natural language understanding process, each respective normalized target language utterance to determine respective meta information indicating intent parameters for each normalized target language utterance;
  a ranking processor that, when operating, determines a highest ranking matching utterance from among each respective normalized target language utterance based on comparing each respective meta information for each normalized target language utterance to the intent profile, wherein the comparing comprises determining the highest ranking matching utterance that has a less than exact match between the each respective meta information and the intent profile;
  natural language data set creation processor that, when operating:
    creates a set of natural language data in the target natural language comprising a plurality of utterances based on further natural language translations of the highest ranking matching utterance by at least:
      translating, by machine language translation, the highest ranking matching utterance into a second plurality of foreign language utterances with each respective foreign language utterance in the second plurality of foreign language utterances being translated into a different respective foreign language within a second number of foreign languages, where the second number is different than the first number; and
      translating, by machine language translation, each respective foreign language utterance in the second plurality of foreign language utterances into a respective natural language utterance in the set of natural language data;
    creating a training corpus comprising the set of natural language data; and
  natural language classifier machine learning based model creator that, when operating, trains a machine learning natural language understanding process to perform natural language classification in the tar…
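
The ranking step of claim 1, combined with the confidence levels of claim 4, might be sketched in Python as follows. This is one possible reading and not the patented implementation: the `Param` type, the fraction-of-parameters score, and the interpretation of "satisfying" a confidence level as "greater than or equal to" are all assumptions, and the example profile and meta information are made-up data standing in for the output of a natural language understanding service.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Param:
    """One intent parameter value with a confidence level (hypothetical type)."""
    value: str
    confidence: float


# Maps a parameter name (e.g. "sentiment") to its value and confidence.
Profile = Dict[str, Param]


def score(meta: Profile, profile: Profile) -> float:
    """Fraction of profile parameters matched by the utterance's meta information.

    A parameter counts as matched when the values agree and the determined
    confidence is at least the profile's confidence (assumed interpretation).
    """
    if not profile:
        return 0.0
    hits = 0
    for name, wanted in profile.items():
        found = meta.get(name)
        if (found is not None and found.value == wanted.value
                and found.confidence >= wanted.confidence):
            hits += 1
    return hits / len(profile)


def best_match(candidates: List[Tuple[str, Profile]],
               profile: Profile) -> Tuple[str, float]:
    """Return the highest scoring utterance; it need not match exactly."""
    scored = [(utterance, score(meta, profile)) for utterance, meta in candidates]
    return max(scored, key=lambda pair: pair[1])


# Made-up intent profile and meta information for two normalized utterances.
intent_profile: Profile = {
    "category": Param("billing", 0.7),
    "sentiment": Param("neutral", 0.5),
    "keyword": Param("bill", 0.6),
}
candidates = [
    ("i want to settle my invoice", {
        "category": Param("billing", 0.9),
        "sentiment": Param("neutral", 0.8),
        "keyword": Param("invoice", 0.9),   # keyword differs from the profile
    }),
    ("i would like to pay my bill", {
        "category": Param("billing", 0.6),  # confidence below the profile level
        "sentiment": Param("positive", 0.9),
        "keyword": Param("bill", 0.9),
    }),
]

winner, winner_score = best_match(candidates, intent_profile)
print(winner, round(winner_score, 2))
```

Here the winning utterance matches two of the three profile parameters, which illustrates a highest ranking match that is less than exact. A weighted score per parameter would be an equally plausible design.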

Assignees

Inventors

Classifications

  • Statistical methods, e.g. probability models · CPC title

  • G10L15/063 · Primary

    Training · CPC title

  • G06F40/216 · Primary

    using statistical methods · CPC title

  • Semantic analysis · CPC title

  • Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11664010B2 cover?
Systems and methods for generating a natural language domain corpus to train a machine learning natural language understanding process. A base utterance expressing an intent and an intent profile indicating at least one of categories, keywords, concepts, sentiment, entities, or emotion of the intent are received. Machine translation translates the base utterance into a plurality of foreign lang…
Who is the assignee on this patent?
Florida Power And Light Company, Florida Power & Light Co
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date May 30, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).