US12014142B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12014142-B2 |
| Application number | US-202117354825-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 22, 2021 |
| Priority date | Jun 22, 2021 |
| Publication date | Jun 18, 2024 |
| Grant date | Jun 18, 2024 |
Official abstract text for this publication.
A computer-implemented process for training a natural language processing (NLP) agent having a reinforced learning model includes the following operations. A type of document from a document corpus is identified using metadata particularly associated with the document. The NLP agent tokenizes the document to generate a plurality of tokens. Using a schema identified from the type of the document, one of the plurality of tokens is compared to a system of record (SOR) field from the schema. A similarity score between the one of the plurality of tokens with a correct value and a reward based upon the similarity score are generated. A determination is made that an optimum minimum average similarity rate has not been obtained. Based upon the determination, the reinforced learning model is trained using a loss function that includes the reward.
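The scoring step in the abstract can be illustrated with a minimal sketch. It assumes the similarity score is a length-normalized Levenshtein distance (the dependent claims name Levenshtein distance as one option) and that the reward is a linear rescaling of that score. The function names, the normalization, and the reward mapping are illustrative assumptions; the publication text shown here does not specify them.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity_score(token: str, correct_value: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(token), len(correct_value))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(token, correct_value) / longest


def reward_from_similarity(score: float) -> float:
    """Hypothetical reward shaping: map a score in [0, 1] to [-1, 1]."""
    return 2.0 * score - 1.0
```

Under these assumptions, a token that exactly matches the correct value of the SOR field earns the maximum reward, and the reward falls off linearly as more edits are needed. A real loss function would combine this reward with the model's value estimates in a way the text above leaves open.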
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for training a natural language processing (NLP) agent having a reinforced learning model, comprising: identifying, using metadata particularly associated with a document from a document corpus, a type of the document; tokenizing, by the NLP agent, the document to generate a plurality of tokens; comparing one of the plurality of tokens to a system of record (SOR) field from a schema identified from the type of the document; generating a similarity score between the one of the plurality of tokens with a correct value and a reward based upon the similarity score; determining that an optimum minimum average similarity rate has not been obtained; and training, based upon the determining, the reinforced learning model using a loss function that includes the reward.

2. The method of claim 1, wherein the comparing and the generating are performed for all of a plurality of SOR fields in the schema.

3. The method of claim 2, wherein the determining is based upon similarity scores for all of the plurality of SOR fields.

4. The method of claim 1, wherein the reinforced learning model is a Deep Q Network.

5. The method of claim 1, wherein the similarity score use a Levenshtein distance.

6. The method of claim 1, wherein after a determination is made that optimum minimum average similarity rate has been obtained, the training is repeated using another document from the document corpus.

7. The method of claim 1, wherein the tokenizing includes identifying a respective location for each of the plurality of tokens.

8. The method of claim 1, wherein the NLP agent includes a machine learning engine.

9. A computer hardware system for training a natural language processing (NLP) agent having a reinforced learning model, comprising: a hardware processor configured to perform the following executable operations: identifying, using metadata particularly associated with a document from a document corpus, a type of the document; tokenizing, by the NLP agent, the document to generate a plurality of tokens; comparing one of the plurality of tokens to a system of record (SOR) field from a schema identified from the type of the document; generating a similarity score between the one of the plurality of tokens with a correct value and a reward based upon the similarity score; determining that an optimum minimum average similarity rate has not been obtained; and training, based upon the determining, the reinforced learning model using a loss function that includes the reward.

10. The system of claim 9, wherein the comparing and the generating are performed for all of a plurality of SOR fields in the schema.

11. The system of claim 10, wherein the determining is based upon similarity scores for all of the plurality of SOR fields.

12. The system of claim 9, wherein the reinforced learning model is a Deep Q Network.

13. The system of claim 9, wherein the similarity score use a Levenshtein distance.

14. The system of claim 9, wherein after a determination is made that optimum minimum average similarity rate has been obtained, the training is repeated using another document from the document corpus.

15. The system of claim 9, wherein the tokenizing includes identifying a respective location for each of the plurality of tokens.

16. The system of claim 9, wherein the NLP agent includes a machine learning engine.

17. A computer program product, comprising: a computer readable storage medium having stored therein program code for training a natural language processing (NLP) agent having a reinforced learning model, the program code, which when executed by a computer hardware system, cause the computer hardware system to perform: identifying, using metadata particularly associated with a document from a document corpus, a type of the document; tokenizing, by the NLP agent, the document to generate a plurality of tokens; comparing one of the plurality of tokens to a system of record (SOR) field from a schema identified from the type of the document; generating a similarity score between the one of the plurality of tokens with a correct value and a reward based upon the similarity score; determining that an optimum minimum average similarity rate has not been obtained; and training, based upon the determining, the reinforced learning model using a loss function that includes the reward.

18. The computer program product of claim 17, wherein the comparing and the generating are performed for all of a plurality of SOR fields in the schema; and the determining is based upon similarity scores for all of the plurality of SOR fields.

19. The computer program product of claim 17, wherein the reinforced learning model is a Deep Q Network, and the NLP agent includes a machine learning engine.

20. The computer program product of claim 17, wherein after a determination is made that optimum minimum average similarity rate has been obtained, the training is repeated using another document from the document corpus.
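Two of the dependent limitations lend themselves to a short illustration: recording a location for each token (claims 7 and 15) and basing the training decision on similarity scores for all SOR fields (claims 3, 11, and 18). The sketch below is one possible reading. Whitespace tokenization, character offsets as the "location", the arithmetic mean as the "average similarity rate", the 0.9 threshold, and every name in the code are assumptions made for illustration, not details taken from the claims.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Token:
    text: str
    start: int  # character offset of the first character
    end: int    # character offset one past the last character


def tokenize_with_locations(document: str) -> list[Token]:
    """Split on whitespace and keep each token's character span (assumed scheme)."""
    return [Token(m.group(), m.start(), m.end())
            for m in re.finditer(r"\S+", document)]


def needs_more_training(scores_by_field: dict[str, float],
                        min_average: float = 0.9) -> bool:
    """Return True while the mean similarity over all SOR fields is below target.

    The 0.9 default is a placeholder; the claims do not state a value.
    """
    if not scores_by_field:
        raise ValueError("at least one SOR field score is required")
    return mean(list(scores_by_field.values())) < min_average
```

In this reading, a training loop would keep updating the model on the current document while `needs_more_training` returns True, then move on to another document from the corpus once the target is met, as claims 6, 14, and 20 describe.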
Feedforward networks · CPC title
Reinforcement learning · CPC title
Matching criteria, e.g. proximity measures · CPC title
Machine learning · CPC title
Learning methods · CPC title