Machine learning-based relationship association and related discovery and search engines
US-2019354544-A1 · Nov 21, 2019 · US
US11636376B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11636376-B2 |
| Application number | US-201815996491-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 3, 2018 |
| Priority date | Jun 3, 2018 |
| Publication date | Apr 25, 2023 |
| Grant date | Apr 25, 2023 |
Abstract
A method, computer system, and a computer program product for active machine learning is provided. The present invention may include annotating a plurality of data entries. The present invention may also include building a first dataset based on the annotated plurality of data entries. The present invention may then include receiving user feedback based on the built first dataset. The present invention may further include assigning a plurality of weights to a plurality of data entry subsets. The present invention may also include generating a second weighted dataset based on the received user feedback.
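The abstract describes five steps: annotate, build a first dataset, collect feedback, weight subsets, and build a second weighted dataset. Together they form a small annotate, review, and reweight loop. The Python sketch that follows is illustrative only and is not the patented implementation.

All of these names are hypothetical: the rule table `RULES`, the weights `BASE_WEIGHT` and `FEEDBACK_WEIGHT`, the `Row` record, and the functions `annotate` and `apply_feedback`. The sketch also makes two assumptions:

- An entry is ambiguous when a rule maps one term to more than one label.
- Rows a user has resolved form the higher-weighted subset of the second dataset.

```python
from dataclasses import dataclass, replace

# Hypothetical rule set: term -> candidate labels. A term with more than
# one candidate label is treated as ambiguous (it has more than one meaning).
RULES: dict[str, list[str]] = {
    "cold": ["SYMPTOM", "TEMPERATURE"],
    "fever": ["SYMPTOM"],
    "aspirin": ["MEDICATION"],
}
BASE_WEIGHT = 1.0      # assumed weight for rule-derived rows
FEEDBACK_WEIGHT = 2.0  # assumed higher weight for user-resolved rows


@dataclass(frozen=True)
class Row:
    """One annotation: source entry, entity, label, ambiguity flag, weight."""
    entry_id: int
    entity: str
    label: str
    ambiguous: bool
    weight: float = BASE_WEIGHT


def annotate(entries: list[str]) -> list[Row]:
    """Rule-based annotation that builds the first dataset (one row per annotation)."""
    rows = []
    for entry_id, text in enumerate(entries):
        for token in text.lower().split():
            token = token.strip(".,;:!?")
            labels = RULES.get(token)
            if labels:
                # Provisionally take the first candidate; flag it if there are several.
                rows.append(Row(entry_id, token, labels[0], len(labels) > 1))
    return rows


def apply_feedback(rows: list[Row], feedback: dict[tuple[int, str], str]) -> list[Row]:
    """Build the second weighted dataset from user feedback.

    `feedback` maps (entry_id, entity) to the label a user chose for an
    ambiguous row. Resolved rows form the higher-weighted subset; every
    other row keeps BASE_WEIGHT.
    """
    result = []
    for row in rows:
        key = (row.entry_id, row.entity)
        if row.ambiguous and key in feedback:
            result.append(replace(row, label=feedback[key], ambiguous=False,
                                  weight=FEEDBACK_WEIGHT))
        else:
            result.append(row)
    return result


# Example run
entries = ["Patient reports a cold and fever.", "Store aspirin away from the cold."]
first = annotate(entries)
ambiguous = [(r.entry_id, r.entity) for r in first if r.ambiguous]
# ambiguous == [(0, "cold"), (1, "cold")]
second = apply_feedback(first, {(0, "cold"): "SYMPTOM", (1, "cold"): "TEMPERATURE"})
```

Rows the user never touched keep the base weight. The second dataset therefore differs from the first only in the subset that received feedback. A downstream trainer can then honor the weights, for example as per-sample weights.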
Claims
What is claimed is:
1. A method for generating ground truth using active machine learning, the method comprising: annotating a plurality of data entries using rule-based natural language processing; parsing the plurality of data entries into a first dataset that includes entities, features, and classifications; building, using a bootstrap aggregation, the first dataset based on the annotated plurality of data entries using coreference resolution and entity analysis, wherein each row in the first dataset represents an annotation from the annotated plurality of data entries; receiving user feedback based on the built first dataset in response to detecting an ambiguity associated with a data entry in the built first dataset, wherein the ambiguity indicates the data entry comprises more than one meaning; assigning a plurality of weights to a plurality of data entry subsets; generating a second weighted dataset that is weighted higher than the first dataset because the second weighted dataset is based on the received user feedback; and transmitting the second weighted dataset to create a trained model.
2. The method of claim 1, wherein the plurality of data entries are derived from a source, and wherein the source is selected from a group consisting of a database, a corpus, a knowledgebase or an individual.
3. The method of claim 1, wherein the second weighted dataset includes the plurality of data entry subsets that create a machine learning model.
4. The method of claim 1, wherein the first dataset is data obtained by a domain specific logic and natural language processing of the plurality of data entries.
5. The method of claim 1, wherein the user feedback is created by a subject matter expert (SME) based on rule-based logic applied to the first dataset.
6. The method of claim 1, wherein the second weighted dataset is ground truth data that include features selected from a group consisting of an ambiguous entity, an entity characteristic, an NLP trigger, an NLP trigger distance, an NLP parse tree characteristic and a plurality of a parts of speech tag.
7. The method of claim 1, wherein the higher weight is a more accurate plurality of data.
8. A computer system for active machine learning, comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage media, and program instructions stored on at least one of the one or more computer-readable tangible storage media for execution by at least one of the one or more processors via at least one of the one or more computer-readable memories, wherein the computer system is capable of performing a method comprising: annotating a plurality of data entries using rule-based natural language processing; parsing the plurality of data entries into a first dataset that includes entities, features, and classifications; building, using a bootstrap aggregation, the first dataset based on the annotated plurality of data entries using coreference resolution and entity analysis, wherein each row in the first dataset represents an annotation from the annotated plurality of data entries; receiving user feedback based on the built first dataset in response to detecting an ambiguity associated with a data entry in the built first dataset, wherein the ambiguity indicates the data entry comprises more than one meaning; assigning a plurality of weights to a plurality of data entry subsets; generating a second weighted dataset that is weighted higher than the first dataset because the second weighted dataset is based on the received user feedback; and transmitting the second weighted dataset to create a trained model.
9. The computer system of claim 8, wherein the plurality of data entries are derived from a source, and wherein the source is selected from a group consisting of a database, a corpus, a knowledgebase or an individual.
10. The computer system of claim 8, wherein the second weighted dataset includes the plurality of data entry subsets that create a machine learning model.
11. The computer system of claim 8, wherein the first dataset is data obtained by a domain specific logic and natural language processing of the plurality of data entries.
12. The computer system of claim 8, wherein the user feedback is created by a subject matter expert (SME) based on rule-based logic applied to the first dataset.
13. The computer system of claim 8, wherein the second weighted dataset is ground truth data that include features selected from a group consisting of an ambiguous entity, an entity characteristic, an NLP trigger, an NLP trigger distance, an NLP parse tree characteristic and a plurality of a parts of speech tag.
14. The computer system of claim 8, wherein the higher weight is a more accurate plurality of data.
15. A computer program product for active machine learning, comprising: one or more computer-readable tangible storage media and program instructions stored on at least one of the one or more computer-readable tangible storage media, the program instructions executable by a processor to cause the processor to perform a method comprising: annotating a plurality of data entries using rule-based natural language processing; parsing the plurality of data entries into a first dataset that includes entities, features, and classifications; building, using a bootstrap aggregation, the first dataset based on the annotated plurality of data entries using coreference resolution and entity analysis, wherein each row in the first dataset represents an annotation from the annotated plurality of data entries; receiving user feedback based on the built first dataset in response to detecting an ambiguity associated with a data entry in the built first dataset, wherein the ambiguity indicates the data entry comprises more than one meaning; assigning a plurality of weights to a plurality of data entry subsets; generating a second weighted dataset that is weighted higher than the first dataset because the second weighted dataset is based on the received user feedback; and transmitting the second weighted dataset to create a trained model.
16. The computer program product of claim 15, wherein the plurality of data entries are derived from a source, and wherein the source is selected from a group consisting of a database, a corpus, a knowledgebase or an individual.
17. The computer program product of claim 15, wherein the second weighted dataset includes the plurality of data entry subsets that create a machine learning model.
18. The computer program product of claim 15, wherein the first dataset is data obtained by a domain specific logic and natural language processing of the plurality of data entries.
19. The computer program product of claim 15, wherein the user feedback is created by a subject matter expert (SME) based on rule-based logic applied to the first dataset.
20. The computer program product of claim 15, wherein the second weighted dataset is ground truth data that include features selected from a group consisting of an ambiguous entity, an entity characteristic, an NLP trigger, an NLP trigger distance, an NLP parse tree characteristic and a plurality of a parts of speech tag.
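Claim 1 adds two steps without saying how the weights enter training:

- The first dataset is built "using a bootstrap aggregation."
- The weighted dataset is transmitted "to create a trained model."

One common way to combine weights with bagging is weighted bootstrap resampling. The Python sketch below shows that technique as an assumption, not as the claimed method. It works as follows:

- Each row is a hypothetical `(feature, label, weight)` tuple.
- Each bootstrap sample draws rows with probability proportional to their weight.
- A deliberately trivial majority-label learner is fit on each sample.
- The ensemble predicts by majority vote.

The function names `fit_majority`, `bagging`, and `predict`, and feature strings such as `"cold|reports"`, are illustrative.

```python
import random
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical training row: (feature, label, weight).
Row = tuple[str, str, float]


def fit_majority(sample: list[Row]) -> dict[str, str]:
    """Trivial base learner: the most frequent label seen for each feature."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for feature, label, _weight in sample:
        counts[feature][label] += 1
    return {feature: c.most_common(1)[0][0] for feature, c in counts.items()}


def bagging(rows: list[Row], n_models: int = 25, seed: int = 0) -> list[dict[str, str]]:
    """Bootstrap aggregation with weighted resampling.

    Each bootstrap sample has len(rows) draws with replacement, where a row's
    chance of being drawn is proportional to its weight, so higher-weighted
    (user-confirmed) rows tend to appear more often in every sample.
    """
    if not rows:
        raise ValueError("bagging needs at least one row")
    rng = random.Random(seed)
    weights = [weight for _, _, weight in rows]
    return [
        fit_majority(rng.choices(rows, weights=weights, k=len(rows)))
        for _ in range(n_models)
    ]


def predict(models: list[dict[str, str]], feature: str) -> Optional[str]:
    """Majority vote across the ensemble; None if no model saw the feature."""
    votes = Counter(m[feature] for m in models if feature in m)
    return votes.most_common(1)[0][0] if votes else None


# Example run: two conflicting rule-derived rows plus one user-confirmed row.
rows: list[Row] = [
    ("cold|reports", "SYMPTOM", 1.0),
    ("cold|reports", "TEMPERATURE", 1.0),
    ("cold|reports", "SYMPTOM", 2.0),   # user feedback, weighted higher
    ("aspirin|take", "MEDICATION", 1.0),
]
models = bagging(rows)
print(predict(models, "cold|reports"))
```

Weighted resampling is only one option. A learner that accepts per-sample weights directly would serve the same purpose. Bagging is shown here because the claim names it.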
CPC classifications:
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30)
Machine learning
Extracting rules from data
Natural language analysis (semantic analysis of natural language G06F40/30)
Annotation, e.g. comment data or footnotes