What technology area does this patent fall under?

Primary CPC classification G06N20/00. Mapped technology areas include Physics.

When was this patent published?

Publication date Apr 25, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Active learning for concept disambiguation

US11636376B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11636376-B2
Application number	US-201815996491-A
Country	US
Kind code	B2
Filing date	Jun 3, 2018
Priority date	Jun 3, 2018
Publication date	Apr 25, 2023
Grant date	Apr 25, 2023
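The publication number above combines a country code, a serial number, and a kind code. As a minimal sketch, assuming only the two spellings that appear on this page (`US-11636376-B2` and `US11636376B2`), the parts can be split with a regular expression. The name `parse_publication_number` and the pattern itself are illustrative assumptions, not a patentsdb API.

```python
import re
from typing import NamedTuple


class PublicationNumber(NamedTuple):
    country: str
    number: str
    kind: str


# Assumed pattern: two-letter country code, digits, then a kind code
# made of one letter and an optional digit. Hyphens are optional.
_PATTERN = re.compile(r"^([A-Z]{2})-?(\d+)-?([A-Z]\d?)$")


def parse_publication_number(text: str) -> PublicationNumber:
    """Split a publication number into country, serial number, and kind code."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized publication number: {text!r}")
    return PublicationNumber(*match.groups())


if __name__ == "__main__":
    print(parse_publication_number("US-11636376-B2"))
    print(parse_publication_number("US11636376B2"))
```

Both spellings resolve to the same three fields, which is convenient when matching the hyphenated form in the table against the compact form in the page heading.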

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer system, and a computer program product for active machine learning is provided. The present invention may include annotating a plurality of data entries. The present invention may also include building a first dataset based on the annotated plurality of data entries. The present invention may then include receiving user feedback based on the built first dataset. The present invention may further include assigning a plurality of weights to a plurality of data entry subsets. The present invention may also include generating a second weighted dataset based on the received user feedback.
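The flow in the abstract (annotate, build a first dataset, take user feedback, produce a weighted second dataset) can be illustrated with a small sketch. Everything here is an assumption for illustration: the keyword rules, the `Entry` type, the function names, and the weight values 1.0 and 2.0 do not come from the patent, which does not state concrete weights on this page.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    label: str
    weight: float = 1.0  # assumed default weight for rule-based annotations


def annotate(texts, rules):
    """Rule-based annotation: the first keyword found in the text decides the label."""
    entries = []
    for text in texts:
        lowered = text.lower()
        label = next((lab for kw, lab in rules.items() if kw in lowered), "unknown")
        entries.append(Entry(text, label))
    return entries


def apply_feedback(first_dataset, feedback, feedback_weight=2.0):
    """Build a second dataset in which reviewed entries carry a higher weight.

    `feedback` maps an entry's text to the label chosen by a reviewer.
    """
    second = []
    for entry in first_dataset:
        if entry.text in feedback:
            second.append(Entry(entry.text, feedback[entry.text], feedback_weight))
        else:
            second.append(Entry(entry.text, entry.label, entry.weight))
    return second


if __name__ == "__main__":
    texts = ["Patient reports a cold.", "The room was cold."]
    rules = {"cold": "illness"}  # hypothetical rule set
    first = annotate(texts, rules)
    second = apply_feedback(first, {"The room was cold.": "temperature"})
    for entry in second:
        print(entry)
```

A downstream trainer could pass the `weight` field as a per-sample weight so that reviewed entries influence the model more than unreviewed ones.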

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating ground truth using active machine learning, the method comprising: annotating a plurality of data entries using rule-based natural language processing; parsing the plurality of data entries into a first dataset that includes entities, features, and classifications; building, using a bootstrap aggregation, the first dataset based on the annotated plurality of data entries using coreference resolution and entity analysis, wherein each row in the first dataset represents an annotation from the annotated plurality of data entries; receiving user feedback based on the built first dataset in response to detecting an ambiguity associated with a data entry in the built first dataset, wherein the ambiguity indicates the data entry comprises more than one meaning; assigning a plurality of weights to a plurality of data entry subsets; generating a second weighted dataset that is weighted higher than the first dataset because the second weighted dataset is based on the received user feedback; and transmitting the second weighted dataset to create a trained model.

2. The method of claim 1, wherein the plurality of data entries are derived from a source, and wherein the source is selected from a group consisting of a database, a corpus, a knowledgebase or an individual.

3. The method of claim 1, wherein the second weighted dataset includes the plurality of data entry subsets that create a machine learning model.

4. The method of claim 1, wherein the first dataset is data obtained by a domain specific logic and natural language processing of the plurality of data entries.

5. The method of claim 1, wherein the user feedback is created by a subject matter expert (SME) based on rule-based logic applied to the first dataset.

6. The method of claim 1, wherein the second weighted dataset is ground truth data that include features selected from a group consisting of an ambiguous entity, an entity characteristic, an NLP trigger, an NLP trigger distance, an NLP parse tree characteristic and a plurality of a parts of speech tag.

7. The method of claim 1, wherein the higher weight is a more accurate plurality of data.

8. A computer system for active machine learning, comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage media, and program instructions stored on at least one of the one or more computer-readable tangible storage media for execution by at least one of the one or more processors via at least one of the one or more computer-readable memories, wherein the computer system is capable of performing a method comprising: annotating a plurality of data entries using rule-based natural language processing; parsing the plurality of data entries into a first dataset that includes entities, features, and classifications; building, using a bootstrap aggregation, the first dataset based on the annotated plurality of data entries using coreference resolution and entity analysis, wherein each row in the first dataset represents an annotation from the annotated plurality of data entries; receiving user feedback based on the built first dataset in response to detecting an ambiguity associated with a data entry in the built first dataset, wherein the ambiguity indicates the data entry comprises more than one meaning; assigning a plurality of weights to a plurality of data entry subsets; generating a second weighted dataset that is weighted higher than the first dataset because the second weighted dataset is based on the received user feedback; and transmitting the second weighted dataset to create a trained model.

9. The computer system of claim 8, wherein the plurality of data entries are derived from a source, and wherein the source is selected from a group consisting of a database, a corpus, a knowledgebase or an individual.

10. The computer system of claim 8, wherein the second weighted dataset includes the plurality of data entry subsets that create a machine learning model.

11. The computer system of claim 8, wherein the first dataset is data obtained by a domain specific logic and natural language processing of the plurality of data entries.

12. The computer system of claim 8, wherein the user feedback is created by a subject matter expert (SME) based on rule-based logic applied to the first dataset.

13. The computer system of claim 8, wherein the second weighted dataset is ground truth data that include features selected from a group consisting of an ambiguous entity, an entity characteristic, an NLP trigger, an NLP trigger distance, an NLP parse tree characteristic and a plurality of a parts of speech tag.

14. The computer system of claim 8, wherein the higher weight is a more accurate plurality of data.

15. A computer program product for active machine learning, comprising: one or more computer-readable tangible storage media and program instructions stored on at least one of the one or more computer-readable tangible storage media, the program instructions executable by a processor to cause the processor to perform a method comprising: annotating a plurality of data entries using rule-based natural language processing; parsing the plurality of data entries into a first dataset that includes entities, features, and classifications; building, using a bootstrap aggregation, the first dataset based on the annotated plurality of data entries using coreference resolution and entity analysis, wherein each row in the first dataset represents an annotation from the annotated plurality of data entries; receiving user feedback based on the built first dataset in response to detecting an ambiguity associated with a data entry in the built first dataset, wherein the ambiguity indicates the data entry comprises more than one meaning; assigning a plurality of weights to a plurality of data entry subsets; generating a second weighted dataset that is weighted higher than the first dataset because the second weighted dataset is based on the received user feedback; and transmitting the second weighted dataset to create a trained model.

16. The computer program product of claim 15, wherein the plurality of data entries are derived from a source, and wherein the source is selected from a group consisting of a database, a corpus, a knowledgebase or an individual.

17. The computer program product of claim 15, wherein the second weighted dataset includes the plurality of data entry subsets that create a machine learning model.

18. The computer program product of claim 15, wherein the first dataset is data obtained by a domain specific logic and natural language processing of the plurality of data entries.

19. The computer program product of claim 15, wherein the user feedback is created by a subject matter expert (SME) based on rule-based logic applied to the first dataset.

20. The computer program product of claim 15, wherein the second weighted dataset is ground truth data that include features selected from a group consisting of an ambiguous entity, an entity characteristic, an NLP trigger, an NLP trigger distance, an NLP parse tree characteristic and a plurality of a parts of speech tag.
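Two steps in the independent claims lend themselves to a short illustration: bootstrap aggregation and detecting an entry that "comprises more than one meaning". The sketch below uses the textbook sense of bootstrap aggregation (resample with replacement, fit one simple model per sample, collect votes). How the patent applies it while building the first dataset is not described on this page, so the row layout `(term, label)`, the majority-label model, and all function names are assumptions.

```python
import random
from collections import Counter, defaultdict


def bootstrap_samples(rows, n_samples, seed=0):
    """Draw n_samples resamples of rows, each with replacement and the same size as rows."""
    rng = random.Random(seed)
    return [[rng.choice(rows) for _ in rows] for _ in range(n_samples)]


def majority_label_model(sample):
    """Fit a trivial model: the most frequent label seen for each term."""
    counts = defaultdict(Counter)
    for term, label in sample:
        counts[term][label] += 1
    return {term: c.most_common(1)[0][0] for term, c in counts.items()}


def bagged_votes(rows, n_samples=25, seed=0):
    """Aggregate the per-sample models into label votes for each term."""
    votes = defaultdict(Counter)
    for sample in bootstrap_samples(rows, n_samples, seed):
        for term, label in majority_label_model(sample).items():
            votes[term][label] += 1
    return votes


def ambiguous_terms(rows):
    """Terms annotated with more than one meaning; candidates for user feedback."""
    meanings = defaultdict(set)
    for term, label in rows:
        meanings[term].add(label)
    return sorted(term for term, found in meanings.items() if len(found) > 1)


if __name__ == "__main__":
    rows = [
        ("cold", "illness"),
        ("cold", "temperature"),
        ("fever", "illness"),
        ("fever", "illness"),
    ]
    print(ambiguous_terms(rows))
    print(dict(bagged_votes(rows)))
```

In this sketch only the terms returned by `ambiguous_terms` would be sent to a reviewer, which matches the claim language that feedback is received in response to a detected ambiguity.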

Assignees

IBM

Inventors

Classifications

G06F40/40
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
G06N20/00 · Primary
Machine learning · CPC title
G06N5/025
Extracting rules from data · CPC title
G06F40/20
Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title
G06F40/169 · Primary
Annotation, e.g. comment data or footnotes · CPC title
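Each CPC symbol listed above has a fixed shape: a section letter, a two-digit class, a subclass letter, a main group, and a subgroup after the slash. A minimal sketch of splitting a symbol and looking up its section title follows. The section titles are the standard CPC ones (section G is Physics); whether patentsdb derives its "mapped technology areas" this way is an assumption, as are the function name `parse_cpc` and the regular expression.

```python
import re

# Standard CPC section titles.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}

# Assumed pattern: section, class, subclass, main group, "/", subgroup.
_CPC = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> dict:
    """Split a CPC symbol such as 'G06N20/00' into its hierarchical parts."""
    match = _CPC.match(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
        "area": CPC_SECTIONS[section],
    }


if __name__ == "__main__":
    for symbol in ["G06N20/00", "G06F40/169", "G06N5/025"]:
        print(symbol, parse_cpc(symbol))
```

All five classifications on this page fall in section G, which is consistent with the page reporting Physics as the mapped technology area.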

Patent family

Related publications grouped by family.

View patent family 68693066

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11636376B2 cover?: A method, computer system, and a computer program product for active machine learning is provided. The present invention may include annotating a plurality of data entries. The present invention may also include building a first dataset based on the annotated plurality of data entries. The present invention may then include receiving user feedback based on the built first dataset. The present i…
Who is the assignee on this patent?: IBM