What technology area does this patent fall under?

Primary CPC classification G06N20/00. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 21, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Generating training data for machine learning

US10719781B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10719781-B2
Application number	US-201715651064-A
Country	US
Kind code	B2
Filing date	Jul 17, 2017
Priority date	Jul 12, 2016
Publication date	Jul 21, 2020
Grant date	Jul 21, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method includes receiving a rule, wherein the rule includes at least one token, and receiving at least two dictionaries, wherein the at least two dictionaries include at least one general language dictionary and at least one domain-specific dictionary for a domain. The computer-implemented method further includes, for each of the at least one token, selecting at least one word at random from at least one of the at least two dictionaries and adding the at least one word to a test data line, such that the test data line includes a candidate statement conforming to the rule. The computer-implemented method further includes filtering the candidate statement based on a domain-specific model for the domain and including the candidate statement in training data provided to a machine learning model. A corresponding computer program product and computer system are also disclosed.
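The steps in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the rule format (a list of dictionary names, with a hypothetical "ANY" wildcard), the function names, and the use of a plain callable as the domain-specific filter are all invented here for clarity.

```python
import random
from typing import Callable, Dict, List, Sequence


def generate_candidate(
    rule: Sequence[str],
    dictionaries: Dict[str, List[str]],
    rng: random.Random,
) -> str:
    """Build one test data line by drawing a random word for each token.

    Assumption: each token names the dictionary to draw from, and the
    hypothetical token "ANY" means "pick one of the dictionaries at random".
    """
    line: List[str] = []
    for token in rule:
        name = rng.choice(sorted(dictionaries)) if token == "ANY" else token
        line.append(rng.choice(dictionaries[name]))
    return " ".join(line)


def build_training_data(
    rule: Sequence[str],
    dictionaries: Dict[str, List[str]],
    domain_filter: Callable[[str], bool],
    attempts: int,
    seed: int = 0,
) -> List[str]:
    """Generate candidates and keep only those the domain filter accepts."""
    rng = random.Random(seed)
    kept: List[str] = []
    for _ in range(attempts):
        candidate = generate_candidate(rule, dictionaries, rng)
        if domain_filter(candidate):  # stand-in for the domain-specific model
            kept.append(candidate)
    return kept
```

A caller would pass at least one general language dictionary and one domain-specific dictionary, for example `{"general": [...], "domain": [...]}`, and hand the returned list to whatever machine learning model is being trained.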

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: receiving a rule, wherein said rule comprises at least one token; receiving at least two dictionaries, wherein said at least two dictionaries comprise at least one general language dictionary and at least one domain-specific dictionary for a domain; for each of said at least one token, selecting at least one word at random from at least one of said at least two dictionaries and adding said at least one word to a test data line, such that said test data line comprises a candidate statement conforming to said rule; filtering said candidate statement based on a domain-specific model for said domain; and including said candidate statement in training data provided to a machine learning model.
2. The computer-implemented method of claim 1, further comprising inserting at least one additional word randomly selected from at least one of said at least two dictionaries into said test data line.
3. The computer-implemented method of claim 1, wherein filtering said candidate statement comprises discarding said candidate statement, if said candidate statement fails to meet a definition of semantically correct candidate statements for said domain, according to said domain-specific model.
4. The computer-implemented method of claim 3, wherein said domain-specific model is based on a general corpus within said domain.
5. The computer-implemented method of claim 3, wherein said domain-specific model is based on a corpus that excludes user-specific information.
6. The computer-implemented method of claim 3, wherein said domain-specific model is an n-gram model of domain-specific statements.
7. The computer-implemented method of claim 3, wherein said domain is medical diagnosis.
8. The computer-implemented method of claim 7, wherein said domain-specific model is based on a general medical corpus.
9. The computer-implemented method of claim 7, wherein said domain-specific model is based on a corpus that excludes medical patient records.
10. The computer-implemented method of claim 7, wherein said domain-specific model is an n-gram model of medical diagnosis statements.
11. The computer-implemented method of claim 1, wherein said rule is expressed using regular expressions.
12. The computer-implemented method of claim 1, wherein said rule is expressed as a state machine.
13. The computer-implemented method of claim 1, wherein said rule encodes engineered knowledge of a human expert.
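Claims 2, 3 and 6 can be made more concrete with a small sketch. Everything below is an assumption made for illustration: the claims do not say how the n-gram model scores a statement, so this example uses bigrams (n = 2) and accepts a statement only if a chosen fraction of its adjacent word pairs appears in the domain corpus. The function names and the `min_seen_fraction` parameter are hypothetical.

```python
import random
from typing import List, Set, Tuple

Bigram = Tuple[str, str]


def train_bigram_model(corpus: List[str]) -> Set[Bigram]:
    """Collect every adjacent word pair seen in a general domain corpus."""
    bigrams: Set[Bigram] = set()
    for sentence in corpus:
        words = sentence.lower().split()
        bigrams.update(zip(words, words[1:]))
    return bigrams


def is_semantically_plausible(
    statement: str,
    bigrams: Set[Bigram],
    min_seen_fraction: float = 1.0,
) -> bool:
    """Return False for candidates that should be discarded (claim 3).

    Assumed criterion: the share of the statement's bigrams that occur in
    the corpus must reach min_seen_fraction.
    """
    words = statement.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return False  # a single word carries no bigram evidence
    seen = sum(pair in bigrams for pair in pairs)
    return seen / len(pairs) >= min_seen_fraction


def insert_random_word(
    line: List[str], vocabulary: List[str], rng: random.Random
) -> List[str]:
    """Insert one extra randomly chosen word at a random position (claim 2)."""
    position = rng.randint(0, len(line))  # randint includes both endpoints
    return line[:position] + [rng.choice(vocabulary)] + line[position:]
```

Building the corpus from general domain text rather than patient records, as claims 5 and 9 describe, would keep user-specific information out of the filter in this sketch as well.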

Assignees

Inventors

Classifications

G06N20/00 · Primary
Machine learning · CPC title
G06N5/022
Knowledge engineering; Knowledge acquisition · CPC title
G06F40/242
Dictionaries · CPC title

Patent family

Related publications grouped by family.

View patent family 60940680

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10719781B2 cover?: A computer-implemented method includes receiving a rule, wherein the rule includes at least one token, and receiving at least two dictionaries, wherein the at least two dictionaries include at least one general language dictionary and at least one domain-specific dictionary for a domain. The computer-implemented method further includes, for each of the at least one token, selecting at least one…
Who is the assignee on this patent?: IBM