Systems and methods for machine learning based rule discovery for data migration

US12118478B2 · US · B2

Patent metadata
Publication number: US-12118478-B2
Application number: US-202016998674-A
Country: US
Kind code: B2
Filing date: Aug 20, 2020
Priority date: May 8, 2020
Publication date: Oct 15, 2024
Grant date: Oct 15, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for deriving classification rules from documents and a database using rule-based machine learning. The method includes extracting first variables from documents corresponding to an organization. The method further includes extracting second variables from a database corresponding to the organization. The method also includes filtering the extracted second variables based on at least one of null values, repeat variables, location variables, ID variables, or date variables. The method further includes deriving first classification rules based on the first variables using a rule-based machine learning algorithm. The method also includes calculating an accuracy of the derived first classification rules. The method also includes deriving second classification rules based on the first variables and the filtered second variables. The method further includes determining a suggested additional variable based on the derived second classification rules and the calculated accuracy.

First claim

Opening claim text (preview).

What is claimed:

1. A method for deriving classification rules from documents and a database using rule-based machine learning, the method comprising:
extracting, by a server computing device, a first plurality of variables from documents corresponding to an organization;
extracting, by the server computing device, a second plurality of variables from a database corresponding to the organization;
filtering, by the server computing device, the extracted second plurality of variables based on at least one of null values, repeat variables, location variables, ID variables, or date variables;
deriving, by the server computing device, a first plurality of classification rules based on the first plurality of variables using a rule-based machine learning algorithm, comprising: a) executing the rule-based machine learning algorithm using the first plurality of variables as input to derive a classification rule and identify a subset of the first plurality of variables that satisfy the classification rule, b) removing the identified subset of variables from the first plurality of variables, c) repeating steps a) and b) using the remaining first plurality of variables as input to identify additional subsets of the remaining variables that satisfy additional classification rules, and d) storing the classification rules derived by the rule-based machine learning algorithm in step a) as the first plurality of classification rules;
calculating, by the server computing device, an accuracy of the derived first plurality of classification rules;
deriving, by the server computing device, a second plurality of classification rules based on the first plurality of variables and filtered second plurality of variables using the rule-based machine learning algorithm, comprising: e) executing the rule-based machine learning algorithm using the first plurality of variables and one of the filtered second plurality of variables as input to derive a classification rule and identify a subset of the first plurality of variables and one of the filtered second plurality of variables that satisfy the classification rule, f) removing the identified subset of variables from the first plurality of variables and the one of the filtered second plurality of variables, g) updating the calculated accuracy of the derived first plurality of classification rules based upon the first plurality of variables and the one of the filtered second plurality of variables, h) repeating steps e), f) and g) using the remaining first plurality of variables and another one of the filtered second plurality of variables as input to identify additional subsets of the remaining variables that satisfy additional classification rules, and i) storing the classification rules generated by the rule-based machine learning algorithm in step e) as the second plurality of classification rules; and
identifying, by the server computing device, a suggested additional variable from the filtered second plurality of variables for inclusion in the first plurality of variables based upon the updated accuracy of the derived first plurality of classification rules; and
generating, by the server computing device, for display the derived first plurality of classification rules, the derived second plurality of classification rules, the updated accuracy of the derived first plurality of classification rules, and the suggested additional variable.

2. The method of claim 1, wherein the server computing device is configured to calculate the accuracy of the derived first plurality of classification rules based on a known plurality of classification rules corresponding to the organization.

3. The method of claim 1, wherein the server computing device is further configured to extract the first plurality of variables using natural language processing.

4. The method of claim 1, wherein the database comprises demographic data, employment data, and benefit plan data.

5. The method of claim 1, wherein the server computing device is further configured to map the extracted first plurality of variables to corresponding entries of the database.

6. The method of claim 1, wherein the server computing device is further configured to classify each of the extracted first plurality of variables and second plurality of variables as character-based or numeric.

7. The method of claim 1, wherein the first plurality of classification rules are derived sequentially using the rule-based machine learning algorithm.

8. The method of claim 1, wherein the server computing device is further configured to sequentially derive the second plurality of classification rules based on the first plurality of variables and the filtered second plurality of variables using the rule-based machine learning algorithm.

9. The method of claim 8, wherein the server computing device is further configured to calculate an accuracy of the derived second plurality of classification rules based on a known plurality of classification rules corresponding to the organization.

10. A system for deriving classification rules from documents and a database using rule-based machine learning, the system comprising: a server computing device communicatively coupled to a database corresponding to an organization and a display device, the server computing device configured to:
extract a first plurality of variables from documents corresponding to an organization;
extract a second plurality of variables from the database corresponding to the organization;
filter the extracted second plurality of variables based on at least one of null values, repeat variables, location variables, ID variables, or date variables;
derive a first plurality of classification rules based on the first plurality of variables using a rule-based machine learning algorithm, comprising: a) executing the rule-based machine learning algorithm using the first plurality of variables as input to derive a classification rule and identify a subset of the first plurality of variables that satisfy the classification rule, b) removing the identified subset of variables from the first plurality of variables, c) repeating steps a) and b) using the remaining first plurality of variables as input to identify additional subsets of the remaining variables that satisfy additional classification rules, and d) storing the classification rules derived by the rule-based machine learning algorithm in step a) as the first plurality of classification rules;
calculate an accuracy of the derived first plurality of classification rules;
derive a second plurality of classification rules based on the first plurality of variables and filtered second plurality of variables using a rule-based machine learning algorithm, comprising: e) executing the rule-based machine learning algorithm using the first plurality of variables and one of the filtered second plurality of variables as input to derive a classification rule and identify a subset of the first plurality of variables and one of the filtered second plurality of variables that satisfy the classification rule, f) removing the identified subset of variables from the first plurality of variables and the one of the filtered second plurality of variables, g) updating the calculated accuracy of the derived first plurality of classification rules based upon the first plurality of variables and the one of the filtered second plurality of variables, h) repeating steps e), f) and g) using the remaining first plurality of variables and another one of the filtered second plurality of variables as input to identify additional subsets of the remaining variables that satisfy additional classification rules, and i) storing the classification rules g
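The rule-derivation loop in steps a) to d) reads like a sequential-covering procedure: learn one rule, set aside what it covers, repeat on the remainder. The sketch below illustrates that idea together with a simplified version of the variable suggestion in steps e) to i). It rests on several assumptions not stated in the claims: the "subset of variables that satisfy the classification rule" is taken to mean the records a rule covers; each record carries a known label (for example, its outcome under the organization's known rules, as in claim 2); rules are single `feature == value` tests; and accuracy is measured on the same records, so it is an optimistic training-set figure. The patent does not name its rule-based algorithm, so this greedy learner is a stand-in, not the patented implementation. Instead of folding in one filtered variable at a time while updating a running accuracy (step h), the sketch re-learns the rules with each candidate added and keeps the candidate that raises accuracy most. The names `learn_rules`, `predict`, `accuracy`, and `suggest_variable` and the toy data are hypothetical.

```python
from collections import Counter


def learn_rules(records, features, target, min_precision=1.0):
    """Sequential covering over single-condition rules (steps a-d, simplified).

    A rule is (feature, value, label): "if record[feature] == value then label".
    Each pass keeps the most precise candidate rule (ties broken by coverage),
    removes the records it covers, and repeats on the remainder.
    """
    remaining = list(records)
    rules = []
    while remaining:
        best = None  # ((precision, coverage), rule, covered_records)
        for feature in features:
            # Sort candidate values so tie-breaking does not depend on set order.
            for value in sorted({r[feature] for r in remaining}, key=repr):
                covered = [r for r in remaining if r[feature] == value]
                label, hits = Counter(r[target] for r in covered).most_common(1)[0]
                key = (hits / len(covered), len(covered))
                if best is None or key > best[0]:
                    best = (key, (feature, value, label), covered)
        if best is None or best[0][0] < min_precision:
            break  # no sufficiently precise rule remains
        rules.append(best[1])
        covered_ids = {id(r) for r in best[2]}
        remaining = [r for r in remaining if id(r) not in covered_ids]
    return rules


def predict(rules, record, default):
    """Apply the first matching rule; fall back to the default label."""
    for feature, value, label in rules:
        if record[feature] == value:
            return label
    return default


def accuracy(rules, records, target):
    """Fraction of records whose known label the rule list reproduces."""
    default = Counter(r[target] for r in records).most_common(1)[0][0]
    correct = sum(predict(rules, r, default) == r[target] for r in records)
    return correct / len(records)


def suggest_variable(records, base_features, candidates, target):
    """Return (suggested variable or None, base accuracy, best accuracy)."""
    base_acc = accuracy(learn_rules(records, base_features, target), records, target)
    best_var, best_acc = None, base_acc
    for candidate in candidates:
        rules = learn_rules(records, base_features + [candidate], target)
        acc = accuracy(rules, records, target)
        if acc > best_acc:  # keep the candidate that lifts accuracy the most
            best_var, best_acc = candidate, acc
    return best_var, base_acc, best_acc


# Toy records: "employment_type" stands in for a document-derived variable,
# "plan_tier" and "hours_band" for filtered database variables.
rows = [
    ("full", "high", "yes"), ("full", "high", "yes"), ("full", "low", "yes"),
    ("part", "high", "yes"), ("part", "low", "no"), ("part", "low", "no"),
]
records = [
    {"employment_type": e, "hours_band": h, "plan_tier": "gold", "eligible": y}
    for e, h, y in rows
]
print(suggest_variable(records, ["employment_type"], ["plan_tier", "hours_band"], "eligible"))
# ('hours_band', 0.6666666666666666, 1.0)
```

With only `employment_type`, the learner finds one pure rule ("full" means eligible) and stops, leaving part-time records to the default label. Adding `plan_tier` changes nothing because it is constant, while `hours_band` lets the learner cover the remaining records exactly, so it is the variable the sketch would suggest.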

Assignees

FMR LLC


Classifications

  • Machine learning · CPC title

  • Document management systems · CPC title

  • Recognition of textual entities · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • Extracting rules from data · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12118478B2 cover?
Systems and methods for deriving classification rules from documents and a database using rule-based machine learning. The method includes extracting first variables from documents corresponding to an organization. The method further includes extracting second variables from a database corresponding to the organization. The method also includes filtering the extracted second variables based on …
Who is the assignee on this patent?
FMR LLC
What technology area does this patent fall under?
Primary CPC classification G06N5/04. Mapped technology areas include Physics.
When was this patent published?
Published October 15, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).