System and method for predictive structuring of electronic data

US11892989B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11892989-B2
Application number  US-202217705448-A
Country             US
Kind code           B2
Filing date         Mar 28, 2022
Priority date       Mar 28, 2022
Publication date    Feb 6, 2024
Grant date          Feb 6, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the invention are directed to a system, method, or computer program product for an approach to predictive structuring of electronic data. The system uses data mining and data clustering techniques using classification models to organize feature column groups comprising feature columns. The system identifies and flags feature column groups and/or feature columns based on regulatory data standards provided by regulatory bodies. Thereafter, data objects are imported into the system and prediction algorithms are implemented to characterize the feature columns containing the data objects.
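To make the clustering step in the abstract concrete, here is a minimal sketch of K-means grouping feature columns by mined metadata. It is an illustration under stated assumptions, not the patented implementation: the two metadata features (share of numeric characters and a sensitivity keyword flag), the column names, and the `kmeans` helper are all hypothetical, and the first `k` points seed the centroids purely so the run is deterministic.

```python
from math import dist


def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm. Seeding from the first k points is an
    assumption made for reproducibility, not something the patent states."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical mined metadata per column:
# (share of numeric characters, sensitivity keyword flag)
column_metadata = {
    "account_number": (1.0, 1.0),
    "city": (0.0, 0.0),
    "card_number": (0.9, 1.0),
    "branch_name": (0.1, 0.0),
}

names = list(column_metadata)
labels, centroids = kmeans([column_metadata[n] for n in names], k=2)

# Collect column names under their cluster label to form candidate groups.
groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)

print(groups)
# {0: ['account_number', 'card_number'], 1: ['city', 'branch_name']}
```

In this toy run the sensitive numeric columns land in one candidate feature column group and the free-text columns in another; a real system would need many more metadata features and a principled choice of `k`.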

First claim

Opening claim text (preview).

What is claimed is:

1. A system for organizing data objects, the system comprising: a memory device with computer-readable program code stored thereon; a communication device; a processing device operatively coupled to the memory device and the communication device, wherein the processing device is configured to execute the computer-readable program code to: receive entity data tables, wherein the entity data tables comprise data objects stored in a memory device of an entity system; receive regulatory standard documentation comprising data handling guidelines identified from keywords in text of the regulatory standard documentation, wherein the regulatory standard documentation is a pdf file; mine metadata from the entity data tables and mine the data handling guidelines from the regulatory standard documentation; perform clustering on mined data using a K-means algorithm to determine a common grouping of the mined data to prepare for identification of a feature column group, wherein the mined data comprises the mined data handling guidelines and the mined metadata, and wherein the feature column group comprises one or more feature columns determined by the K-means algorithm to relate to other feature columns within the feature column group based on the metadata and data handling guidelines of the data objects within each of the feature columns; import unknown data objects to a cell of a column; identify the feature column of the column using a machine learning model trained using training data objects with known feature columns and known feature columns groups; identify the feature column group based on data type and sensitivity; and improve a confidence interval of the identified feature column of the feature column group by subjecting the data objects to the machine learning model multiple times.

2. The system of claim 1, wherein the processing device is further configured to execute the computer-readable program code to: assign a related target key to data objects extending across multiple feature column groups; store the related target key in a relationship key table; append the relationship key table to each of the feature column groups; and output predictively structured data in one or more visual formats.

3. The system of claim 1, wherein performing clustering on the mined entity data tables is performed iteratively.

4. The system of claim 1, wherein the machine learning model further comprises classifying and identifying the data objects or the one or more feature columns as numeric or non-numeric, and wherein a Naïve Bayes algorithm is used for non-numeric, and a decision tree algorithm is used for numeric.

5. The system of claim 1, wherein identifying the feature column comprises determining a name of the feature column.

6. The system of claim 1, wherein the processing device is further configured to execute the computer-readable program code to: flag the feature column group based on the data type and the sensitivity.

7. The system of claim 1, wherein the processing device is further configured to execute the computer-readable program code to: flag the feature column based on the data type and the sensitivity.

8. A computer program product for organizing data objects, the computer program product comprising at least one non-transitory computer-readable medium having computer-readable program code portions embodied therein, the computer-readable program code portions comprising: receiving entity data tables, wherein the entity data tables comprise data objects stored in a memory device of an entity system; receiving regulatory standard documentation comprising data handling guidelines in text identified from keywords in text of the regulatory standard documentation, wherein the regulatory standard documentation is a pdf file; mining metadata from the entity data tables and mining the data handling guidelines from the regulatory standard documentation; performing clustering on mined data using a K-means algorithm to determine a common grouping of the mined data to prepare for identification of a feature column group, wherein the mined data comprises the mined data handling guidelines and the mined metadata, and wherein the feature column group comprises one or more feature columns determined by the K-means algorithm to relate to other feature columns within the feature column group based on the metadata and data handling guidelines of the data objects within each of the feature columns; importing unknown data objects to a cell of a column; identifying the feature column of the column using a machine learning model trained using training data objects with known feature columns and known feature columns groups; identifying the feature column group based on data type and sensitivity; and improving a confidence interval of the identified feature column of the feature column group by subjecting the data objects to the machine learning model multiple times.

9. The computer program product of claim 8, the computer-readable program code portion further comprising: assigning a related target key to data objects extending across multiple feature column groups; storing the related target key in a relationship key table; appending the relationship key table to each of the feature column groups; and outputting predictively structured data in one or more visual formats.

10. The computer program product of claim 8, wherein performing clustering on the mined entity data tables is performed iteratively.

11. The computer program product of claim 8, wherein the machine learning model further comprises classifying and identifying the data objects or the one or more feature columns as numeric or non-numeric, and wherein a Naïve Bayes algorithm is used for non-numeric, and a decision tree algorithm is used for numeric.

12. A computer-implemented method for organizing data objects, the method comprising: providing a computing system comprising a computer processing device and a non-transitory computer readable medium, where the non-transitory computer readable medium comprises configured computer program instruction code, such that when said computer program instruction code is operated by said computer processing device, said computer processing device performs the following operations: receiving entity data tables, wherein the entity data tables comprise data objects stored in a memory device of an entity system; receiving regulatory standard documentation comprising data handling guidelines in text identified from keywords in text of the regulatory standard documentation, wherein the regulatory standard documentation is a pdf file; mining metadata from the entity data tables and mining the data handling guidelines from the regulatory standard documentation; performing clustering on mined data using a K-means algorithm to determine a common grouping of the mined data to prepare for identification of a feature column group, wherein the mined data comprises the mined data handling guidelines and the mined metadata, and wherein the feature column group comprises one or more feature columns determined by the K-means algorithm to relate to other feature columns within the feature column group based on the metadata and data handling guidelines of the data objects within each of the feature columns; importing unknown data objects to a cell of a column; identifying the feature column of the column using a machine learning model trained using training data objects with known feature columns and known feature columns groups; identifying the feature column group based on data type and sensitivity; and improving a confidence interval of the
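The numeric versus non-numeric routing recited in claims 4 and 11 can be sketched as follows. This is a hedged illustration only: the claims name a Naïve Bayes algorithm and a decision tree algorithm but do not specify features or training data, so the character-bigram tokens, the digit-count feature, the depth-one tree, the training values, and every function and class name below are assumptions.

```python
from collections import Counter, defaultdict
from math import log


def is_numeric(value):
    """Assumed rule: digits with optional commas, one decimal point, leading minus."""
    return value.replace(",", "").replace(".", "", 1).lstrip("-").isdigit()


class NaiveBayes:
    """Multinomial Naive Bayes over character bigrams with Laplace smoothing."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # label -> bigram counts
        self.label_counts = Counter()
        self.vocab = set()

    @staticmethod
    def tokens(value):
        text = value.lower()
        return [text[i:i + 2] for i in range(len(text) - 1)]

    def fit(self, examples):
        for value, label in examples:
            self.label_counts[label] += 1
            for tok in self.tokens(value):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, value):
        total = sum(self.label_counts.values())

        def score(label):
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each bigram.
            return log(self.label_counts[label] / total) + sum(
                log((counts[tok] + 1) / denom) for tok in self.tokens(value))

        return max(self.label_counts, key=score)


def digit_count(value):
    return sum(ch.isdigit() for ch in value)


def fit_stump(examples):
    """Depth-one decision tree on digit count: choose the threshold
    that yields the fewest training errors."""
    data = [(digit_count(v), label) for v, label in examples]
    majority = Counter(label for _, label in data).most_common(1)[0][0]
    best = None
    for threshold in sorted({n for n, _ in data}):
        left = Counter(label for n, label in data if n <= threshold)
        right = Counter(label for n, label in data if n > threshold)
        if not right:
            continue  # a split with an empty side is not a split
        errors = sum(left.values()) - max(left.values())
        errors += sum(right.values()) - max(right.values())
        if best is None or errors < best[0]:
            best = (errors, threshold,
                    left.most_common(1)[0][0], right.most_common(1)[0][0])
    if best is None:
        return lambda v: majority  # no usable split: fall back to majority label
    _, threshold, low_label, high_label = best
    return lambda v: low_label if digit_count(v) <= threshold else high_label


def identify_feature_column(value, text_model, numeric_model):
    """Route numeric values to the tree and everything else to Naive Bayes."""
    return numeric_model(value) if is_numeric(value) else text_model.predict(value)


# Hypothetical training data objects with known feature columns.
text_model = NaiveBayes().fit([
    ("alice@example.com", "email"), ("bob@example.org", "email"),
    ("Charlotte", "city"), ("Boston", "city"),
])
numeric_model = fit_stump([
    ("4111111111111111", "card_number"), ("5500000000000004", "card_number"),
    ("28202", "zip_code"), ("02110", "zip_code"),
])

for unknown in ["dave@example.com", "94105", "4000123412341234"]:
    print(unknown, "->", identify_feature_column(unknown, text_model, numeric_model))
```

With so little training data the predictions are only indicative; the point of the sketch is the dispatch between the two model families, not the quality of either model.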

Assignees

Inventors

Classifications

  • G06F16/213 (Primary)

    with details for schema evolution support · CPC title

  • Query processing support for facilitating data mining operations in structured databases · CPC title

  • Data format conversion from or to a database · CPC title

  • Clustering or classification · CPC title

  • Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11892989B2 cover?
Embodiments of the invention are directed to a system, method, or computer program product for an approach to predictive structuring of electronic data. The system uses data mining and data clustering techniques using classification models to organize feature column groups comprising feature columns. The system identifies and flags feature column groups and/or feature columns based on regulator…
Who is the assignee on this patent?
Bank Of America
What technology area does this patent fall under?
Primary CPC classification G06F16/213. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 6, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).