US11892989B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11892989-B2 |
| Application number | US-202217705448-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 28, 2022 |
| Priority date | Mar 28, 2022 |
| Publication date | Feb 6, 2024 |
| Grant date | Feb 6, 2024 |
Official abstract text for this publication.
Embodiments of the invention are directed to a system, method, or computer program product for an approach to predictive structuring of electronic data. The system uses data mining and data clustering techniques using classification models to organize feature column groups comprising feature columns. The system identifies and flags feature column groups and/or feature columns based on regulatory data standards provided by regulatory bodies. Thereafter, data objects are imported into the system and prediction algorithms are implemented to characterize the feature columns containing the data objects.
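The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain K-means (the algorithm named in the claims) applied to small numeric vectors that stand in for mined column metadata. The column names, the three features, and the `kmeans` helper are hypothetical and are not taken from the patent.

```python
import random
from math import dist


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means over tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return centroids, labels


# Hypothetical metadata vector per column (all values are assumptions):
# (share of digit characters, scaled mean length, regulatory keyword hit).
columns = {
    "ssn": (1.0, 0.45, 1.0),
    "account_number": (1.0, 0.50, 1.0),
    "first_name": (0.0, 0.30, 0.0),
    "city": (0.0, 0.35, 0.0),
}

centroids, labels = kmeans(list(columns.values()), k=2)
groups = dict(zip(columns, labels))  # column name -> cluster index
print(groups)
```

In this toy data the two sensitive numeric columns end up in one cluster and the two free-text columns in the other, which is the kind of common grouping a feature column group would be built from. The cluster index itself is arbitrary.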
Opening claim text (preview).
What is claimed is:
1. A system for organizing data objects, the system comprising: a memory device with computer-readable program code stored thereon; a communication device; a processing device operatively coupled to the memory device and the communication device, wherein the processing device is configured to execute the computer-readable program code to: receive entity data tables, wherein the entity data tables comprise data objects stored in a memory device of an entity system; receive regulatory standard documentation comprising data handling guidelines identified from keywords in text of the regulatory standard documentation, wherein the regulatory standard documentation is a pdf file; mine metadata from the entity data tables and mine the data handling guidelines from the regulatory standard documentation; perform clustering on mined data using a K-means algorithm to determine a common grouping of the mined data to prepare for identification of a feature column group, wherein the mined data comprises the mined data handling guidelines and the mined metadata, and wherein the feature column group comprises one or more feature columns determined by the K-means algorithm to relate to other feature columns within the feature column group based on the metadata and data handling guidelines of the data objects within each of the feature columns; import unknown data objects to a cell of a column; identify the feature column of the column using a machine learning model trained using training data objects with known feature columns and known feature columns groups; identify the feature column group based on data type and sensitivity; and improve a confidence interval of the identified feature column of the feature column group by subjecting the data objects to the machine learning model multiple times.
2. The system of claim 1, wherein the processing device is further configured to execute the computer-readable program code to: assign a related target key to data objects extending across multiple feature column groups; store the related target key in a relationship key table; append the relationship key table to each of the feature column groups; and output predictively structured data in one or more visual formats.
3. The system of claim 1, wherein performing clustering on the mined entity data tables is performed iteratively.
4. The system of claim 1, wherein the machine learning model further comprises classifying and identifying the data objects or the one or more feature columns as numeric or non-numeric, and wherein a Naïve Bayes algorithm is used for non-numeric, and a decision tree algorithm is used for numeric.
5. The system of claim 1, wherein identifying the feature column comprises determining a name of the feature column.
6. The system of claim 1, wherein the processing device is further configured to execute the computer-readable program code to: flag the feature column group based on the data type and the sensitivity.
7. The system of claim 1, wherein the processing device is further configured to execute the computer-readable program code to: flag the feature column based on the data type and the sensitivity.
8. A computer program product for organizing data objects, the computer program product comprising at least one non-transitory computer-readable medium having computer-readable program code portions embodied therein, the computer-readable program code portions comprising: receiving entity data tables, wherein the entity data tables comprise data objects stored in a memory device of an entity system; receiving regulatory standard documentation comprising data handling guidelines in text identified from keywords in text of the regulatory standard documentation, wherein the regulatory standard documentation is a pdf file; mining metadata from the entity data tables and mining the data handling guidelines from the regulatory standard documentation; performing clustering on mined data using a K-means algorithm to determine a common grouping of the mined data to prepare for identification of a feature column group, wherein the mined data comprises the mined data handling guidelines and the mined metadata, and wherein the feature column group comprises one or more feature columns determined by the K-means algorithm to relate to other feature columns within the feature column group based on the metadata and data handling guidelines of the data objects within each of the feature columns; importing unknown data objects to a cell of a column; identifying the feature column of the column using a machine learning model trained using training data objects with known feature columns and known feature columns groups; identifying the feature column group based on data type and sensitivity; and improving a confidence interval of the identified feature column of the feature column group by subjecting the data objects to the machine learning model multiple times.
9. The computer program product of claim 8, the computer-readable program code portion further comprising: assigning a related target key to data objects extending across multiple feature column groups; storing the related target key in a relationship key table; appending the relationship key table to each of the feature column groups; and outputting predictively structured data in one or more visual formats.
10. The computer program product of claim 8, wherein performing clustering on the mined entity data tables is performed iteratively.
11. The computer program product of claim 8, wherein the machine learning model further comprises classifying and identifying the data objects or the one or more feature columns as numeric or non-numeric, and wherein a Naïve Bayes algorithm is used for non-numeric, and a decision tree algorithm is used for numeric.
12. A computer-implemented method for organizing data objects, the method comprising: providing a computing system comprising a computer processing device and a non-transitory computer readable medium, where the non-transitory computer readable medium comprises configured computer program instruction code, such that when said computer program instruction code is operated by said computer processing device, said computer processing device performs the following operations: receiving entity data tables, wherein the entity data tables comprise data objects stored in a memory device of an entity system; receiving regulatory standard documentation comprising data handling guidelines in text identified from keywords in text of the regulatory standard documentation, wherein the regulatory standard documentation is a pdf file; mining metadata from the entity data tables and mining the data handling guidelines from the regulatory standard documentation; performing clustering on mined data using a K-means algorithm to determine a common grouping of the mined data to prepare for identification of a feature column group, wherein the mined data comprises the mined data handling guidelines and the mined metadata, and wherein the feature column group comprises one or more feature columns determined by the K-means algorithm to relate to other feature columns within the feature column group based on the metadata and data handling guidelines of the data objects within each of the feature columns; importing unknown data objects to a cell of a column; identifying the feature column of the column using a machine learning model trained using training data objects with known feature columns and known feature columns groups; identifying the feature column group based on data type and sensitivity; and improving a confidence interval of the
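Claims 4 and 11 recite routing columns by type: a Naïve Bayes algorithm for non-numeric data and a decision tree algorithm for numeric data. The following is a rough sketch of that routing under stated assumptions, not the patented implementation. The class names, the character-count features, the depth-one tree that splits on the column mean, and the training data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def is_numeric(values):
    """Treat a column as numeric when every value parses as a float."""
    try:
        for v in values:
            float(v)
        return True
    except ValueError:
        return False


class CharNaiveBayes:
    """Multinomial Naive Bayes over characters, with add-one smoothing."""

    def fit(self, columns, labels):
        self.char_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        self.vocab = set()
        for values, label in zip(columns, labels):
            for ch in "".join(values).lower():
                self.char_counts[label][ch] += 1
                self.vocab.add(ch)
        return self

    def predict(self, values):
        text = "".join(values).lower()
        n_train = sum(self.label_counts.values())

        def score(label):
            total = sum(self.char_counts[label].values()) + len(self.vocab)
            prior = math.log(self.label_counts[label] / n_train)
            return prior + sum(
                math.log((self.char_counts[label][ch] + 1) / total)
                for ch in text)

        return max(self.label_counts, key=score)


class MeanStump:
    """Depth-one decision tree that splits on the column mean.
    Assumes at least two training columns with distinct means."""

    @staticmethod
    def _mean(values):
        return sum(map(float, values)) / len(values)

    def fit(self, columns, labels):
        rows = sorted((self._mean(c), l) for c, l in zip(columns, labels))
        best = None
        for (lo, _), (hi, _) in zip(rows, rows[1:]):
            if lo == hi:
                continue  # no threshold separates equal means
            threshold = (lo + hi) / 2
            left = Counter(l for m, l in rows if m <= threshold)
            right = Counter(l for m, l in rows if m > threshold)
            (l_label, l_hits), (r_label, r_hits) = (
                left.most_common(1)[0], right.most_common(1)[0])
            errors = len(rows) - l_hits - r_hits
            if best is None or errors < best[0]:
                best = (errors, threshold, l_label, r_label)
        _, self.threshold, self.left_label, self.right_label = best
        return self

    def predict(self, values):
        below = self._mean(values) <= self.threshold
        return self.left_label if below else self.right_label


def identify_feature_column(values, text_model, numeric_model):
    """Route numeric columns to the tree, all others to Naive Bayes."""
    model = numeric_model if is_numeric(values) else text_model
    return model.predict(values)


# Hypothetical training columns with known feature column names.
text_model = CharNaiveBayes().fit(
    [["alice@example.com", "bob@example.org"], ["Alice Smith", "Bob Jones"]],
    ["email", "full_name"])
numeric_model = MeanStump().fit(
    [["34", "58", "41"], ["10500.25", "98000.00"]],
    ["age", "account_balance"])

print(identify_feature_column(["carol@example.net"], text_model, numeric_model))
print(identify_feature_column(["29", "63"], text_model, numeric_model))
```

A production system would more likely rely on an established library for both models. The stump here stands in for a full decision tree only to keep the sketch self-contained.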
with details for schema evolution support · CPC title
Query processing support for facilitating data mining operations in structured databases · CPC title
Data format conversion from or to a database · CPC title
Clustering or classification · CPC title
Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title