System and method for enriching and normalizing data

US12019596B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12019596-B2
Application number	US-202318097053-A
Country	US
Kind code	B2
Filing date	Jan 13, 2023
Priority date	Feb 18, 2022
Publication date	Jun 25, 2024
Grant date	Jun 25, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An integrated platform system that employ a series of machine learning techniques and prediction and detection units that can process input data and extract and generate meaningful insights and predictions therefrom. The system integrates together multiple different data storage types and applications that generates data of different types, and an associated processing system for processing the different data types, store the data in a common data model to normalize the data, determine the data lineage of the data, and then process the data using different types of techniques. The data can also be processed by a prediction unit for generating meaningful insights and predictions or by an anomaly detection unit for detecting one or more anomalies in the data.
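The anomaly detection unit mentioned in the abstract is described in claims 8 to 10 as segmenting the data, computing entropy values per segment, and measuring the change in those values, optionally with a K-L divergence technique. The following is a minimal sketch of that idea, not the patented implementation. It assumes categorical values, Shannon entropy, and a simple comparison of consecutive segments; the function names, the `threshold` parameter, and the `epsilon` smoothing are all hypothetical.

```python
import math
from collections import Counter


def distribution(values):
    """Empirical probability of each distinct value in one segment."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def shannon_entropy(dist):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def kl_divergence(p, q, epsilon=1e-9):
    """K-L divergence D(p || q) in bits.

    Values missing from q are given probability epsilon; this is a crude
    smoothing assumption, and q is not renormalized afterwards.
    """
    return sum(
        pv * math.log2(pv / q.get(value, epsilon))
        for value, pv in p.items()
        if pv > 0
    )


def flag_entropy_jumps(segments, threshold):
    """Indices of segments whose entropy rose by more than `threshold`
    relative to the preceding segment (assumed anomaly criterion)."""
    entropies = [shannon_entropy(distribution(s)) for s in segments]
    return [
        i
        for i in range(1, len(entropies))
        if entropies[i] - entropies[i - 1] > threshold
    ]
```

Calling `flag_entropy_jumps(segments, threshold)` on a list of value lists returns the positions where entropy trends upward sharply, and `kl_divergence` can be used instead of the plain entropy difference to compare two segment distributions directly. The claims also describe comparing every distribution against every other and de-duplicating identical results, which this sketch omits.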

First claim

Opening claim text (preview).

The invention claimed is:

1. A data aggregation and normalization system for enriching and normalizing data, comprising a plurality of data sources for providing data that is generated by a plurality of different types of data systems that are managed by different types of software applications, a data extraction unit for extracting selected portions of the data from the plurality of data sources to form extracted data, a data storage unit for storing the extracted data, a data preprocessing and enrichment unit for processing and enriching the extracted data to form cleaned data that is stored in the data storage unit, wherein the data preprocessing and enrichment unit includes a data cleaning unit for cleaning the extracted unit to form cleaned data, a common data model unit for inserting the cleaned data into a common data model to normalize the cleaned data, and an assessment unit for assessing a quality of the cleaned data in the common data model, and a machine language module having a plurality of predefined machine learning units for applying one or more selected machine learning techniques to selected portions of the cleaned data to form machine language data, wherein the cleaned data includes transaction data, product data, and user data, wherein the machine language module further comprises a prediction unit for processing the transaction data and the user data and generating a prediction based on an interest in one or more selected products of a selected user, wherein the prediction unit is configured to generate a first product interest score indicative of a first interest level in the product by the selected user, a second product interest score indicative of a second interest level in the product by the selected user, a community interest score associated with a community interest in the one or more selected products, a user feature score associated with one or more primary user features of the selected product, and a product feature score indicative of one or more primary features of the selected product, and to determine therefrom a final product score indicative of the user interest in the one or more selected products, and a ranking unit for ranking the final product interest scores.

2. The system of claim 1, wherein the prediction unit comprises a filter unit for processing the transaction data and the user data and for generating the product interest score indicative of the interest in the one or more selected products by the selected user, wherein the filter unit includes a pattern filter unit for identifying from the transactional data a set of users having similar product preferences to the selected user and for generating based thereon a first product interest score indicative of a first interest level in the product by the selected user, and a neuro pattern filter unit for identifying from the transactional data and the user data a set of users having similar product preferences to the selected user and for generating based thereon a second product interest score indicative of a second interest level in the product by the selected user.

3. The system of claim 2, wherein the prediction unit further comprises a page rank unit for processing the product data and the user data and for generating therefrom the community interest score associated with the one or more selected products, a user feature extraction unit for processing the user data and for identifying and extracting one or more primary user features based on the user data having the user feature score associated therewith, and a product feature extraction unit for processing the product data and for identifying and extracting one or more primary product features based on the product data having the product feature score associated therewith.

4. The system of claim 3, wherein the prediction unit further comprises a scoring unit for receiving and processing the first product interest score, the second product interest score, the community interest score, the user feature score, and the product feature score to determine therefrom the final product score indicative of the user interest in the one or more selected products.

5. The system of claim 4, further comprising a data feedback loop for reintroducing to one or more of the plurality of data sources the transformed data for subsequent processing by the data preprocessing and enrichment unit.

6. The system of claim 4, wherein the community interest score generated by the page rank unit is based on a number of web links directed to one or more web pages listing the one or more selected products.

7. The system of claim 6, wherein the user feature extraction unit employs a principal component analysis technique to determine the one or more primary user features, and wherein the product feature extraction unit employs a principal component analysis technique to determine the one or more primary product features.

8. The system of claim 1, wherein the machine language module further comprises an anomaly detection unit for detecting one or more anomalies in the cleaned data by segmenting the cleaned data into a plurality of data segments, by determining entropy values associated with each of the plurality of data segments, and by determining a change in the entropy values.

9. The system of claim 8, wherein the anomaly detection unit comprises a segmentation unit for segmenting the cleaned data into the plurality of data segments, an entropy determination unit for determining the entropy values for each of the plurality of data segments and for determining a plurality of distributions of the entropy values, an entropy change determination unit for comparing each of the plurality of distributions of the entropy values with each of the remaining ones of the plurality of distributions of the entropy values and for determining therefrom the change in the entropy value of each of the plurality of data segments relative to each other to form a plurality of distributions of entropy change values, an entropy selection unit for analyzing and selecting one or more distributions of entropy change values that trend in an upward direction, wherein the entropy change values correspond to one or more anomalies, and a removal unit for identifying selected ones of the plurality of distributions of entropy change values that are identical to each other, clustering together the identical ones of the plurality of distributions of entropy change values, and then removing duplicates of the identical ones of the plurality of distributions of entropy change values.

10. The system of claim 9, wherein the segmented data is arranged in a hierarchical manner, and wherein the change in the entropy value can be determined by employing a K-L divergence technique.

11. The system of claim 2, wherein the data preprocessing and enrichment unit further comprises a data lineage unit for determining a lineage of selected portions of the cleaned data.

12. The system of claim 2, further comprising a transformation unit for transforming the machine language data into a selected reporting format, and a reporting unit for generating one or more reports from the data in the reporting format.

13. A method for enriching and normalizing data from a plurality of different types of data systems that are managed by different types of software applications, comprising extracting with a data extraction unit selected portions of data from a plurality of data sources to form extracted data, wherein the plurality of data sources provides data that is generated by a plurality of different types of data systems that are managed by different types of software
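Claims 1 and 4 describe a scoring unit that combines five scores into a final product score, and a ranking unit that orders the results. The claim text shown here does not say how the scores are combined, so the sketch below assumes a weighted average with equal weights purely for illustration. `ProductScores`, `WEIGHTS`, `final_score`, and `rank_products` are hypothetical names, not terms from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductScores:
    """The five per-product scores named in claim 1 (field names assumed)."""
    product_id: str
    first_interest: float      # e.g. from the pattern filter unit
    second_interest: float     # e.g. from the neuro pattern filter unit
    community_interest: float  # e.g. from the page rank unit
    user_feature: float
    product_feature: float


# Hypothetical equal weights; the claim text does not specify a formula.
WEIGHTS = {
    "first_interest": 1.0,
    "second_interest": 1.0,
    "community_interest": 1.0,
    "user_feature": 1.0,
    "product_feature": 1.0,
}


def final_score(scores, weights=WEIGHTS):
    """Weighted average of the five scores (assumed combination rule)."""
    total_weight = sum(weights.values())
    weighted_sum = sum(
        getattr(scores, name) * weight for name, weight in weights.items()
    )
    return weighted_sum / total_weight


def rank_products(candidates, weights=WEIGHTS):
    """Order candidates from highest to lowest final score."""
    return sorted(
        candidates,
        key=lambda scores: final_score(scores, weights),
        reverse=True,
    )
```

Passing a list of `ProductScores` to `rank_products` returns the same objects sorted by descending final score; a different `weights` mapping can be supplied to emphasize, for example, the community interest score.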

Assignees

KPMG LLP

Classifications

G06F16/254
Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title
G06F16/2379
Updates performed during online database operations; commit processing · CPC title
G06F16/215 · Primary
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
G06F16/9538 · Primary
Presentation of query results · CPC title

Patent family

Related publications grouped by family.

View patent family 84922750

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12019596B2 cover?: An integrated platform system that employ a series of machine learning techniques and prediction and detection units that can process input data and extract and generate meaningful insights and predictions therefrom. The system integrates together multiple different data storage types and applications that generates data of different types, and an associated processing system for processing the…
Who is the assignee on this patent?: KPMG LLP
What technology area does this patent fall under?: Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?: Publication date Jun 25, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Adaptively enhancing procurement data

Systems and methods for data storage and processing

Digital Assistant Extension Automatic Ranking and Selection

Recommending contents using a base profile

Knowledge Graph Generator Enabled by Diagonal Search
