Fixed string dictionary
US-11010415-B2 · May 18, 2021 · US
US12326852B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12326852-B2 |
| Application number | US-202117239900-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 26, 2021 |
| Priority date | Apr 26, 2021 |
| Publication date | Jun 10, 2025 |
| Grant date | Jun 10, 2025 |
Official abstract text for this publication.
Methods, systems, and computer program products for identifying anomalous transformations using lineage data are provided herein. A computer-implemented method includes generating a set of column profiles for a corresponding set of columns within one or more datasets based at least in part on lineage data and glossary data, wherein the lineage data comprises information related to transformations performed on each column in the set by a computing platform, and wherein the glossary data comprises information related to one or more terms assigned to one or more of the columns; obtaining information related to a new transformation involving at least one column in the set of columns; comparing the new transformation to the set of column profiles to determine whether the new transformation is anomalous; and in response to determining the new transformation is anomalous, outputting an alert to a user of the computing platform.
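The abstract's core loop can be illustrated with a minimal Python sketch: build column profiles from lineage history, compare a new transformation against them, and alert when it deviates. This is not the patented implementation. The `Transformation` and `ColumnProfile` records, the two tests (operator frequency and previously seen column pairings), and the `min_share` threshold are all assumptions made for illustration. Glossary data is left out here.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transformation:
    """Hypothetical, simplified lineage record: target = operator(operands)."""
    target: str
    operator: str
    operands: tuple


@dataclass
class ColumnProfile:
    """Illustrative per-column summary of past transformations."""
    operator_counts: Counter = field(default_factory=Counter)
    partners: set = field(default_factory=set)  # columns seen in the same transformation
    total: int = 0


def columns_of(t):
    return (t.target, *t.operands)


def build_profiles(lineage):
    """Aggregate historical transformations into one profile per column."""
    profiles = defaultdict(ColumnProfile)
    for t in lineage:
        cols = columns_of(t)
        for col in cols:
            p = profiles[col]
            p.operator_counts[t.operator] += 1
            p.partners.update(c for c in cols if c != col)
            p.total += 1
    return profiles


def is_anomalous(new, profiles, min_share=0.1):
    """Return (flag, reasons). A column with history objects if the operator
    is rare for it or if it is combined with a column it has never met."""
    reasons = []
    cols = columns_of(new)
    for col in cols:
        p = profiles.get(col)  # .get does not create entries in a defaultdict
        if p is None:
            continue  # no lineage history for this column, nothing to compare
        share = p.operator_counts[new.operator] / p.total
        if share < min_share:
            reasons.append(f"{col}: operator {new.operator!r} in {share:.0%} of past transformations")
        unseen = {c for c in cols if c != col} - p.partners
        if unseen:
            reasons.append(f"{col}: never combined with {sorted(unseen)}")
    return bool(reasons), reasons


if __name__ == "__main__":
    history = [
        Transformation("total", "+", ("price", "tax")),
        Transformation("total", "+", ("price", "tax")),
        Transformation("price_eur", "*", ("price", "fx_rate")),
    ]
    profiles = build_profiles(history)
    print(is_anomalous(Transformation("total", "+", ("price", "tax")), profiles))  # (False, [])
    flag, reasons = is_anomalous(Transformation("total", "CONCAT", ("price", "fx_rate")), profiles)
    if flag:  # stand-in for the alert the abstract describes
        print("ALERT:", *reasons, sep="\n  ")
```

The two tests loosely echo profile contents the claims list: operator usage and other columns previously involved in a transformation with the given column. The specific rules and threshold belong to this sketch only.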
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, comprising: tracking, by a computing platform, lineage data associated with database transformations executed on a set of columns within a plurality of datasets associated with the computing platform; generating, by the computing platform, a set of column profiles corresponding to the set of columns based at least in part on the lineage data, wherein the column profile for a given column comprises at least one of derivation information of the given column and usage information of the given column; analyzing, by the computing platform, glossary data to identify two or more columns in the set of columns that are related, wherein the glossary data comprises semantic information related to one or more common terms assigned to the two or more columns in the set of columns, and wherein a first column of the two or more columns corresponds to a first dataset of the plurality of datasets and a second column of the two or more columns corresponds to a second dataset of the plurality of datasets; enriching, by the computing platform, the lineage data of the column profiles corresponding to the two or more related columns, wherein the enriching comprises aggregating database transformations associated with the two or more related columns and consolidating the column profiles corresponding to the first column and the second column into a single column profile; obtaining, by the computing platform, information related to a new database transformation involving at least one column in the set of columns; determining, by the computing platform, whether the new database transformation is anomalous based at least in part on a comparison of the new database transformation to the set of column profiles and a data quality analysis associated with one or more database transformations identified in the enriched lineage data that are similar to the new transformation, and wherein the comparison comprises determining that the at least one column involved in the new database transformation is related to the two or more related columns based at least in part on the consolidated single column profile, extracting the aggregated database transformations from the enriched lineage data, and comparing the aggregated database transformations with the new database transformation; outputting, by the computing platform, an alert to a user of the computing platform comprising information that indicates the new database transformation is anomalous; and updating, by the computing platform, the set of column profiles based on a classification of the new database transformation provided as feedback from the user in response to the alert; wherein the method is carried out by at least one computing device.
2. The computer-implemented method of claim 1, wherein the generating comprises: determining one or more patterns in the usage data based on the usage information, wherein the one or more patterns are based on at least one of: one or more operators of database transformations involving the given column; one or more operands of database transformations involving the given column; and an order of the one or more operators and/or the one or more operands.
3. The computer-implemented method of claim 1, wherein the column profile for a given column comprises at least one of: information indicating whether a name of the given column is based on a name of at least one of the other columns in the set of columns; information indicating at least one other column in the set of columns that has been involved in a database transformation with the given column; information categorizing database transformations performed on the given column based on the number of columns involved in the database transformations; and expressions of the database transformation performed on the given column.
4. The computer-implemented method of claim 1, wherein the determining comprises: analyzing the lineage data to identify one or more constraints associated with one or more of the columns in the set; and determining whether data resulting from the new database transformation violates at least one of the constraints.
5. The computer-implemented method of claim 1, wherein the alert is output to a graphical user interface and the information comprises at least one of: an explanation of why the new database transformation is anomalous; and a user interface element for the user to provide the feedback on the new database transformation.
6. The computer-implemented method of claim 1, wherein software is provided as a service in a cloud environment.
7. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computing device to cause the computing device to: track, by a computing platform, lineage data associated with database transformations executed on a set of columns within a plurality of datasets associated with the computing platform; generate a set of column profiles corresponding to the set of columns based at least in part on the lineage data, wherein the column profile for a given column comprises at least one of derivation information of the given column and usage information of the given column; analyze, by the computing platform, glossary data to identify two or more columns in the set of columns that are related, wherein the glossary data comprises semantic information related to one or more common terms assigned to the two or more columns in the set of columns, and wherein a first column of the two or more columns corresponds to a first dataset of the plurality of datasets and a second column of the two or more columns corresponds to a second dataset of the plurality of datasets; enrich, by the computing platform, the lineage data of the column profiles corresponding to the two or more related columns, wherein the enriching comprises aggregating database transformations associated with the two or more related columns and consolidating the column profiles corresponding to the first column and the second column into a single column profile; obtain, by the computing platform, information related to a new database transformation involving at least one column in the set of columns; determine, by the computing platform, whether the new database transformation is anomalous based at least in part on a comparison of the new database transformation to the set of column profiles and a data quality analysis associated with one or more database transformations identified in the enriched lineage data that are similar to the new transformation, and wherein the comparison comprises determining that the at least one column involved in the new database transformation is related to the two or more related columns based at least in part on the consolidated single column profile, extracting the aggregated database transformations from the enriched lineage data, and comparing the aggregated database transformations with the new database transformation; output, by the computing platform, an alert to a user of the computing platform comprising information that indicates the new database transformation is anomalous; and update, by the computing platform, the set of column profiles based on a classification of the new database transformation provided as feedback from the user in response to the alert.
8. The computer program product of claim 7, wherein the generating comprises: determining one or more patterns in the usage data based on the usage information, wherein the one or more patterns are based on at least one of: one or more operators of database transformations involving the given column; one or more operands of database transformations invol
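Claim 1 goes beyond the abstract. Columns in different datasets that share a glossary term are treated as related, and their profiles are consolidated into one. A new transformation is then compared with the pooled history, and user feedback updates the profile. The following hypothetical Python sketch assumes that each past transformation is reduced to its operator sequence in evaluation order, which is one of the pattern bases listed in claims 2 and 8. It also assumes that a simple union-find over shared terms stands in for the claimed glossary analysis. The function names and the `dataset.column` keys are illustrative only, and the claimed data quality analysis is not modeled.

```python
from collections import defaultdict


def consolidate(history, glossary):
    """Pool transformation patterns of columns that share a glossary term.

    history:  "dataset.column" -> iterable of operator sequences (tuples)
    glossary: "dataset.column" -> set of assigned glossary terms
    Returns (profiles, owner): profiles maps a group key to its pooled
    patterns; owner maps every known column to its group key.
    """
    parent = {c: c for c in set(history) | set(glossary)}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    first_with_term = {}
    for col in sorted(glossary):
        for term in sorted(glossary[col]):
            if term in first_with_term:
                parent[find(col)] = find(first_with_term[term])  # merge related columns
            else:
                first_with_term[term] = col

    profiles = defaultdict(set)
    for col, patterns in history.items():
        profiles[find(col)].update(patterns)  # aggregate into the consolidated profile
    owner = {c: find(c) for c in parent}
    return dict(profiles), owner


def is_anomalous(column, pattern, profiles, owner):
    """True if the pattern never occurs in the column's consolidated profile."""
    group = owner.get(column, column)
    return pattern not in profiles.get(group, set())


def apply_feedback(column, pattern, user_says_valid, profiles, owner):
    """Fold a user's 'valid' verdict back into the consolidated profile."""
    if user_says_valid:
        profiles.setdefault(owner.get(column, column), set()).add(pattern)


if __name__ == "__main__":
    history = {
        "sales.amount": [("CAST",), ("*", "ROUND")],  # ROUND(a * b), evaluation order
        "orders.amt": [("*",)],
        "hr.salary": [("+",)],
    }
    glossary = {
        "sales.amount": {"Revenue"},
        "orders.amt": {"Revenue"},
        "hr.salary": {"Compensation"},
    }
    profiles, owner = consolidate(history, glossary)
    print(is_anomalous("sales.amount", ("*",), profiles, owner))  # False
    print(is_anomalous("hr.salary", ("*",), profiles, owner))     # True
    apply_feedback("hr.salary", ("*",), True, profiles, owner)
    print(is_anomalous("hr.salary", ("*",), profiles, owner))     # False
```

In this toy data, `sales.amount` never used a bare multiplication, yet it is not flagged. It shares the Revenue term with `orders.amt`, so the pooled profile already contains that pattern. `hr.salary` has no such relation.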
Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477) · CPC title
Change logging, detection, and notification (replication G06F16/27) · CPC title
Column-oriented storage; Management thereof · CPC title
Clustering or classification · CPC title
Ensuring data consistency and integrity · CPC title