Extract, transform, load monitoring platform

US2022374442A1 · US · A1

Patent metadata
Field | Value
Publication number | US-2022374442-A1
Application number | US-202117303167-A
Country | US
Kind code | A1
Filing date | May 21, 2021
Priority date | May 21, 2021
Publication date | Nov 24, 2022
Grant date |

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In some implementations, a monitoring device may receive configuration information associated with an extract, transform, load (ETL) pipeline that includes one or more data sources and one or more data sinks. The monitoring device may generate, based on the configuration information, lineage data related to a data flow from the one or more data sources to the one or more data sinks in the ETL pipeline. The monitoring device may generate one or more predicted quality metrics associated with the ETL pipeline using a machine learning model. The monitoring device may generate a visualization in which multiple nodes are arranged to indicate the data flow from the one or more data sources to the one or more data sinks and further in which the one or more predicted quality metrics are encoded within the visualization.
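To make the abstract more concrete, the following minimal sketch shows one way lineage data could be derived from configuration information and then used to place nodes for a visualization. It is an illustration only, not the method disclosed in the publication: the configuration layout (`task`, `reads`, `writes`), the table names, and the functions `build_lineage` and `assign_columns` are all hypothetical, and the sketch assumes the data flow contains no cycles.

```python
from collections import defaultdict

# Hypothetical configuration: each ETL task names the tables it reads and writes.
# The field names ("task", "reads", "writes") are assumptions for illustration.
CONFIG = [
    {"task": "stage_orders", "reads": ["orders_raw"], "writes": ["orders_staged"]},
    {"task": "stage_customers", "reads": ["customers_raw"], "writes": ["customers_staged"]},
    {"task": "build_report", "reads": ["orders_staged", "customers_staged"],
     "writes": ["sales_report"]},
]


def build_lineage(config):
    """Return a mapping from each table to the set of tables directly downstream."""
    downstream = defaultdict(set)
    for task in config:
        for src in task["reads"]:
            for dst in task["writes"]:
                downstream[src].add(dst)
        # Make sure pure sinks also appear as keys, with no downstream tables.
        for dst in task["writes"]:
            downstream.setdefault(dst, set())
    return dict(downstream)


def assign_columns(lineage):
    """Give each table a column index: its longest distance from any source.

    Assumes the lineage graph is acyclic; a cycle would recurse forever.
    """
    upstream = defaultdict(set)
    for src, dsts in lineage.items():
        for dst in dsts:
            upstream[dst].add(src)

    columns = {}

    def column_of(table):
        if table not in columns:
            parents = upstream.get(table, set())
            columns[table] = (1 + max(map(column_of, parents))) if parents else 0
        return columns[table]

    for table in lineage:
        column_of(table)
    return columns


lineage = build_lineage(CONFIG)
columns = assign_columns(lineage)
```

In this sketch the data sources land in column 0, intermediate tables in column 1, and the data sink in column 2, which is one simple way to arrange nodes so that the data flow reads from left to right.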

First claim

Opening claim text (preview).

What is claimed is:

1. A system for monitoring an extract, transform, load (ETL) pipeline, comprising: one or more memories; and one or more processors, coupled to the one or more memories, configured to: receive configuration information associated with the ETL pipeline that includes one or more data sources and one or more data sinks, wherein the configuration information indicates data records to be extracted from the one or more data sources, transformed from a source format to a target format, and loaded into the one or more data sinks; generate, based on the configuration information, lineage data related to a data flow from the one or more data sources to the one or more data sinks in the ETL pipeline; generate one or more predicted quality metrics associated with the ETL pipeline using a machine learning model, wherein the machine learning model is trained using historical execution data associated with one or more ETL jobs; and generate a visualization in which multiple nodes are arranged to indicate the data flow from the one or more data sources to the one or more data sinks and further in which the one or more predicted quality metrics are encoded within the visualization.

2. The system of claim 1, wherein the multiple nodes represent one or more source tables storing the data records to be extracted, transformed, and loaded, one or more target tables into which the data records are to be loaded, and one or more intermediate source tables in the data flow from the one or more source tables to the one or more target tables.

3. The system of claim 2, wherein the lineage data includes dependencies among the one or more source tables, the one or more intermediate source tables, and the one or more target tables.

4. The system of claim 2, wherein the multiple nodes are arranged across multiple columns and the visualization includes user interface elements linking the multiple nodes to indicate the data flow from the one or more source tables to the one or more target tables.

5. The system of claim 4, wherein the multiple nodes and the user interface elements linking the multiple nodes are each depicted in the visualization using a color in a color palette.

6. The system of claim 2, wherein the one or more predicted quality metrics relate to one or more of a timeliness, a service level agreement, or an accuracy associated with an ETL task configured to process the data records in the one or more source tables, the one or more intermediate source tables, or the one or more target tables.

7. The system of claim 1, wherein the one or more processors are further configured to: detect, using the machine learning model, a failure or an anomaly in the data flow from the one or more data sources to the one or more data sinks in the ETL pipeline; and cause one or more of the multiple nodes in the visualization to be depicted using one or more colors to indicate a portion of the data flow affected by the failure or the anomaly.

8. The system of claim 7, wherein the one or more processors are further configured to: terminate an ETL task associated with the ETL pipeline based on the failure or the anomaly in the data flow.

9. The system of claim 7, wherein the one or more processors are further configured to: send a message to one or more users based on the failure or the anomaly in the data flow, wherein the message includes information related to the failure or the anomaly in the data flow and information related to one or more suggested actions to remediate the failure or the anomaly in the data flow.

10. The system of claim 1, wherein the one or more predicted quality metrics are encoded within the visualization such that information related to the one or more predicted quality metrics are depicted in the visualization based on interaction with one or more user interface elements.

11. A method for visualizing information related to an extract, transform, load (ETL) pipeline, comprising: receiving, by an ETL monitoring device, configuration information associated with the ETL pipeline that includes one or more data sources and one or more data sinks, wherein the configuration information indicates data records to be extracted from the one or more data sources, transformed from a source format to a target format, and loaded into the one or more data sinks; generating, by the ETL monitoring device, based on the configuration information, lineage data related to a data flow from the one or more data sources to the one or more data sinks in the ETL pipeline; and generating, by the ETL monitoring device, based on the lineage data, a visualization including multiple nodes that are linked by user interface elements to indicate the data flow from the one or more data sources to the one or more data sinks.

12. The method of claim 11, wherein the multiple nodes represent one or more source tables storing the data records to be extracted, transformed, and loaded, one or more target tables into which the data records are to be loaded, and one or more intermediate source tables in the data flow from the one or more source tables to the one or more target tables.

13. The method of claim 11, further comprising: generating one or more predicted quality metrics associated with the ETL pipeline using a machine learning model that is trained using historical execution data associated with one or more ETL jobs; and configuring the visualization to indicate the one or more predicted quality metrics by depicting one or more of the multiple nodes or one or more of the user interface elements linking the multiple nodes using a color in a color palette.

14. The method of claim 11, further comprising: generating one or more predicted quality metrics associated with the ETL pipeline using a machine learning model that is trained using historical execution data associated with one or more ETL jobs; and configuring the visualization to depict information related to the one or more predicted quality metrics based on interaction with one or more of the multiple nodes or the user interface elements linking the multiple nodes.

15. The method of claim 11, further comprising: detecting a failure or an anomaly in the data flow from the one or more data sources to the one or more data sinks in the ETL pipeline; and performing one or more actions based on the failure or the anomaly in the data flow, wherein performing the one or more actions includes one or more of: causing one or more of the multiple nodes or the user interface elements linking the multiple nodes to be depicted in the visualization using one or more colors to indicate a portion of the data flow affected by the failure or the anomaly, terminating an ETL task associated with the ETL pipeline, or generating a message that includes information related to the failure or the anomaly in the data flow and information related to one or more suggested actions to remediate the failure or the anomaly in the data flow.

16. A non-transitory computer-readable medium storing a set of instructions, the set of instructions comprising: one or more instructions that, when executed by one or more processors of an extract, transform, load (ETL) monitoring device, cause the ETL monitoring device to: generate lineage data related to a data flow from one or more data sources to one or more data sinks in an ETL pipeline; detect a failure or an anomaly in the ETL pipeline using a machine learning model that is trained using historical execution data associated with one or more ETL jobs; and gener
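
The failure and anomaly handling recited in the claims above (detecting an anomaly from historical execution data, then coloring the affected portion of the data flow) can be pictured with a short sketch. This is an illustration under stated assumptions, not the claimed implementation: a simple z-score test on run durations stands in for the trained machine learning model, and the lineage, history values, color names, and function names are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev

# Hypothetical lineage: table -> tables directly downstream of it.
LINEAGE = {
    "orders_raw": {"orders_staged"},
    "customers_raw": {"customers_staged"},
    "orders_staged": {"sales_report"},
    "customers_staged": {"sales_report"},
    "sales_report": set(),
}

# Hypothetical historical run durations (seconds) for the task writing each table.
HISTORY = {
    "orders_staged": [60, 62, 58, 61, 59],
    "customers_staged": [30, 31, 29, 30, 32],
    "sales_report": [120, 118, 122, 121, 119],
}


def is_anomalous(history, latest, threshold=3.0):
    """Stand-in for a trained model: flag a run far from the historical mean."""
    spread = stdev(history)
    if spread == 0:
        return latest != mean(history)
    return abs(latest - mean(history)) / spread > threshold


def affected_tables(lineage, start):
    """Return start plus every table reachable downstream of it (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in lineage.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def color_nodes(lineage, history, latest_runs):
    """Map each table to 'red' if it is at or downstream of an anomaly, else 'green'."""
    colors = {table: "green" for table in lineage}
    for table, latest in latest_runs.items():
        if is_anomalous(history[table], latest):
            for hit in affected_tables(lineage, table):
                colors[hit] = "red"
    return colors


colors = color_nodes(
    LINEAGE,
    HISTORY,
    {"orders_staged": 300, "customers_staged": 31, "sales_report": 120},
)
```

Here the unusually long `orders_staged` run is flagged, so that table and the downstream `sales_report` are colored red while the unaffected branch stays green. A real system would replace `is_anomalous` with whatever model it trains on its own execution history.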

Assignees

Inventors

Classifications

  • Visual data mining; Browsing structured data · CPC title

  • Tablespace storage structures; Management thereof · CPC title

  • G06F16/254 · Primary

    Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

  • Machine learning · CPC title

  • Inference or reasoning models · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022374442A1 cover?
In some implementations, a monitoring device may receive configuration information associated with an extract, transform, load (ETL) pipeline that includes one or more data sources and one or more data sinks. The monitoring device may generate, based on the configuration information, lineage data related to a data flow from the one or more data sources to the one or more data sinks in the ETL p…
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/254. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 24, 2022 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).