Query fault processing method and processing apparatus
US-10866866-B2 · Dec 15, 2020 · US
US11599539B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11599539-B2 |
| Application number | US-201916287631-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 27, 2019 |
| Priority date | Dec 26, 2018 |
| Publication date | Mar 7, 2023 |
| Grant date | Mar 7, 2023 |
Official abstract text for this publication.
A logical query plan to derive a target dataset from one or more source datasets is identified. The logical query plan defines source columns of the one or more source datasets and respective target columns of the target dataset. The logical query plan is parsed to derive relationships between the source columns of the one or more source datasets and the respective target columns of the target dataset. Target column metadata is generated for a target column of the target dataset. The target column metadata reflects a derived relationship between one or more source columns and the target column and existing source column metadata of each of the one or more source columns. The target column metadata is stored for the target column of the target dataset.
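The flow described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical in-memory plan representation: `PlanNode`, the operator names `Scan`, `Join`, and `Project`, and the helpers `derive_lineage` and `build_target_metadata` are all illustrative names that do not come from the patent, which only states that the plan is a tree of nodes of logical operators. The sketch walks the tree to relate each target column to its source columns, then carries existing source column comments into the target column metadata before storing it.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """One logical operator in a hypothetical plan tree (illustrative only)."""
    op: str                                      # "Scan", "Join" or "Project"
    dataset: str = ""                            # Scan: name of the source dataset
    columns: tuple = ()                          # Scan: column names it exposes
    exprs: dict = field(default_factory=dict)    # Project: target -> (relationship, inputs)
    children: list = field(default_factory=list)


def derive_lineage(node):
    """Return {output column: set of (dataset, column) sources} for a subtree."""
    if node.op == "Scan":
        return {c: {(node.dataset, c)} for c in node.columns}
    # Collect every column the children expose (this also covers Join).
    available = {}
    for child in node.children:
        for col, sources in derive_lineage(child).items():
            available.setdefault(col, set()).update(sources)
    if node.op == "Project":
        return {
            target: set().union(*(available[i] for i in inputs))
            for target, (_, inputs) in node.exprs.items()
        }
    return available  # other operators pass columns through unchanged


def build_target_metadata(plan, source_metadata):
    """Combine the derived relationship with existing source column metadata."""
    result = {}
    for target, sources in derive_lineage(plan).items():
        ordered = sorted(sources)
        relationship = plan.exprs[target][0] if plan.op == "Project" else "passthrough"
        result[target] = {
            "relationship": relationship,
            "sources": ordered,
            # Assumption: "existing source column metadata" is modelled as comments.
            "comments": [c for s in ordered
                         for c in source_metadata.get(s, {}).get("comments", [])],
        }
    return result


# Made-up example data: two source datasets joined and projected.
orders = PlanNode("Scan", dataset="orders", columns=("price", "qty", "customer_id"))
customers = PlanNode("Scan", dataset="customers", columns=("id", "name"))
plan = PlanNode(
    "Project",
    exprs={"total": ("price * qty", ["price", "qty"]),
           "customer": ("rename", ["name"])},
    children=[PlanNode("Join", children=[orders, customers])],
)
source_metadata = {
    ("orders", "price"): {"comments": ["net of tax"]},
    ("customers", "name"): {"comments": ["PII"]},
}

meta = build_target_metadata(plan, source_metadata)
catalog = {"order_summary": meta}  # stand-in for storing the target column metadata
```

Here `meta["total"]` records that the column is computed from `orders.price` and `orders.qty` and inherits the comment attached to `orders.price`. A real plan produced by a query engine would need an engine-specific parser in place of the hand-built tree.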
Opening claim text (preview).
What is claimed is:
1. A method comprising: identifying a logical query plan to derive a target dataset from one or more source datasets, wherein the logical query plan was generated from transformation code in a first programming language and comprises a hierarchical structure expressed as a tree of nodes of logical operators, wherein the logical query plan identifies a plurality of source columns of the one or more source datasets and respective target columns of the target dataset; parsing the logical query plan to derive relationships between the plurality of source columns of the one or more source datasets and the respective target columns of the target dataset; generating target column metadata for a target column of the target dataset, the target column metadata reflecting a derived relationship derived from the logical query plan and between one or more source columns and the target column and reflecting existing source column metadata of each of the one or more source columns; and storing the target column metadata for the target column of the target dataset.
2. The method of claim 1, further comprising: responsive to determining that the logical query plan is not available for the transformation code in the first programming language, inferring the relationships between the plurality of source columns of the one or more source datasets and the respective target columns of the target dataset.
3. The method of claim 1, wherein parsing the logical query plan to derive the relationships between the plurality of source columns of the one or more source datasets and the respective target columns of the target dataset, comprises: finding, in the logical query plan, one or more keywords associated with one or more first logical query plan portions that each identify a source dataset of the one or more source datasets; finding, in the logical query plan, one or more keywords associated with a second logical query plan portion that identifies the plurality of source columns of the one or more source datasets; finding, in the logical query plan, one or more keywords associated with a third logical query plan portion that identifies the respective target columns of the target dataset; and finding, for each of the respective target columns of the target dataset, one or more keywords associated with a fourth logical query plan portion describing a relationship between at least one of the one or more source columns of the one or more source datasets and the respective target column of the target dataset.
4. The method of claim 3, wherein the relationship between the at least one source column of the one or more source datasets and the respective target column of the target dataset is at least one of a mapping between a name of the one or more source columns and a name of the respective target column, a database operation performed on the one or more source columns to derive the respective target column, or a function used to calculate values of the respective target column using values of the one or more source columns.
5. The method of claim 1, wherein generating the target column metadata for the target column of the target dataset, further comprises: determining existing lineage metadata associated with each of the one or more source columns; and providing the existing lineage metadata associated with each of the one or more source columns for inclusion with the target column metadata for the target column of the target dataset.
6. The method of claim 1, wherein generating the target column metadata for the target column of the target dataset further comprises: identifying user comments within the existing source column metadata of each of the one or more source columns; and providing the user comments for inclusion with the target column metadata for the target column of the target dataset.
7. The method of claim 1, further comprising: determining whether the one or more source columns are associated with a column level access control policy; and responsive to determining the one or more source columns are associated with the column level access control policy, propagating the column level access control policy to the target column of the target dataset.
8. The method of claim 7, wherein the one or more source columns comprise at least two source columns, the method further comprising: determining that the at least two source columns are associated with a plurality of column level access control policies; and selecting one of the plurality of column level access control policies to propagate to the target column of the target dataset.
9. The method of claim 2, wherein inferring the relationships between the plurality of source columns of the one or more source datasets and the respective target columns of the target dataset, comprises: identifying, based on a list of datasets, one or more datasets that are source dataset candidates; finding, for each of the respective target columns, one or more source column candidates from the source dataset candidates; and inferring, for each of the respective target columns, a relationship between the one or more source column candidates and a respective target column of the respective target columns of the target dataset based on values in the one or more source column candidates and the target column of the target dataset.
10. The method of claim 9, wherein finding, for each of the respective target columns, the one or more source column candidates from the source dataset candidates comprises: comparing at least one of data types or column names of a plurality of columns of the source dataset candidates to data types or column names of the respective target columns of the target dataset.
11. The method of claim 1, further comprising: providing a graphical user interface comprising a graph representing column lineage of the target column; and modifying the column lineage of the target column based on user input via the graphical user interface.
12. A system comprising: a memory; and a processing device, coupled to the memory, to: identify a logical query plan to derive a target dataset from one or more source datasets, wherein the logical query plan was generated from transformation code in a first programming language and comprises a hierarchical structure expressed as a tree of nodes of logical operators, wherein the logical query plan identifies a plurality of source columns of the one or more source datasets and respective target columns of the target dataset; parse the logical query plan to derive relationships between the plurality of source columns of the one or more source datasets and the respective target columns of the target dataset; generate target column metadata for a target column of the target dataset, the target column metadata reflecting a derived relationship derived from the logical query plan and between one or more source columns and the target column and reflecting existing source column metadata of each of the one or more source columns; and store the target column metadata for the target column of the target dataset.
13. The system of claim 12, the processing device further to: responsive to determining that the logical query plan is not available for the transformation code in the first programming language, infer the relationships between the plurality of source columns of the one or more source datasets and the respective target columns of the target dataset.
14. The system of claim 12, wherein to parse the logical query plan to derive the relationships between the plurality of source columns of the one or more source
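The fallback in claims 9 and 10, used when no logical query plan is available, can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the dictionary layout, the helper names `find_candidates` and `infer_relationship`, and the value-overlap score are all hypothetical. The claims only say that candidates are found by comparing data types or column names and that the relationship is inferred from column values.

```python
def find_candidates(target_col, source_datasets):
    """Claim 10 style filter: keep columns with a matching data type or name."""
    candidates = []
    for ds_name, columns in source_datasets.items():
        for col_name, col in columns.items():
            if col["type"] == target_col["type"] or col_name == target_col["name"]:
                candidates.append((ds_name, col_name))
    return candidates


def infer_relationship(target_col, source_datasets):
    """Pick the candidate whose values best cover the target column's values.

    Assumption: the fraction of target values also present in the candidate
    is used as the score; the claims do not specify a scoring rule.
    """
    target_values = set(target_col["values"])
    best, best_score = None, 0.0
    for ds_name, col_name in find_candidates(target_col, source_datasets):
        values = set(source_datasets[ds_name][col_name]["values"])
        score = len(target_values & values) / len(target_values) if target_values else 0.0
        if score > best_score:
            best, best_score = (ds_name, col_name), score
    return best, best_score


# Made-up example data for the source dataset candidates and one target column.
source_datasets = {
    "orders": {
        "order_id": {"type": "int", "values": [1, 2, 3]},
        "qty": {"type": "int", "values": [5, 7, 9]},
    },
    "customers": {
        "name": {"type": "str", "values": ["a", "b"]},
    },
}
target = {"name": "quantity", "type": "int", "values": [5, 7, 9]}

candidates = find_candidates(target, source_datasets)
best_source, score = infer_relationship(target, source_datasets)
```

With this data, both integer columns of `orders` are candidates by type, and only `qty` shares values with the target, so it is selected. An exact value overlap detects only renames or copies; inferring computed columns would need a different comparison.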
to a system of files or objects, e.g. local or distributed file system or database · CPC title
Plan optimisation · CPC title
Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477) · CPC title
Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title
Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title