Presenting a selected table of data as a spreadsheet and transforming the data using a data flow graph
US-9727550-B2 · Aug 8, 2017 · US
US10915544B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10915544-B2 |
| Application number | US-201615227265-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 3, 2016 |
| Priority date | Sep 11, 2015 |
| Publication date | Feb 9, 2021 |
| Grant date | Feb 9, 2021 |
Official abstract text for this publication.
A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.
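The optimization step in the abstract, which assigns operations to the source data stores, resembles what is commonly called predicate pushdown. The following is a minimal sketch under that assumption, not the patented implementation. The class names `SourceRead`, `Filter`, and `Project` and the function `push_filters_to_source` are hypothetical and chosen only for illustration. Filters that directly follow a read are folded into the query sent to the source, so less data is loaded into memory.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRead:
    """Hypothetical read step against a source data store."""
    table: str
    pushed_filters: list = field(default_factory=list)

    def to_sql(self) -> str:
        # Build the query the source store would run, including pushed filters.
        sql = f"SELECT * FROM {self.table}"
        if self.pushed_filters:
            sql += " WHERE " + " AND ".join(self.pushed_filters)
        return sql


@dataclass
class Filter:
    """Hypothetical in-memory filter step, written as a SQL predicate."""
    predicate: str


@dataclass
class Project:
    """Hypothetical in-memory column selection step."""
    columns: list


def push_filters_to_source(flow):
    """Move filters that directly follow the read into the read itself."""
    read, *rest = flow
    # Copy the read step so the original flow is left unchanged.
    optimized_read = SourceRead(read.table, list(read.pushed_filters))
    i = 0
    while i < len(rest) and isinstance(rest[i], Filter):
        optimized_read.pushed_filters.append(rest[i].predicate)
        i += 1
    # Steps after the leading run of filters still execute in memory.
    return [optimized_read, *rest[i:]]


flow = [
    SourceRead("orders"),
    Filter("amount > 100"),
    Filter("region = 'EU'"),
    Project(["id", "amount"]),
]
optimized = push_filters_to_source(flow)
print(optimized[0].to_sql())
# prints: SELECT * FROM orders WHERE amount > 100 AND region = 'EU'
```

In this sketch only the leading filters are moved. A real optimizer would also have to check that each operation is supported by the source store and that reordering preserves the result.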
Opening claim text (preview).
What is claimed is:

1. A method of processing an Extract, Transform, Load (ETL) job comprising:
analyzing a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores;
producing one or more data flows from the specification based on the one or more functional expressions, wherein the one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data, wherein producing the one or more data flows comprises transforming the ETL job into an in-memory computational model comprising a plurality of query language statements by: converting each source table identified in the ETL job into a read function executable on the in-memory distributed data sets, converting each target table identified in the ETL job into a write function executable on the in-memory distributed data sets, and converting each shaping or transformation operation identified in the ETL job into an operational statement executable on the in-memory distributed data sets;
optimizing the one or more data flows to assign operations to be performed on the one or more source data stores, wherein optimizing the one or more data flows comprises consolidating two or more query language statements of the plurality of query language statements; and
transmitting the in-memory computational model to a cluster comprising a plurality of nodes to execute, in parallel by the plurality of nodes, the optimized data flows to load the data to the one or more target data stores in accordance with the specification.

2. The method of claim 1, further comprising: storing results of one or more designated operations on an in-memory distributed data set of a data flow.

3. The method of claim 2, further comprising: re-starting the ETL job from a previously executed designated operation based on corresponding stored results.

4. The method of claim 2, further comprising: re-using the stored results of a designated operation in response to a subsequent execution of that operation.

5. The method of claim 1, further comprising: maintaining a filtered status of data within an in-memory distributed data set to accommodate filtering conditions.

6. The method of claim 1, wherein executing the optimized data flows comprises: generating query language constructs for functions of the optimized data flows; generating a graph of objects of the in-memory distributed data sets corresponding to the query language constructs; and transforming the optimized data flows to the query language based on the generated graph.
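The translation and consolidation steps recited in claim 1 can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the claimed method itself. The job dictionary layout, the statement tuples, and the functions `translate` and `consolidate` are all hypothetical. Each source table becomes a read statement, each transformation becomes an operational statement, each target table becomes a write statement, and adjacent filter statements are then merged into one.

```python
def translate(job):
    """Turn a hypothetical ETL job dict into an ordered list of statements."""
    statements = []
    for table in job["sources"]:
        statements.append(("read", table))          # source table -> read function
    for op in job["transforms"]:
        statements.append((op["kind"], op["expr"]))  # transformation -> operational statement
    for table in job["targets"]:
        statements.append(("write", table))         # target table -> write function
    return statements


def consolidate(statements):
    """Merge each run of adjacent filter statements into a single filter."""
    merged = []
    for kind, arg in statements:
        if kind == "filter" and merged and merged[-1][0] == "filter":
            # Combine this predicate with the previous filter's predicate.
            merged[-1] = ("filter", f"({merged[-1][1]}) AND ({arg})")
        else:
            merged.append((kind, arg))
    return merged


# Illustrative job description; table and column names are made up.
job = {
    "sources": ["orders"],
    "transforms": [
        {"kind": "filter", "expr": "amount > 100"},
        {"kind": "filter", "expr": "region = 'EU'"},
        {"kind": "select", "expr": "id, amount"},
    ],
    "targets": ["big_eu_orders"],
}

plan = consolidate(translate(job))
for step in plan:
    print(step)
```

Here five translated statements are reduced to four. In a cluster setting, the consolidated plan would then be sent to the worker nodes for parallel execution, which this sketch does not model.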
Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title