Transforming and loading data utilizing in-memory processing

US2017075964A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2017075964-A1
Application number  US-201514851061-A
Country             US
Kind code           A1
Filing date         Sep 11, 2015
Priority date       Sep 11, 2015
Publication date    Mar 16, 2017
Grant date          (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.
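The flow the abstract describes (analyze a specification, derive a data flow, assign operations to the source store, then execute) can be illustrated with a small sketch. This is not the patented implementation: the `JobSpec`, `Filter`, and `MapColumn` names are hypothetical, SQLite stands in for the source and target data stores, and a plain Python list stands in for an in-memory distributed data set, so no parallelism is shown. The only optimization illustrated is pushing leading filters down to the source as a SQL `WHERE` clause.

```python
import operator
import sqlite3
from dataclasses import dataclass
from typing import Any, Callable

# Comparison operators allowed in filters (also guards the SQL text below).
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

@dataclass
class Filter:              # keep rows where `column op value` holds
    column: str
    op: str
    value: Any

@dataclass
class MapColumn:           # add or overwrite a column using a Python function
    name: str
    fn: Callable[[dict], Any]

@dataclass
class JobSpec:             # hypothetical ETL job specification
    source_table: str
    steps: list
    target_table: str
    target_columns: list

def optimize(spec):
    """Split steps into filters assigned to the source store and steps kept in memory."""
    pushed, in_memory = [], []
    for step in spec.steps:
        # A filter is pushed only while no in-memory step precedes it,
        # because an earlier map may create or change the column it reads.
        if isinstance(step, Filter) and step.op in OPS and not in_memory:
            pushed.append(step)
        else:
            in_memory.append(step)
    return pushed, in_memory

def execute(spec, source, target):
    """Run the optimized flow and return the SQL sent to the source store."""
    pushed, in_memory = optimize(spec)
    # Table and column names are assumed to be trusted; values are bound as parameters.
    where = " AND ".join(f"{f.column} {f.op} ?" for f in pushed)
    sql = f"SELECT * FROM {spec.source_table}" + (f" WHERE {where}" if where else "")
    cur = source.execute(sql, [f.value for f in pushed])
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, r)) for r in cur]      # stand-in for the in-memory data set
    for step in in_memory:
        if isinstance(step, Filter):
            rows = [r for r in rows if OPS[step.op](r[step.column], step.value)]
        else:
            rows = [{**r, step.name: step.fn(r)} for r in rows]
    marks = ", ".join("?" for _ in spec.target_columns)
    target.executemany(
        f"INSERT INTO {spec.target_table} VALUES ({marks})",
        [tuple(r[c] for c in spec.target_columns) for r in rows],
    )
    return sql

# Example data: three orders in the source store, an empty target table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, 10.0, "EU"), (2, 250.0, "EU"), (3, 400.0, "US")])
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE big_orders (id INTEGER, amount_cents INTEGER)")

spec = JobSpec(
    source_table="orders",
    steps=[Filter("amount", ">", 100),                                  # pushed to source
           MapColumn("amount_cents", lambda r: int(r["amount"] * 100)),  # in memory
           Filter("amount_cents", "<", 30000)],                         # in memory
    target_table="big_orders",
    target_columns=["id", "amount_cents"],
)
sql = execute(spec, source, target)
loaded = target.execute("SELECT * FROM big_orders").fetchall()
```

In this sketch the first filter runs inside the source store, so only matching rows are loaded into memory; the second filter depends on a column computed in memory and therefore stays there.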

First claim

Opening claim text (preview).

1-6. (canceled)

7. A system for processing an Extract, Transform, Load (ETL) job comprising: at least one processor configured to: analyze a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores; produce one or more data flows from the specification based on the one or more functional expressions, wherein the one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data; optimize the one or more data flows to assign operations to be performed on the one or more source data stores; and execute the optimized data flows to load the data to the one or more target data stores in accordance with the specification.

8. The system of claim 7, wherein the at least one processor is further configured to: store results of one or more designated operations on an in-memory distributed data set of a data flow.

9. The system of claim 8, wherein the at least one processor is further configured to: re-start the ETL job from a previously executed designated operation based on corresponding stored results.

10. The system of claim 8, wherein the at least one processor is further configured to: re-use the stored results of a designated operation in response to a subsequent execution of that operation.

11. The system of claim 7, wherein the at least one processor is further configured to: maintain a filtered status of data within an in-memory distributed data set to accommodate filtering conditions.

12. The system of claim 7, wherein executing the optimized data flows comprises: generating query language constructs for functions of the optimized data flows; generating a graph of objects of the in-memory distributed data sets corresponding to the query language constructs; and transforming the optimized data flows to the query language based on the generated graph.

13. A computer program product for processing an Extract, Transform, Load (ETL) job, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: analyze a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores; produce one or more data flows from the specification based on the one or more functional expressions, wherein the one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data; optimize the one or more data flows to assign operations to be performed on the one or more source data stores; and execute the optimized data flows to load the data to the one or more target data stores in accordance with the specification.

14. The computer program product of claim 13, wherein the program instructions further cause the processor to: store results of one or more designated operations on an in-memory distributed data set of a data flow.

15. The computer program product of claim 14, wherein the program instructions further cause the processor to: re-start the ETL job from a previously executed designated operation based on corresponding stored results.

16. The computer program product of claim 14, wherein the program instructions further cause the processor to: re-use the stored results of a designated operation in response to a subsequent execution of that operation.

17. The computer program product of claim 13, wherein the program instructions further cause the processor to: maintain a filtered status of data within an in-memory distributed data set to accommodate filtering conditions.

18. The computer program product of claim 13, wherein executing the optimized data flows comprises: generating query language constructs for functions of the optimized data flows; generating a graph of objects of the in-memory distributed data sets corresponding to the query language constructs; and transforming the optimized data flows to the query language based on the generated graph.
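Claims 8 to 10 (and their counterparts 14 to 16) describe storing the results of designated operations and using them to restart a job or to avoid repeating work. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the claimed system: `CheckpointedFlow` and the step functions are hypothetical names, a plain dictionary stands in for durable storage, a Python list stands in for an in-memory distributed data set, and stored results are keyed by step name only, which assumes the job is re-run on the same input.

```python
class CheckpointedFlow:
    """Runs named steps in order and stores the results of designated steps."""

    def __init__(self, steps, store=None):
        # steps: list of (name, function, designated) tuples
        self.steps = steps
        # A dict stands in for durable storage that survives a failed run.
        self.store = {} if store is None else store
        self.executed = []

    def run(self, data):
        # Resume after the last designated step whose result is already stored.
        start = 0
        for i, (name, _, designated) in enumerate(self.steps):
            if designated and name in self.store:
                start, data = i + 1, self.store[name]
        self.executed = []
        for name, fn, designated in self.steps[start:]:
            data = fn(data)
            self.executed.append(name)
            if designated:
                self.store[name] = list(data)   # keep a copy as the checkpoint
        return data


calls = {"enrich": 0}

def parse(rows):
    return [int(x) for x in rows]

def enrich(rows):
    # Fails on its first call to simulate a crash partway through the job.
    calls["enrich"] += 1
    if calls["enrich"] == 1:
        raise RuntimeError("simulated worker failure")
    return [x * 2 for x in rows]

def total(rows):
    return [sum(rows)]

store = {}
steps = [("parse", parse, True), ("enrich", enrich, False), ("total", total, False)]
flow = CheckpointedFlow(steps, store)

try:
    flow.run(["1", "2", "3"])        # first attempt fails after "parse" is stored
except RuntimeError:
    pass

result = flow.run(["1", "2", "3"])   # restart: "parse" is not executed again
```

On the second run only `enrich` and `total` execute, because the stored result of the designated `parse` step is re-used as the starting point.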

Assignees

Inventors

Classifications

  • Physics · mapped topic

  • G06F16/254 · Primary

    Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017075964A1 cover?
A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more function…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/254 (shown as G06F17/30563 in older classification data). Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 16, 2017 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).