Generating code for an integrated data system

US9727604B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9727604-B2
Application number  US-37254006-A
Country             US
Kind code           B2
Filing date         Mar 10, 2006
Priority date       Mar 10, 2006
Publication date    Aug 8, 2017
Grant date          Aug 8, 2017

Abstract

Official abstract text for this publication.

A computer implemented method for generating code for an integrated data system. A mixed data flow is received. The mixed data flow contains mixed data flow operators, which are associated with multiple runtime environments. A graph is generated containing logical operators based on the mixed data flow in response to receiving the mixed data flow. The logical operators are independent of the plurality of runtime environments. The graph is converted to a model. The logical operators are converted to model operators associated with the multiple runtime environments. The model operators allow for analysis of operations for the mixed data flow. The model is converted into an execution plan graph. The execution plan graph is executable on different runtime environments.
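The three conversions named in the abstract (mixed data flow to logical operator graph, graph to model, model to execution plan graph) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the class names `FlowOperator`, `LogicalOperator`, and `ModelOperator`, the runtime labels `"etl"` and `"sql"`, and the rule that consecutive operators sharing a runtime form one plan step are all hypothetical. The sketch also assumes the flow is already listed in dependency order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowOperator:
    """Operator in the mixed data flow, tagged with a runtime (hypothetical shape)."""
    name: str
    runtime: str
    inputs: tuple = ()


@dataclass(frozen=True)
class LogicalOperator:
    """Runtime-independent operator in the logical operator graph."""
    name: str
    inputs: tuple = ()


@dataclass(frozen=True)
class ModelOperator:
    """Model operator, associated again with a runtime environment."""
    name: str
    runtime: str
    inputs: tuple = ()


def to_logical_graph(flow):
    # Drop the runtime tag: logical operators are independent of any runtime.
    return [LogicalOperator(op.name, op.inputs) for op in flow]


def to_model(logical_graph, runtime_of):
    # Re-associate each operator with a runtime so the model can be analyzed.
    return [ModelOperator(op.name, runtime_of[op.name], op.inputs)
            for op in logical_graph]


def to_execution_plan(model):
    # Assumed grouping rule: consecutive operators on one runtime share a step.
    plan = []
    for op in model:
        if plan and plan[-1]["runtime"] == op.runtime:
            plan[-1]["ops"].append(op.name)
        else:
            plan.append({"runtime": op.runtime, "ops": [op.name]})
    return plan


flow = [
    FlowOperator("extract_file", "etl"),
    FlowOperator("extract_table", "sql"),
    FlowOperator("filter", "sql", ("extract_table",)),
    FlowOperator("join", "etl", ("extract_file", "filter")),
    FlowOperator("load", "etl", ("join",)),
]
runtime_of = {op.name: op.runtime for op in flow}
plan = to_execution_plan(to_model(to_logical_graph(flow), runtime_of))
for step in plan:
    print(step["runtime"], step["ops"])
```

In this toy flow the join runs on one runtime but depends on a filter that runs on another, which is the kind of cross-runtime dependency the abstract describes as a mixed data flow.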

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method for generating code for an integrated data system, the computer-implemented method comprising: receiving a mixed data flow containing mixed data flow operators, the mixed data flow operators collectively defining operations to be performed to complete the mixed data flow, wherein a first of a plurality of runtime environments of distinct types is selected to perform a given one of the defined operations, wherein the given operation is dependent on at least one other operation performed in a second of the plurality of runtime environments; generating, based on the mixed data flow, a logical operator graph containing logical operators independent of the plurality of runtime environments; converting the logical operator graph to an extended query graph model in which the logical operators are converted to model operators associated with the plurality of runtime environments; analyzing the extended query graph model in order to pre-optimize code generation to include at least one of chunking and execution parallelism; subsequent to analyzing the extended query graph model, converting the extended query graph model via code generation by operation of one or more computer processors into an execution plan graph executable on the plurality of different types of runtime environments; and executing the execution plan graph by an execution engine that invokes a plurality of runtime engines, each runtime engine being of a distinct runtime engine type corresponding to a respective one of the plurality of runtime environments.

2. The computer implemented method of claim 1, wherein the plurality of runtime engines includes any of an extract, transform, load engine, a DataStage engine, and structured query language engine.

3. The computer implemented method of claim 1, wherein the graph operators are logical operator graph operators, wherein the model operators are extended query graph model operators, wherein converting the logical operator graph to the extended query graph model comprises: mapping the logical operator graph operators to the extended query graph model operators; and transforming relationships between respective operations of the logical operator graph operators to relationships between respective operations of the extended query graph model operators.

4. The computer implemented method of claim 1, wherein converting the logical operator graph to the extended query graph model comprises: converting a logical operator graph operation directly to an extended query graph model quantifier.

5. The computer implemented method of claim 1, wherein converting the logical operator graph to the extended query graph model comprises: mapping properties of a logical operator graph operation to properties of an extended query graph model entity.

6. The computer implemented method of claim 1, wherein converting the logical operator graph to the extended query graph model comprises: mapping a logical operator graph operation to a property of an extended query graph model operator.

7. The computer implemented method of claim 1, wherein converting the logical operator graph to the extended query graph model comprises: transforming a logical operator graph operation to any of a set of table functions, and stored procedures to invoke an executable program.

8. The computer implemented method of claim 1, wherein converting the logical operator graph to the extended query graph model comprises: converting an expression in the logical operator graph to an expression tree in the extended query graph model.

9. The computer implemented method of claim 1, comprising: performing analysis and optimization of the logical operator graph, the extended query graph model, and the execution plan graph, respectively.

10. The computer implemented method of claim 1, wherein the extended query graph model includes structured query language operations, executable operations, and custom operations.

11. The computer implemented method of claim 1, wherein the logical operator graph is a metadata representation of the mixed data flow.

12. The computer implemented method of claim 1, where the computer-implemented method is to generate the execution plan graph from the mixed data flow and for execution on the plurality of different types of runtime environments programmatically selected as satisfying a set of predefined criteria; wherein the extended query graph model is analyzed in order to pre-optimize code generation to include both chunking and execution parallelism; wherein chunking comprises breaking one subset of the mixed data flow into multiple units in order to improve execution efficiency, wherein execution parallelism comprises grouping disparate sets of operations within the mixed data flow and executing the disparate sets in parallel in order to further improve execution efficiency; wherein the model operators allow for analysis of operations for the mixed data flow, wherein the mixed data flow is received from a user, wherein the operation is selected based on user input, wherein a processing application is programmatically selected, wherein the processing application and first runtime environment are not selected based on any user input; wherein the plurality of different types of runtime environments are programmatically selected as satisfying the set of predefined criteria; wherein the processing application is programmatically determined to satisfy a predefined suitability condition, wherein the suitability condition is satisfied upon identifying a matching runtime environment; wherein the mixed data flow consists of a plurality of data flows specified in a single request from a user, wherein each data flow in a plurality of data flows is of a distinct data type; wherein the computer-implemented method further comprises: outputting an indication that the execution plan graph was executed.

13. The computer implemented method of claim 12, wherein the operations to be performed to complete the mixed data flow as defined by the mixed data flow operators include: (i) a first predefined operation comprising extracting data from one or more files; (ii) a second predefined operation comprising extracting data from one or more tables; (iii) a third predefined operation comprising filtering said data; (iv) a fourth predefined operation comprising joining data extracted from one or more tables with data extracted from one or more files; (v) a fifth predefined operation comprising removing duplicate data; (vi) a sixth predefined operation comprising saving data in a file; and (vii) a seventh predefined operation comprising loading data onto a target.

14. The computer implemented method of claim 13, wherein converting the logical operator graph to the extended query graph model comprises: (i) converting a logical operator graph operation directly to an extended query graph model quantifier; (ii) mapping properties of a logical operator graph operation to properties of an extended query graph model entity; (iii) mapping a logical operator graph operation to a property of an extended query graph model operator; (iv) transforming a logical operator graph operation to any of a set of table functions, and stored procedures to invoke an executable program; and (v) converting an expression in the logical operator graph to an expression tree in the extended query graph model.

15. The computer implemented method of claim 14, wherein the graph operators are logical operator graph operators, and wherein the model operators are extended query graph model
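
Claim 12 describes chunking as breaking one subset of the flow into multiple units, and execution parallelism as grouping disparate sets of operations and running those sets in parallel. The Python sketch below illustrates both ideas in the simplest possible form. It is an assumption-laden toy, not the claimed code generator: the helper names `chunk`, `run_group`, and `run_in_parallel`, the operation names, and the use of a thread pool are all hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(operations, size):
    """Break one sequence of operations into units of at most `size` items."""
    return [operations[i:i + size] for i in range(0, len(operations), size)]


def run_group(group):
    """Stand-in for executing a group; here it only records what would run."""
    return [f"ran {op}" for op in group]


def run_in_parallel(groups):
    """Execute independent groups concurrently; results keep the input order."""
    if not groups:
        return []
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        return list(pool.map(run_group, groups))


# Two hypothetical branches that do not depend on each other.
file_branch = ["extract_file", "filter_file"]
table_branch = ["extract_table", "filter_table", "dedupe"]

units = chunk(table_branch, 2)
results = run_in_parallel([file_branch, table_branch])
print(units)
print(results)
```

A thread pool is used only because it ships with the standard library; the claims leave the actual parallel execution to the runtime engines invoked by the execution engine.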

Assignees

Inventors

Classifications

  • Physics · mapped topic

  • Data format conversion from or to a database · CPC title

  • Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9727604B2 cover?
A computer implemented method for generating code for an integrated data system. A mixed data flow is received. The mixed data flow contains mixed data flow operators, which are associated with multiple runtime environments. A graph is generated containing logical operators based on the mixed data flow in response to receiving the mixed data flow. The logical operators are independent of the pl…
Who is the assignee on this patent?
Jin Qi, Liao Hui, Padmanabhan Sriram K, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06F17/30424. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 8, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).