Converting a hybrid flow

US10102039B2 · US · B2

Patent metadata
Publication number: US-10102039-B2
Application number: US-201313896795-A
Country: US
Kind code: B2
Filing date: May 17, 2013
Priority date: May 17, 2013
Publication date: Oct 16, 2018
Grant date: Oct 16, 2018

Abstract

Official abstract text for this publication.

Converting a hybrid flow can include combining each of a plurality of task nodes with a plurality of corresponding operators of the hybrid flow and converting the combined plurality of task nodes and the plurality of corresponding operators of the hybrid flow to a data flow graph using a code template.
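The combining step in the abstract can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the patent: each task node holds a linear list of operator names, job-flow edges connect whole tasks, and flattening links the last operator of an upstream task to the first operator of the downstream task. The names `TaskNode`, `HybridFlow`, and `flatten` are hypothetical, and the code-template mapping is left out.

```python
from dataclasses import dataclass


@dataclass
class TaskNode:
    """A task node of the job flow with its task-flow operators (assumed linear)."""
    name: str
    operators: list[str]


@dataclass
class HybridFlow:
    """A job flow (tasks plus task-to-task edges) wrapping per-task operator flows."""
    tasks: list[TaskNode]
    job_edges: list[tuple[str, str]]  # (upstream task, downstream task)


def flatten(flow: HybridFlow) -> tuple[list[str], list[tuple[str, str]]]:
    """Combine task nodes with their operators into one operator-level graph."""
    by_name = {task.name: task for task in flow.tasks}
    nodes: list[str] = []
    edges: list[tuple[str, str]] = []

    # Edges inside each task flow: consecutive operators are chained.
    for task in flow.tasks:
        qualified = [f"{task.name}.{op}" for op in task.operators]
        nodes.extend(qualified)
        edges.extend(zip(qualified, qualified[1:]))

    # Edges between tasks: last operator upstream -> first operator downstream.
    for up, down in flow.job_edges:
        src = f"{up}.{by_name[up].operators[-1]}"
        dst = f"{down}.{by_name[down].operators[0]}"
        edges.append((src, dst))

    return nodes, edges


flow = HybridFlow(
    tasks=[
        TaskNode("extract", ["read", "filter"]),
        TaskNode("load", ["join", "write"]),
    ],
    job_edges=[("extract", "load")],
)
nodes, edges = flatten(flow)
```

Qualifying operator names with the task name keeps operators from different tasks distinct in the single graph, which is one simple way to retain some structural information from the original flow.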

First claim

Opening claim text (preview).

What is claimed:

1. A method for converting a hybrid flow comprising: identifying a plurality of task nodes of a job flow included in the hybrid flow; identifying a plurality of operators of a task flow included in the hybrid flow, wherein an operator among the plurality of operators is a composite operator; generating a single data flow by: separating the composite operator into a plurality of distinct operators; and combining each of the plurality of task nodes with the plurality of operators and the separated distinct operators; mapping the plurality of task nodes and the plurality of operators to a code template of a plurality of code templates, wherein each of the plurality of code templates is a different extract-transfer-load logical model template; converting the single data flow to a data flow graph using the mapped code template; inputting the data flow graph to an optimizer to optimize execution of the data flow graph, the optimizer globally optimizing the data flow graph in a pushdown manner to improve operator cohesion of the data flow graph; and executing the optimized data flow graph on a plurality of execution engines.

2. The method of claim 1, wherein converting the single data flow to the data flow graph includes converting a plurality of graphs of the hybrid flow to a single data flow graph.

3. The method of claim 1, wherein converting the single data flow to the data flow graph includes capturing structural information and flow metadata from the hybrid flow.

4. The method of claim 1, wherein converting the single data flow to the data flow graph includes adding a connector operator between at least two operators among the plurality of operators or the separated distinct operators.

5. A non-transitory computer-readable medium storing instructions executable by a processing resource to: generate a single data flow by combining a plurality of task nodes from a job flow graph included in a hybrid flow, with a plurality of operators from a plurality of task flow graphs included in the hybrid flow, wherein an operator among the plurality of operators is a composite operator; separate the composite operator into a data source operator and a data computation operator; convert the single data flow to a data flow graph using a plurality of code templates, wherein each of the plurality of code templates is a different extract-transfer-load logical model template; optimize execution of the data flow graph by inputting the data flow graph to an optimizer, the optimizer globally optimizing the data flow graph in a pushdown manner to improve operator cohesion of the data flow graph; and execute the optimized data flow graph on a plurality of execution engines.

6. The non-transitory computer-readable medium of claim 5, wherein the instructions to separate the composite operator include instructions executable to separate reading threads and extraction code from the composite operator.

7. The non-transitory computer-readable medium of claim 5, wherein the instructions to separate the composite operator include instructions executable to separate loading code and writing threads from the composite operator.

8. The non-transitory computer-readable medium of claim 5, including instructions executable to adjust at least one of an input schema and an output schema of the data computation operator based on a predecessor operator.

9. The non-transitory computer-readable medium of claim 5, including instructions executable to convert the optimized data flow graph to a job flow format and a task flow format.

10. A system for converting a hybrid flow, the system comprising: a processing resource; a memory resource communicatively coupled to the processing resource containing instructions executable by the processing resource to: identify a plurality of task nodes of a job flow included in a hybrid flow; identify a plurality of operators of a task flow included in the hybrid flow, wherein an operator among the plurality of operators is a composite operator; flatten the job flow, by: combining the plurality of task nodes with the plurality of operators; map the flattened job flow to a plurality of code templates, wherein each of the plurality of code templates is a different extract-transfer-load logical model template; convert the flattened job flow to a data flow graph using the plurality of mapped code templates; input the data flow graph to an optimizer to optimize execution of the data flow graph, the optimizer globally optimizing the data flow graph in a pushdown manner to improve operator cohesion of the data flow graph; and convert the optimized data flow graph to a job flow format and a task flow format for execution on a plurality of execution engines.

11. The system of claim 10, including instructions executable to add a pipeline connector operator between two operators among the plurality of operators.

12. The system of claim 10, including instructions executable to add a data store connector operator that is accessible to a producer operator and a consumer operator among the plurality of operators.

13. The system of claim 10, wherein the instructions executable by the processing resource to convert the optimized data flow graph to the job flow format and a task flow format include instructions executable to: merge the data computation operator and the data store operator back into the composite operator.

14. The system of claim 10, wherein the instructions executable by the processing resource to convert the optimized data flow graph to the job flow format and the task flow format include instructions executable to: merge a separated fork operator into a producer operator among the plurality of operators.

15. The system of claim 10, wherein the instructions executable by the processing resource to convert the optimized data flow graph to the job flow format and the task flow format include instructions executable to: merge a separated merger operator into a consumer operator among the plurality of operators.

16. The system of claim 10, wherein the instructions executable by the processing resource to convert the optimized data flow graph to the job flow format and the task flow format include instructions executable to: remove the added connector operator from the converted data flow graph.

17. The system of claim 10, wherein the instructions executable by the processing resource to convert the optimized data flow graph to the job flow format and the task flow format include instructions executable to: validate a predecessor operator of an n-ary operator.

18. The system of claim 10, wherein the composite operator includes a data computation function and a data store function, further including instructions executable by the processing resource to: separate the data computation function and the data store function into a data computation operator and a data store operator in the converted data flow graph; and add a connector operator between operators associated with a connection between two tasks nodes among the plurality of task nodes in the converted data flow graph.

19. The method of claim 1, wherein the optimizer pushes down optimization to execution engines corresponding to the operators, to globally optimize the data flow graph, leveraging optimization by the execution engines.

20. The method of claim 1, wherein the operator cohesion for a given operator is defined as a sum of a cardinality of an input schemata of the given operator and an output
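The composite-operator separation and connector steps recited in the claims can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the pipeline is modeled as a flat list, a composite on the source side splits into a data store part followed by a computation part (reading, then extraction), a composite on the sink side splits in the opposite order (loading, then writing), and a connector is placed between every adjacent pair of operators. `Operator`, `separate`, and `expand_pipeline` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    kind: str               # "composite", "data_store", "data_computation", "connector"
    is_source: bool = True  # only meaningful for composite operators (assumption)


def separate(op: Operator) -> list[Operator]:
    """Split a composite operator into a data store part and a computation part."""
    if op.kind != "composite":
        return [op]
    store = Operator(f"{op.name}.store", "data_store")
    compute = Operator(f"{op.name}.compute", "data_computation")
    # Source side: read from the store, then compute. Sink side: compute, then write.
    return [store, compute] if op.is_source else [compute, store]


def expand_pipeline(ops: list[Operator]) -> list[Operator]:
    """Separate every composite, then insert a connector between adjacent operators."""
    flat = [part for op in ops for part in separate(op)]
    expanded: list[Operator] = []
    for index, op in enumerate(flat):
        if index > 0:
            expanded.append(Operator(f"connector_{index}", "connector"))
        expanded.append(op)
    return expanded


pipeline = [
    Operator("orders_in", "composite", is_source=True),
    Operator("filter", "data_computation"),
    Operator("dw_out", "composite", is_source=False),
]
expanded = expand_pipeline(pipeline)
names = [op.name for op in expanded]
```

Converting back to job flow and task flow formats, as the dependent system claims describe, would reverse these steps by dropping the connector entries and merging each `.store` and `.compute` pair back into its composite.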

Assignees

Entit Software LLC

Inventors

Classifications

  • G06F9/5066 (Primary)

    Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10102039B2 cover?
Converting a hybrid flow can include combining each of a plurality of task nodes with a plurality of corresponding operators of the hybrid flow and converting the combined plurality of task nodes and the plurality of corresponding operators of the hybrid flow to a data flow graph using a code template.
Who is the assignee on this patent?
Entit Software LLC
What technology area does this patent fall under?
Primary CPC classification G06F9/5066. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 16, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).