Selective checkpointing of links in a data flow based on a set of predefined criteria

US9256460B2 · US · B2

Patent metadata
Publication number: US-9256460-B2
Application number: US-201313843425-A
Country: US
Kind code: B2
Filing date: Mar 15, 2013
Priority date: Mar 15, 2013
Publication date: Feb 9, 2016
Grant date: Feb 9, 2016

Abstract

Official abstract text for this publication.

Techniques are disclosed for qualified checkpointing of a data flow model having data flow operators and links connecting the data flow operators. A link of the data flow model is selected based on a set of checkpoint criteria. A checkpoint is generated for the selected link. The checkpoint is selected from different checkpoint types. The generated checkpoint is assigned to the selected link. The data flow model, having at least one link with no assigned checkpoint, is executed.
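
As a rough illustration of the flow the abstract describes (select links by criteria, assign a checkpoint to each selected link, then run the model with some links left without a checkpoint), here is a minimal Python sketch. The names `Link`, `DataFlowModel`, `assign_checkpoints`, `describe_execution`, and the `before_sort` criterion are all hypothetical; the patent does not prescribe this structure, and real execution is replaced by a simple report.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Link:
    """A directed link between two data flow operators (hypothetical model)."""
    source: str
    target: str
    checkpoint: Optional[str] = None  # checkpoint type assigned to this link, if any


@dataclass
class DataFlowModel:
    operators: List[str]
    links: List[Link] = field(default_factory=list)


# A criterion inspects a link and returns a checkpoint type, or None if it does not apply.
Criterion = Callable[[Link], Optional[str]]


def assign_checkpoints(model: DataFlowModel, criteria: List[Criterion]) -> None:
    """Assign a checkpoint to every link that satisfies at least one criterion."""
    for link in model.links:
        for criterion in criteria:
            kind = criterion(link)
            if kind is not None:
                link.checkpoint = kind
                break  # first matching criterion wins (an assumption of this sketch)


def describe_execution(model: DataFlowModel) -> List[str]:
    """Stand-in for execution: list each link and the checkpoint it would persist."""
    return [
        f"{link.source}->{link.target}: {link.checkpoint or 'no checkpoint'}"
        for link in model.links
    ]


# Hypothetical criterion: checkpoint the link that feeds the sort operator.
def before_sort(link: Link) -> Optional[str]:
    return "bottleneck" if link.target == "sort" else None


model = DataFlowModel(
    operators=["read", "transform", "sort", "write"],
    links=[Link("read", "transform"), Link("transform", "sort"), Link("sort", "write")],
)
assign_checkpoints(model, [before_sort])
report = describe_execution(model)
```

Only the link into `sort` receives a checkpoint here; the other two links run without one, which matches the abstract's condition that at least one link has no assigned checkpoint.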

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product for qualified checkpointing of a data flow model having a plurality of data flow operators and a plurality of links connecting the data flow operators, the computer program product comprising: a non-transitory computer-readable medium having program code embodied therewith, the program code executable by one or more computer processors to: select a link of the plurality of links of the data flow model, that satisfies one or more checkpoint criteria from a predefined set of checkpoint criteria; generate a checkpoint for the selected link, wherein the checkpoint is selected from a retargetable checkpoint, a connection checkpoint, a parallel checkpoint, a bottleneck checkpoint, and a recovery checkpoint, wherein at least four of the retargetable checkpoint, the connection checkpoint, the parallel checkpoint, the bottleneck checkpoint, and the recovery checkpoint are generable by the program code, wherein the generated checkpoint is assigned to the selected link; and execute the data flow model, wherein at least one link of the plurality of links of the data flow model has no assigned checkpoint.

2. The computer program product of claim 1, wherein the predefined set of checkpoint criteria specifies to: generate the retargetable checkpoint between two sub-flows of different processing types; and generate the connection checkpoint between two sub-flows having different design focus properties.

3. The computer program product of claim 2, wherein the predefined set of checkpoint criteria further specifies to: generate the parallel checkpoint between two sub-flows having a measure of pipeline parallelism beyond a predefined threshold; generate the bottleneck checkpoint between an upstream sub-flow and a data flow operator identified as having a potential bottleneck; and generate the recovery checkpoint between an upstream sub-flow and a data flow operator having a highest measure of likelihood of failing among the plurality of data flow operators of the data flow model.

4. The computer program product of claim 3, wherein the data flow model is executable across different runtime engine types, wherein a first sub-flow of the data flow is executable on a retargetable engine type, and wherein multiple sub-flows of the data flow are executable in parallel.

5. The computer program product of claim 4, whereby the data flow supports both performance enhancement and failure recovery, without requiring full checkpointing of the data flow, wherein full checkpointing of the data flow comprises assigning a respective checkpoint to each link of the data flow.

6. The computer program product of claim 5, wherein the program code is of an application, wherein the application includes a request handler component, an engine selector component, an engine manager component, a score composer component, and an execution manager component.

7. The computer program product of claim 1, wherein generating the checkpoint for the selected link comprises: generating the retargetable checkpoint between two sub-flows of different processing types.

8. The computer program product of claim 1, wherein generating the checkpoint for the selected link comprises: generating the connection checkpoint between two sub-flows having different design focus properties.

9. The computer program product of claim 1 wherein generating the checkpoint for the selected link comprises: generating the parallel checkpoint between two sub-flows having a measure of pipeline parallelism beyond a predefined threshold.

10. The computer program product of claim 1, wherein generating the checkpoint for the selected link comprises: generating the bottleneck checkpoint between an upstream sub-flow and a data flow operator identified as having a potential bottleneck.

11. A system for qualified checkpointing of a data flow model having a plurality of data flow operators and a plurality of links connecting the data flow operators, the system comprising: one or more computer processors; a memory containing a program which, when executed by the one or more computer processors, is configured to perform an operation comprising: selecting a link of the plurality of links of the data flow model, that satisfies one or more checkpoint criteria from a predefined set of checkpoint criteria; generating a checkpoint for the selected link, wherein the checkpoint is selected from a retargetable checkpoint, a connection checkpoint, a parallel checkpoint, a bottleneck checkpoint, and a recovery checkpoint, wherein at least four of the retargetable checkpoint, the connection checkpoint, the parallel checkpoint, the bottleneck checkpoint, and the recovery checkpoint are generable by the program, wherein the generated checkpoint is assigned to the selected link; and executing the data flow model, wherein at least one link of the plurality of links of the data flow model has no assigned checkpoint.

12. The system of claim 11, wherein the predefined set of checkpoint criteria specifies to: generate the retargetable checkpoint between two sub-flows of different processing types; and generate the connection checkpoint between two sub-flows having different design focus properties.

13. The system of claim 12, wherein the predefined set of checkpoint criteria further specifies to: generate the parallel checkpoint between two sub-flows having a measure of pipeline parallelism beyond a predefined threshold; generate the bottleneck checkpoint between an upstream sub-flow and a data flow operator identified as having a potential bottleneck; and generate the recovery checkpoint between an upstream sub-flow and a data flow operator having a highest measure of likelihood of failing among the plurality of data flow operators of the data flow model.

14. The system of claim 13, wherein the data flow model is executable across different runtime engine types, wherein a first sub-flow of the data flow is executable on a retargetable engine type, and wherein multiple sub-flows of the data flow are executable in parallel.

15. The system of claim 14, whereby the data flow supports both performance enhancement and failure recovery, without requiring full checkpointing of the data flow, wherein full checkpointing of the data flow comprises assigning a respective checkpoint to each link of the data flow; wherein the program includes a request handler component, an engine selector component, an engine manager component, a score composer component, and an execution manager component.

16. The system of claim 11, wherein generating the checkpoint for the selected link comprises: generating the retargetable checkpoint between two sub-flows of different processing types.

17. The system of claim 11, wherein generating the checkpoint for the selected link comprises: generating the connection checkpoint between two sub-flows having different design focus properties.

18. The system of claim 11, wherein generating the checkpoint for the selected link comprises: generating the parallel checkpoint between two sub-flows having a measure of pipeline parallelism beyond a predefined threshold.

19. The system of claim 11, wherein generating the checkpoint for the selected link comprises: generating the bottleneck checkpoint between an upstream sub-flow and a data flow operator identified as having a potential bottleneck.

20. The system of claim 11, wherein generating the checkpoint for the selected link comprises: generating the recovery checkpo
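
The five criteria recited in claims 2 and 3 (and their system counterparts) can be read as a mapping from facts about a link to a checkpoint type. The sketch below is one hypothetical way to express that mapping in Python. The `LinkInfo` fields, the `classify` and `plan` functions, the precedence order of the checks, the example values such as "batch" and "stream", and the threshold of 0.5 are all assumptions made for illustration; the claims do not say how these properties are measured or which criterion takes priority.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class CheckpointType(Enum):
    RETARGETABLE = "retargetable"
    CONNECTION = "connection"
    PARALLEL = "parallel"
    BOTTLENECK = "bottleneck"
    RECOVERY = "recovery"


@dataclass
class LinkInfo:
    """Hypothetical per-link facts about the sub-flows on either side of a link."""
    upstream_processing_type: str
    downstream_processing_type: str
    upstream_design_focus: str
    downstream_design_focus: str
    pipeline_parallelism: float
    downstream_is_potential_bottleneck: bool
    downstream_failure_likelihood: float


def classify(link: LinkInfo, highest_failure_likelihood: float,
             parallelism_threshold: float) -> Optional[CheckpointType]:
    """Return the checkpoint type for a link, or None if no criterion applies."""
    if link.upstream_processing_type != link.downstream_processing_type:
        return CheckpointType.RETARGETABLE
    if link.upstream_design_focus != link.downstream_design_focus:
        return CheckpointType.CONNECTION
    if link.pipeline_parallelism > parallelism_threshold:
        return CheckpointType.PARALLEL
    if link.downstream_is_potential_bottleneck:
        return CheckpointType.BOTTLENECK
    # Recovery applies only to the operator most likely to fail in the whole model.
    if link.downstream_failure_likelihood == highest_failure_likelihood:
        return CheckpointType.RECOVERY
    return None


def plan(links: List[LinkInfo], parallelism_threshold: float) -> List[Optional[CheckpointType]]:
    """Classify every link; links that match no criterion stay without a checkpoint."""
    highest = max(link.downstream_failure_likelihood for link in links)
    return [classify(link, highest, parallelism_threshold) for link in links]


links = [
    LinkInfo("batch", "stream", "etl", "etl", 0.1, False, 0.3),  # processing types differ
    LinkInfo("batch", "batch", "etl", "etl", 0.2, False, 0.1),   # no criterion applies
    LinkInfo("batch", "batch", "etl", "etl", 0.2, False, 0.9),   # most failure-prone operator
]
assignments = plan(links, parallelism_threshold=0.5)
```

In this example the second link matches no criterion and stays without a checkpoint, so the plan is not full checkpointing in the sense of claim 5, which defines full checkpointing as assigning a checkpoint to every link.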

Assignees

Inventors

Classifications

  • G06F9/5066 (Primary)

    Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title

  • Point-in-time backing up or restoration of persistent data · CPC title

  • G06F9/461 (Primary)

    Saving or restoring of program or task context · CPC title

  • Error detection or correction of the data by redundancy in operations (error detection or correction of the data by redundancy in hardware G06F11/16) · CPC title

  • Using snapshots, i.e. a logical point-in-time copy of the data · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9256460B2 cover?
Techniques are disclosed for qualified checkpointing of a data flow model having data flow operators and links connecting the data flow operators. A link of the data flow model is selected based on a set of checkpoint criteria. A checkpoint is generated for the selected link. The checkpoint is selected from different checkpoint types. The generated checkpoint is assigned to the selected link. T…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F9/5066. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 9, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).