Methods and apparatus for analytical processing of provenance data for HPC workflow optimization

US10013656B1 · US · B1

Patent metadata
Field | Value
Publication number | US-10013656-B1
Application number | US-201414580732-A
Country | US
Kind code | B1
Filing date | Dec 23, 2014
Priority date | Dec 23, 2014
Publication date | Jul 3, 2018
Grant date | Jul 3, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus are provided for analytical processing of provenance data for High Performance Computing workflow optimization. Prediction models for a workflow composed of a plurality of activities are created by (i) generating a plurality of prediction functions from input features and output features of the workflow, wherein each of the prediction functions predicts at least one output feature of at least one of activities of the workflow based on the input features of at least one activity; and (ii) combining the plurality of prediction functions to generate the prediction models, wherein each of the prediction models predicts a final output feature of the workflow based on an input of the workflow for a given execution plan of the workflow. A plurality of the prediction models can be evaluated to select, among the possible execution plans, an instantiation of the workflow for a given input that optimizes a given user goal.
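
The pipeline the abstract describes has three steps: learn per-activity prediction functions, compose them in workflow order into an end-to-end prediction model, and evaluate one composed model per execution plan to pick the best one. The Python below is a minimal sketch of that idea, not the patented implementation. It assumes a linear chain of activities, two hypothetical features per activity (`size` and accumulated `time`), ordinary least squares as the learner, and exhaustive enumeration of plans. The function names, the activity alternatives (`serial`, `parallel`, `cpu`, `gpu`) and the provenance records are all invented for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, float]
PredictionFn = Callable[[Features], Features]


def fit_linear(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a*x + b (a hypothetical learner choice)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def learn_activity(history: List[Tuple[float, float, float]]) -> PredictionFn:
    """Learn one prediction function from (input size, output size, runtime) records."""
    ins = [h[0] for h in history]
    size_a, size_b = fit_linear(ins, [h[1] for h in history])
    time_a, time_b = fit_linear(ins, [h[2] for h in history])

    def predict(f: Features) -> Features:
        s = f["size"]
        # The output size becomes the next activity's input; elapsed time accumulates.
        return {"size": size_a * s + size_b,
                "time": f["time"] + time_a * s + time_b}

    return predict


def compose(fns: List[PredictionFn]) -> PredictionFn:
    """Chain per-activity functions in workflow order into one end-to-end model."""
    def model(f: Features) -> Features:
        for fn in fns:
            f = fn(f)
        return f
    return model


def best_plan(workflow: List[Dict[str, PredictionFn]], initial: Features,
              goal: str = "time") -> Tuple[Tuple[str, ...], float]:
    """Build one composed model per execution plan and return the plan that
    minimizes the predicted final value of `goal`."""
    best: Optional[Tuple[Tuple[str, ...], float]] = None
    for plan in product(*(sorted(alts) for alts in workflow)):
        model = compose([alts[name] for alts, name in zip(workflow, plan)])
        value = model(dict(initial))[goal]
        if best is None or value < best[1]:
            best = (plan, value)
    assert best is not None
    return best


# Hypothetical provenance records: (input size, output size, runtime) per past run.
workflow = [
    {"serial": learn_activity([(10, 5, 20), (20, 10, 40)]),
     "parallel": learn_activity([(10, 5, 8), (20, 10, 12)])},
    {"cpu": learn_activity([(5, 5, 10), (10, 10, 20)]),
     "gpu": learn_activity([(5, 5, 15), (10, 10, 17)])},
]
plan, predicted_time = best_plan(workflow, {"size": 100.0, "time": 0.0})
print(plan, predicted_time)  # ('parallel', 'gpu') 77.0
```

Enumerating every plan grows multiplicatively with the number of alternatives per activity. The sketch does so only because this toy workflow has four plans.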

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating one or more prediction models for a workflow composed of a plurality of activities, comprising: extracting one or more input features from input data from a plurality of previous executions of said plurality of activities and extracting one or more output features from output data from said plurality of previous executions of said plurality of activities, wherein said plurality of activities execute in one or more computing devices; automatically learning, using at least one processing device, a plurality of prediction functions from one or more input features and one or more output features of said workflow, wherein each of said prediction functions predicts at least one of said output features of at least one of said plurality of activities of said workflow based on one or more of said input features of said at least one activity of said workflow; selecting, using said at least one processing device, one of said plurality of prediction functions for each of said plurality of activities in said workflow based on a particular goal and a succession of said plurality of activities according to a definition of said workflow to generate a selected subset of prediction functions; combining, using said at least one processing device, said selected subset of said plurality of prediction functions to generate said one or more prediction models based on the succession of said plurality of activities according to the definition of said workflow, wherein each of said one or more prediction models predicts a final output feature of said workflow based on one or more of said input features extracted from one or more initial inputs of said workflow; and selecting an instantiation of said workflow for a given input and said particular goal by evaluating a plurality of said one or more prediction models.

2. The method of claim 1, wherein said one or more input features and said one or more output features are extracted from one or more of input data, output data, execution data and provenance data of said workflow.

3. The method of claim 2, wherein said one or more input features and said one or more output features comprise features from within one or more files referenced by said provenance data.

4. The method of claim 1, wherein one or more of said input features and said output features are extracted using one or more file format cartridges.

5. The method of claim 1, wherein said plurality of activities of said workflow are specified by a user, and wherein said user further specifies one or more data dependencies between said plurality of activities and an association of one or more input features of at least one activity with one or more output features of at least one prior activity.

6. The method of claim 1, wherein each of said one or more prediction models predicts a final output feature of said workflow for a given execution plan of said workflow.

7. The method of claim 1, wherein said one or more output features of a given activity are propagated through said workflow as one or more input features of one or more activities following said given activity of said workflow.

8. The method of claim 1, wherein said instantiation of said workflow is selected by evaluating a prediction model representing said particular goal.

9. The method of claim 1, wherein said instantiation of said workflow is selected for said given input and said particular goal subject to one or more additional constraints.

10. The method of claim 1, wherein at least one parameter of one or more of a program and of one or more execution engines related to said workflow is unspecified, and wherein a value for said at least one unspecified parameter is selected to instantiate the workflow for said given input and said particular goal.

11. The method of claim 1, further comprising the step of repeating said steps of automatically learning a plurality of prediction functions and combining said plurality of prediction functions to regenerate said one or more prediction models based on new provenance data.

12. A computer program product comprising a non-transitory machine-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed perform the following steps: extracting one or more input features from input data from a plurality of previous executions of said plurality of activities and extracting one or more output features from output data from said plurality of previous executions of said plurality of activities, wherein said plurality of activities execute in one or more computing devices; automatically learning, using at least one processing device, a plurality of prediction functions from one or more input features and one or more output features of said workflow, wherein each of said prediction functions predicts at least one of said output features of at least one of said plurality of activities of said workflow based on one or more of said input features of said at least one activity of said workflow; selecting, using said at least one processing device, one of said plurality of prediction functions for each of said plurality of activities in said workflow based on a particular goal and a succession of said plurality of activities according to a definition of said workflow to generate a selected subset of prediction functions; combining, using said at least one processing device, said selected subset of said plurality of prediction functions to generate said one or more prediction models based on the succession of said plurality of activities according to the definition of said workflow, wherein each of said one or more prediction models predicts a final output feature of said workflow based on one or more of said input features extracted from one or more initial inputs of said workflow; and selecting an instantiation of said workflow for a given input and said particular goal by evaluating a plurality of said one or more prediction models.

13. A system for generating one or more prediction models for a workflow comprised of a plurality of activities, comprising: a memory; and at least one hardware device, coupled to the memory, operative to implement the following steps: extracting one or more input features from input data from a plurality of previous executions of said plurality of activities and extracting one or more output features from output data from said plurality of previous executions of said plurality of activities, wherein said plurality of activities execute in one or more computing devices; automatically learning, using at least one processing device, a plurality of prediction functions from one or more input features and one or more output features of said workflow, wherein each of said prediction functions predicts at least one of said output features of at least one of said plurality of activities of said workflow based on one or more of said input features of said at least one activity of said workflow; selecting, using said at least one processing device, one of said plurality of prediction functions for each of said plurality of activities in said workflow based on a particular goal and a succession of said plurality of activities according to a definition of said workflow to generate a selected subset of prediction functions; combining, using said at least one processing device, said selected subset of said plurality of prediction functions to generate said one or more prediction models based on the succession of said plurality of activities according to the definition of said workflow, wherein each of said one or more predi
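
Claims 9 and 10 add two refinements to claim 1: the instantiation may be selected subject to additional constraints, and a program or execution-engine parameter left unspecified can be assigned a value as part of that selection. The Python below is a minimal sketch of both ideas together. It assumes the composed prediction model is already available as a function of a single integer parameter (here a hypothetical core count), that the goal is minimized, and that each constraint is an upper bound on a predicted feature. `choose_parameter`, `toy_model` and its coefficients are invented for illustration and are not taken from the patent.

```python
from typing import Callable, Dict, Iterable, Optional

Prediction = Dict[str, float]


def choose_parameter(model: Callable[[int], Prediction], candidates: Iterable[int],
                     goal: str, constraints: Dict[str, float]) -> Optional[int]:
    """Return the candidate value minimizing the predicted `goal` feature while
    every constrained feature stays at or below its limit; None if none qualify."""
    best_value: Optional[int] = None
    best_score = float("inf")
    for value in candidates:
        pred = model(value)
        if any(pred[name] > limit for name, limit in constraints.items()):
            continue  # this instantiation violates an additional constraint
        if pred[goal] < best_score:
            best_value, best_score = value, pred[goal]
    return best_value


def toy_model(cores: int) -> Prediction:
    """Hypothetical end-to-end model for one unspecified parameter (core count):
    120 units of work split across cores plus 2 units of coordination per core.
    Cost is charged as cores x time."""
    time = 120 / cores + 2 * cores
    return {"time": time, "cost": cores * time}


candidates = range(1, 17)
print(choose_parameter(toy_model, candidates, "time", {}))               # 8
print(choose_parameter(toy_model, candidates, "time", {"cost": 200.0}))  # 6
print(choose_parameter(toy_model, candidates, "time", {"cost": 100.0}))  # None
```

Exhaustive search is used only because the candidate range is tiny. Adding the cost bound shifts the choice from 8 cores to 6, and an unsatisfiable bound yields no instantiation at all.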

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • Help systems · CPC title

  • G06N20/00 (primary) · Machine learning · CPC title

  • considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration (scheduling strategies G06F9/4881 and subgroups) · CPC title

  • G06N5/04 (primary) · Inference or reasoning models · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10013656B1 cover?
Methods and apparatus are provided for analytical processing of provenance data for High Performance Computing workflow optimization. Prediction models for a workflow composed of a plurality of activities are created by (i) generating a plurality of prediction functions from input features and output features of the workflow, wherein each of the prediction functions predicts at least one output…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Published Jul 3, 2018 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).