Cataloging data sets for reuse in pipeline applications
US-9495207-B1 · Nov 15, 2016 · US
US10949219B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10949219-B2 |
| Application number | US-201816010003-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 15, 2018 |
| Priority date | Jun 15, 2018 |
| Publication date | Mar 16, 2021 |
| Grant date | Mar 16, 2021 |
Official abstract text for this publication.
A method for executing a data processing pipeline may be provided. The method may include identifying a file providing a runtime environment required for executing a series of data processing operations comprising the data processing pipeline. The file may be identified based on one or more tags associated with the data processing pipeline. The one or more tags may specify at least one runtime requirement for the series of data processing operations. The file may be executed to generate an executable package that includes a plurality of components required for executing the series of data processing operations. The series of data processing operations included in the data processing pipeline may be executed by at least executing the executable package to provide the runtime environment required for executing the series of data processing operations. Related systems and articles of manufacture, including computer program products, are also provided.
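The flow in the abstract (collect the tags attached to a pipeline, build a package from a file, then run the operations inside the environment that package provides) can be sketched as below. This is an illustration only, not the patented implementation: `Operation`, `required_tags`, `build_package`, and `execute_pipeline` are hypothetical names, and "executing the file" is simulated by returning a dictionary rather than building a real container image.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Operation:
    """One data processing operation plus the tags naming its runtime needs."""
    name: str
    tags: frozenset
    run: Callable[[list], list]


def required_tags(pipeline):
    """Union of the runtime-requirement tags across all operations."""
    tags = set()
    for op in pipeline:
        tags |= op.tags
    return frozenset(tags)


def build_package(file_components):
    """Stand-in for executing the file: returns the 'executable package'."""
    return {"components": sorted(file_components)}


def execute_pipeline(pipeline, package, data):
    """Check that the package covers every requirement, then run in order."""
    missing = required_tags(pipeline) - set(package["components"])
    if missing:
        raise RuntimeError(f"package lacks components: {sorted(missing)}")
    for op in pipeline:
        data = op.run(data)
    return data


# Hypothetical two-step pipeline with made-up tag names.
pipeline = [
    Operation("double", frozenset({"python3"}), lambda xs: [x * 2 for x in xs]),
    Operation("total", frozenset({"python3", "numpy"}), lambda xs: [sum(xs)]),
]
package = build_package({"python3", "numpy"})
result = execute_pipeline(pipeline, package, [1, 2, 3])
```

Here the operations run in list order for simplicity; the claims also describe driving execution from a graph of nodes and edges.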
Opening claim text (preview).
What is claimed is:
1. A system, comprising: at least one data processor; and at least one memory storing instructions which, when executed by the at least one data processor, result in operations comprising: obtaining one or more first tags, each of the one or more first tags specifying a first plurality of components of a runtime environment provided by a first file; obtaining one or more second tags, each of the one or more second tags specifying a second plurality of components required by each of a series of data processing operations, the series of data processing operations comprising a data processing pipeline; obtaining one or more third tags specifying a third plurality of components of a runtime environment provided by a second file; selecting the first file by at least matching the one or more first tags with the one or more second tags, the first file being selected instead of the second file in response to determining that a count of the one or more first tags is less than a count of the one or more third tags; executing the first file to generate an executable package that includes the first plurality of components required for executing the series of data processing operations; and executing the series of data processing operations included in the data processing pipeline by at least executing the executable package to provide the runtime environment required for executing the series of data processing operations, the series of data processing operations included in the data processing pipeline being executed to manipulate data stored in an in-memory database.
2. The system of claim 1, wherein the first file comprises a script, and wherein the script includes a sequence of instructions for generating the executable package.
3. The system of claim 1, wherein the second plurality of components required for executing the series of data processing operations includes programming code, runtime, libraries, environment variables, and/or configuration files.
4. The system of claim 1, wherein at least one tag of the one or more second tags is associated with multiple data processing operations comprising the series of data processing operations, and wherein the multiple data processing operations share at least one common runtime requirement.
5. The system of claim 1, wherein the series of data processing operations is executed based at least on a graph representative of the data processing pipeline.
6. The system of claim 5, wherein the graph includes a plurality of nodes interconnected by one or more edges, wherein each of the plurality of nodes correspond to one data processing operation from the series of data processing operations, and wherein the one or more edges indicate a flow of data between different data processing operations.
7. The system of claim 1, further comprising: receiving, from a client, an input indicating a runtime requirement of one or more data processing operations from the series of data processing operations, the input comprising an annotation of programming code implementing the one or more data processing operations; and in response to the input, associating the one or more data processing operations with a tag corresponding to the runtime requirement.
8. A computer-implemented method, comprising: obtaining one or more first tags, each of the one or more first tags specifying a first plurality of components of a runtime environment provided by a first file; obtaining one or more second tags, each of the one or more second tags specifying a second plurality of components required by each of a series of data processing operations, the series of data processing operations comprising a data processing pipeline; obtaining one or more third tags specifying a third plurality of components of a runtime environment provided by a second file; selecting the first file by at least matching the one or more first tags with the one or more second tags, the first file being selected instead of the second file in response to determining that a count of the one or more first tags is less than a count of the one or more third tags; executing the first file to generate an executable package that includes the first plurality of components required for executing the series of data processing operations; and executing the series of data processing operations included in the data processing pipeline by at least executing the executable package to provide the runtime environment required for executing the series of data processing operations, the series of data processing operations included in the data processing pipeline being executed to manipulate data stored in an in-memory database.
9. The method of claim 8, wherein the first file comprises a script, and wherein the script includes a sequence of instructions for generating the executable package.
10. The method of claim 8, wherein the second plurality of components required for executing the series of data processing operations includes programming code, runtime, libraries, environment variables, and/or configuration files.
11. The method of claim 8, wherein at least one tag of the one or more second tags is associated with multiple data processing operations comprising the series of data processing operations, and wherein the multiple data processing operations share at least one common runtime requirement.
12. The method of claim 8, wherein the series of data processing operations is executed based at least on a graph representative of the data processing pipeline.
13. The method of claim 12, wherein the graph includes a plurality of nodes interconnected by one or more edges, wherein each of the plurality of nodes correspond to one data processing operation from the series of data processing operations, and wherein the one or more edges indicate a flow of data between different data processing operations.
14. A non-transitory computer-readable medium storing instructions, which when executed by at least one data processor, result in operations comprising: obtaining one or more first tags, each of the one or more first tags specifying a first plurality of components of a runtime environment provided by a first file; obtaining one or more second tags, each of the one or more second tags specifying a second plurality of components required by each of a series of data processing operations, the series of data processing operations comprising a data processing pipeline; obtaining one or more third tags specifying a third plurality of components of a runtime environment provided by a second file; selecting the first file by at least matching the one or more first tags with the one or more second tags, the first file being selected instead of the second file in response to determining that a count of the one or more first tags is less than a count of the one or more third tags; executing the first file to generate an executable package that includes the first plurality of components required for executing the series of data processing operations; and executing the series of data processing operations included in the data processing pipeline by at least executing the executable package to provide the runtime environment required for executing the series of data processing operations, the series of data processing operations included in the data processing pipeline being executed to manipulate data stored in an in-memory database.
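The selection step recited in the independent claims (match a file's tags against the tags required by the pipeline, and prefer the file with the smaller tag count) might look like the following sketch. Two points are assumptions rather than claim language: "matching" is read here as "the file's tags cover every required tag", and the file names and tag values are invented for illustration. `EnvFile` and `select_file` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvFile:
    """A file (for example, a build script) and the tags naming what it provides."""
    name: str
    tags: frozenset


def select_file(candidates, required_tags):
    """Return the matching file with the fewest tags, or None if none match.

    Assumption: a file "matches" when its tags are a superset of the
    tags required by the pipeline's operations.
    """
    required = frozenset(required_tags)
    matching = [f for f in candidates if required <= f.tags]
    if not matching:
        return None
    # Fewer tags means a leaner environment that still covers the requirements.
    return min(matching, key=lambda f: len(f.tags))


# Hypothetical candidates: both cover the requirements, the first is smaller.
first = EnvFile("first_file", frozenset({"python3", "pandas"}))
second = EnvFile("second_file", frozenset({"python3", "pandas", "tensorflow"}))
chosen = select_file([second, first], {"python3", "pandas"})
```

With these inputs `chosen` is `first`, because its tag count (2) is less than that of `second` (3) while both satisfy the pipeline's requirements.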
Configuring for program initiating, e.g. using registry, configuration files · CPC title
File access structures, e.g. distributed indices (arrangements of input from, or output to, record carriers G06F3/06) · CPC title