Generating data transformation workflows
US-2018150528-A1 · May 31, 2018 · US
US12061884B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12061884-B2 |
| Application number | US-202318165780-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 7, 2023 |
| Priority date | Dec 13, 2016 |
| Publication date | Aug 13, 2024 |
| Grant date | Aug 13, 2024 |
Official abstract text for this publication.
A computer-implemented method comprises obtaining a first build task for building first source code in a first programming language of a plurality of programming languages; retrieving, by the processor, the first source code based on the first build task; building the first source code into one or more artifacts and one or more job specifications; storing the one or more artifacts in a cache shared across a cluster; and initializing an application module on the cluster based on the first programming language, the application module configured to receive a job specification of the one or more job specifications and execute a data transformation job using a reference to a location in the cache.
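The flow in the abstract (build task, artifacts plus job specifications, shared cache, language-specific application module) can be illustrated with a toy, single-process sketch. Everything named here (`BuildTask`, `JobSpec`, `SharedCache`, `ApplicationModule`, `build`, `demo`, the dataset names `raw` and `clean`) is hypothetical and does not come from the patent. The in-memory dictionary stands in for a cluster-wide cache, and "building" simply encodes the source text, so this is an illustration under those assumptions and not the patented implementation.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildTask:
    """Hypothetical build task: which language, and where the source lives."""
    language: str
    source_ref: str


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical job specification: how to turn one dataset into another."""
    job_type: str
    input_dataset: str
    output_dataset: str
    artifact_ref: str  # reference to a location in the shared cache


class SharedCache:
    """In-memory stand-in for a cache shared across a cluster."""

    def __init__(self):
        self._store = {}

    def put(self, payload: bytes) -> str:
        # Content-addressed key, so identical artifacts share one entry.
        ref = hashlib.sha256(payload).hexdigest()
        self._store[ref] = payload
        return ref

    def get(self, ref: str) -> bytes:
        return self._store[ref]


def build(task: BuildTask, repo: dict, cache: SharedCache) -> list:
    """Retrieve the source, 'build' it, cache the artifact, emit job specs."""
    source = repo[task.source_ref]      # retrieve source based on the task
    artifact = source.encode("utf-8")   # assumption: stand-in for a real build
    ref = cache.put(artifact)           # store the artifact in the shared cache
    return [JobSpec(task.language, "raw", "clean", ref)]


class ApplicationModule:
    """Toy module that only accepts jobs for its own language."""

    def __init__(self, language: str, cache: SharedCache):
        self.language = language
        self.cache = cache

    def run(self, spec: JobSpec, datasets: dict) -> list:
        if spec.job_type != self.language:
            raise ValueError(f"unsupported job type: {spec.job_type}")
        # Load the artifact by reference instead of receiving a copy of it.
        code = self.cache.get(spec.artifact_ref).decode("utf-8")
        namespace = {}
        exec(code, namespace)  # acceptable only for trusted, illustrative code
        datasets[spec.output_dataset] = namespace["transform"](
            datasets[spec.input_dataset]
        )
        return datasets[spec.output_dataset]


def demo() -> list:
    repo = {
        "clean.py": "def transform(rows):\n"
                    "    return [r.strip().lower() for r in rows]\n"
    }
    cache = SharedCache()
    specs = build(BuildTask("python", "clean.py"), repo, cache)
    module = ApplicationModule("python", cache)
    return module.run(specs[0], {"raw": ["  Alice ", "BOB"]})
```

Because the job specification carries only a cache reference, several jobs that need the same artifact can reuse one cached copy, which is the reuse pattern the dependent claims describe.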
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, comprising: obtaining, by a processor, a first build task for building first source code in a first programming language of a plurality of programming languages; retrieving, by the processor, the first source code based on the first build task; building the first source code into one or more artifacts and one or more job specifications, a job specification of the one or more job specifications including instructions for how to construct a data transformation job transforming a first dataset into a second dataset using the one or more artifacts; storing the one or more artifacts in a cache shared across a cluster; receiving a request from a transform worker to launch an application module on the cluster, the request including the job specification or coordinates of an artifact of the one or more artifacts, initializing, in response to the receiving, an application module specific to the first programming language on the cluster, the application module configured to receive the job specification and execute the data transformation job using a reference to a location in the cache, and wherein the first build task includes a shrinkwrap library for inclusion in libraries or the job specification, the shrinkwrap library being a customized library used for secure obfuscation of sensitive data with a hashing function and functionality of the shrinkwrap library is included in the one or more artifacts or the one or more job specifications automatically.
2. The computer-implemented method of claim 1, the transform worker and the application module supporting the first programming language but not another programming language, the transform worker storing a mapping between job types and application modules, the request specifying the application module based on the mapping.
3. The computer-implemented method of claim 1, the request including one or more server settings of when to launch the application module, a location of where to launch the application module, how long the application module should be available, or security settings for the application module.
4. The computer-implemented method of claim 1, the first build task specifying criteria for building the first source code, including when the first source code should be built, what libraries and virtual machine to use for building the first source code, configuration settings for building the first source code, or where output of building the first source code should be sent.
5. The computer-implemented method of claim 1, further comprising: receiving a second application module for a second programming language different from the first programming language; updating a mapping between job types and application modules to refer to the second application module; obtaining a second build task for building second source code in the second programming language.
6. The computer-implemented method of claim 1, the job specification including instructions detailing dataset dependencies for a data transformation job, instructions that indicate a job type that specifies a type of transform worker to run a data transformation job, or user-defined configuration settings for running a data transformation job.
7. The computer-implemented method of claim 1, further comprising executing, multiple times, a specific data transformation job that requires a specific artifact of the one or more artifacts using a specific reference to a specific location in the cache.
8. The computer-implemented method of claim 1, further comprising: receiving a plurality of job specifications requiring a common artifact of the one or more artifacts; executing, for each job specification of the plurality of job specifications, a corresponding data transformation job using a specific reference to the common artifact in the cache.
9. The computer-implemented method of claim 1, further comprising: executing a specific data transformation job, including executing a specific artifact of the one or more artifacts, using a specific reference to the specific artifact in the cache; storing a result of the executing in a second cache on the cluster that is accessible to a second application module on the cluster different from the application module.
10. One or more non-transitory computer readable storage media storing one or more sequences of instructions which, when executed cause one or more processors to perform a method, the method comprising: obtaining, a first build task for building first source code in a first programming language of a plurality of programming languages; retrieving the first source code based on the first build task; building the first source code into one or more artifacts and one or more job specifications, a job specification of the one or more job specifications including instructions for how to construct a data transformation job transforming a first dataset into a second dataset using the one or more artifacts; storing the one or more artifacts in a cache shared across a cluster; receiving a request from a transform worker to launch an application module on the cluster, the request including the job specification or coordinates of an artifact of the one or more artifacts, initializing, in response to the receiving, an application module specific to the first programming language on the cluster, the application module configured to receive the job specification and execute the data transformation job using a reference to a location in the cache, and wherein the first build task includes a shrinkwrap library for inclusion in libraries or the job specification, the shrinkwrap library being a customized library used for secure obfuscation of sensitive data with a hashing function and functionality of the shrinkwrap library is included in the one or more artifacts or the one or more job specifications automatically.
11. The one or more non-transitory computer-readable storage media of claim 10, the transform worker and the application module supporting the first programming language but not another programming language, the transform worker storing a mapping between job types and application modules, the request specifying the application module based on the mapping.
12. The one or more non-transitory computer-readable storage media of claim 10, the request including one or more server settings of when to launch the application module, a location of where to launch the application module, how long the application module should be available, or security settings for the application module.
13. The one or more non-transitory computer-readable storage media of claim 10, the first build task specifying criteria for building the first source code, including when the first source code should be built, what libraries and virtual machine to use for building the first source code, configuration settings for building the first source code, or where output of building the first source code should be sent.
14. The one or more non-transitory computer-readable storage media of claim 10, the method further comprising: receiving a second application module for a second programming language different from the first programming language; updating a mapping between job types and application modules to refer to the second application module; obtaining a second build task for building second source code in the second programming language.
15. The one or more non-transitory computer-readable storage media of claim 10, the job specification including instructions detailing dataset dependencies for
Functional or applicative languages; Rewrite languages · CPC title
Version control (security arrangements therefor G06F21/57); Configuration management · CPC title
Authentication, i.e. establishing the identity or authorisation of security principals · CPC title
Procedural · CPC title
Object-oriented · CPC title