Extensible data transformation authoring and validation system

US12061884B2 · US · B2

Patent metadata
Publication number: US-12061884-B2
Application number: US-202318165780-A
Country: US
Kind code: B2
Filing date: Feb 7, 2023
Priority date: Dec 13, 2016
Publication date: Aug 13, 2024
Grant date: Aug 13, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method comprises obtaining a first build task for building first source code in a first programming language of a plurality of programming languages; retrieving, by the processor, the first source code based on the first build task; building the first source code into one or more artifacts and one or more job specifications; storing the one or more artifacts in a cache shared across a cluster; and initializing an application module on the cluster based on the first programming language, the application module configured to receive a job specification of the one or more job specifications and execute a data transformation job using a reference to a location in the cache.
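The flow in the abstract (build source into artifacts and job specifications, store the artifacts in a shared cache, then have a language-specific application module run the job through a cache reference) can be pictured with a minimal Python sketch. Everything here is hypothetical and for illustration only: the class names, the in-memory dictionary standing in for the cluster cache, the content-addressed key, and the dataset names are assumptions, not details taken from the patent.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical job specification: how to construct one transformation job."""
    job_type: str        # which kind of application module should run the job
    artifact_ref: str    # reference to a location in the shared cache
    input_dataset: str
    output_dataset: str


class SharedCache:
    """Stand-in for the cluster-wide artifact cache (an in-memory dict here)."""

    def __init__(self):
        self._store = {}

    def put(self, artifact: bytes) -> str:
        # Content-addressed key: an assumption of this sketch, not of the patent.
        ref = hashlib.sha256(artifact).hexdigest()
        self._store[ref] = artifact
        return ref

    def get(self, ref: str) -> bytes:
        return self._store[ref]


def build(source_code: str, language: str, cache: SharedCache) -> JobSpec:
    """Pretend build step: turn source into an artifact plus a job spec."""
    artifact = f"{language}:{source_code}".encode()
    ref = cache.put(artifact)
    return JobSpec(job_type=language, artifact_ref=ref,
                   input_dataset="raw", output_dataset="clean")


class ApplicationModule:
    """Language-specific runner that resolves artifacts through the cache."""

    def __init__(self, language: str, cache: SharedCache):
        self.language = language
        self.cache = cache

    def run(self, spec: JobSpec) -> str:
        # The job carries only a reference; the artifact itself stays in the cache.
        artifact = self.cache.get(spec.artifact_ref)
        return (f"ran {len(artifact)}-byte artifact: "
                f"{spec.input_dataset} -> {spec.output_dataset}")


cache = SharedCache()
spec = build("select * from raw", "sql", cache)
module = ApplicationModule("sql", cache)
result = module.run(spec)
```

Because the job specification holds a reference rather than the artifact, the same cached artifact can serve repeated runs or several job specifications without being rebuilt.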

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: obtaining, by a processor, a first build task for building first source code in a first programming language of a plurality of programming languages; retrieving, by the processor, the first source code based on the first build task; building the first source code into one or more artifacts and one or more job specifications, a job specification of the one or more job specifications including instructions for how to construct a data transformation job transforming a first dataset into a second dataset using the one or more artifacts; storing the one or more artifacts in a cache shared across a cluster; receiving a request from a transform worker to launch an application module on the cluster, the request including the job specification or coordinates of an artifact of the one or more artifacts, initializing, in response to the receiving, an application module specific to the first programming language on the cluster, the application module configured to receive the job specification and execute the data transformation job using a reference to a location in the cache, and wherein the first build task includes a shrinkwrap library for inclusion in libraries or the job specification, the shrinkwrap library being a customized library used for secure obfuscation of sensitive data with a hashing function and functionality of the shrinkwrap library is included in the one or more artifacts or the one or more job specifications automatically.

2. The computer-implemented method of claim 1, the transform worker and the application module supporting the first programming language but not another programming language, the transform worker storing a mapping between job types and application modules, the request specifying the application module based on the mapping.

3. The computer-implemented method of claim 1, the request including one or more server settings of when to launch the application module, a location of where to launch the application module, how long the application module should be available, or security settings for the application module.

4. The computer-implemented method of claim 1, the first build task specifying criteria for building the first source code, including when the first source code should be built, what libraries and virtual machine to use for building the first source code, configuration settings for building the first source code, or where output of building the first source code should be sent.

5. The computer-implemented method of claim 1, further comprising: receiving a second application module for a second programming language different from the first programming language; updating a mapping between job types and application modules to refer to the second application module; obtaining a second build task for building second source code in the second programming language.

6. The computer-implemented method of claim 1, the job specification including instructions detailing dataset dependencies for a data transformation job, instructions that indicate a job type that specifies a type of transform worker to run a data transformation job, or user-defined configuration settings for running a data transformation job.

7. The computer-implemented method of claim 1, further comprising executing, multiple times, a specific data transformation job that requires a specific artifact of the one or more artifacts using a specific reference to a specific location in the cache.

8. The computer-implemented method of claim 1, further comprising: receiving a plurality of job specifications requiring a common artifact of the one or more artifacts; executing, for each job specification of the plurality of job specifications, a corresponding data transformation job using a specific reference to the common artifact in the cache.

9. The computer-implemented method of claim 1, further comprising: executing a specific data transformation job, including executing a specific artifact of the one or more artifacts, using a specific reference to the specific artifact in the cache; storing a result of the executing in a second cache on the cluster that is accessible to a second application module on the cluster different from the application module.

10. One or more non-transitory computer readable storage media storing one or more sequences of instructions which, when executed cause one or more processors to perform a method, the method comprising: obtaining, a first build task for building first source code in a first programming language of a plurality of programming languages; retrieving the first source code based on the first build task; building the first source code into one or more artifacts and one or more job specifications, a job specification of the one or more job specifications including instructions for how to construct a data transformation job transforming a first dataset into a second dataset using the one or more artifacts; storing the one or more artifacts in a cache shared across a cluster; receiving a request from a transform worker to launch an application module on the cluster, the request including the job specification or coordinates of an artifact of the one or more artifacts, initializing, in response to the receiving, an application module specific to the first programming language on the cluster, the application module configured to receive the job specification and execute the data transformation job using a reference to a location in the cache, and wherein the first build task includes a shrinkwrap library for inclusion in libraries or the job specification, the shrinkwrap library being a customized library used for secure obfuscation of sensitive data with a hashing function and functionality of the shrinkwrap library is included in the one or more artifacts or the one or more job specifications automatically.

11. The one or more non-transitory computer-readable storage media of claim 10, the transform worker and the application module supporting the first programming language but not another programming language, the transform worker storing a mapping between job types and application modules, the request specifying the application module based on the mapping.

12. The one or more non-transitory computer-readable storage media of claim 10, the request including one or more server settings of when to launch the application module, a location of where to launch the application module, how long the application module should be available, or security settings for the application module.

13. The one or more non-transitory computer-readable storage media of claim 10, the first build task specifying criteria for building the first source code, including when the first source code should be built, what libraries and virtual machine to use for building the first source code, configuration settings for building the first source code, or where output of building the first source code should be sent.

14. The one or more non-transitory computer-readable storage media of claim 10, the method further comprising: receiving a second application module for a second programming language different from the first programming language; updating a mapping between job types and application modules to refer to the second application module; obtaining a second build task for building second source code in the second programming language.

15. The one or more non-transitory computer-readable storage media of claim 10, the job specification including instructions detailing dataset dependencies for
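Two mechanisms in the claims lend themselves to a short illustration: the transform worker's mapping between job types and application modules (claims 2 and 5), and the shrinkwrap library's obfuscation of sensitive data with a hashing function (claim 1). The Python sketch below is an assumption-laden illustration, not the patented implementation: the registry, function names, module names, and request fields are hypothetical, and a keyed SHA-256 digest (HMAC) is simply one plausible choice of hashing function, since the claims do not name one.

```python
import hashlib
import hmac

# Hypothetical registry held by a transform worker: job type -> application module.
MODULE_FOR_JOB_TYPE = {"python": "python-app-module"}


def register_module(job_type: str, module_name: str) -> None:
    """Update the mapping when a module for another language is received."""
    MODULE_FOR_JOB_TYPE[job_type] = module_name


def launch_request(job_type: str, artifact_coordinates: str) -> dict:
    """Build a request asking the cluster to launch the mapped application module."""
    if job_type not in MODULE_FOR_JOB_TYPE:
        raise KeyError(f"no application module registered for {job_type!r}")
    return {
        "module": MODULE_FOR_JOB_TYPE[job_type],
        "artifact": artifact_coordinates,  # coordinates of a cached artifact
    }


def obfuscate(value: str, key: bytes) -> str:
    """Keyed SHA-256 digest standing in for the shrinkwrap hashing step (assumed)."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


# Adding support for a second language only touches the mapping.
register_module("sql", "sql-app-module")
request = launch_request("sql", "com.example:transforms:1.0")
token = obfuscate("123-45-6789", key=b"demo-key")
```

Keeping the language choice in a lookup table is what makes the system extensible in this sketch: supporting a new language means registering one more entry rather than changing the worker's launch logic.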

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • Functional or applicative languages; Rewrite languages

  • Version control (security arrangements therefor G06F21/57); Configuration management

  • Authentication, i.e. establishing the identity or authorisation of security principals

  • Procedural

  • Object-oriented

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12061884B2 cover?
A computer-implemented method comprises obtaining a first build task for building first source code in a first programming language of a plurality of programming languages; retrieving, by the processor, the first source code based on the first build task; building the first source code into one or more artifacts and one or more job specifications; storing the one or more artifacts in a cache sh…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06F8/40. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 13, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).