Extensible data transformation authoring and validation system

US10860299B2 · US · B2

Patent metadata
Publication number: US-10860299-B2
Application number: US-201916384691-A
Country: US
Kind code: B2
Filing date: Apr 15, 2019
Priority date: Dec 13, 2016
Publication date: Dec 8, 2020
Grant date: Dec 8, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Data transformation in a distributed system of applications and data repositories is described. The subsystems for the overall framework are distributed, thereby allowing for customization to require only isolated changes to one or more subsystems. In one embodiment, a source code repository is used to receive and store source code. A build subsystem can retrieve source code from the source code repository and build it, using one or more criteria. By building the source code, the build subsystem can generate an artifact, which is executable code, such as a JAR or SQL file. Likewise, by building the source code, the build subsystem can generate one or more job specifications for executing the executable code. In one embodiment, the artifact and job specification may be used to launch an application server in a cluster. The application server can then receive data transformation instructions and execute the data transformation instructions.
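
The build step in the abstract (source code in, an executable artifact plus a job specification out) can be pictured with a minimal sketch. This is only an illustration under stated assumptions: the `Artifact` and `JobSpec` classes, the `build` function, the checksum field, and the sample SQL are hypothetical names invented here, not details taken from the patent.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Artifact:
    """Executable output of a build (the abstract mentions JAR or SQL files)."""
    name: str
    payload: bytes


@dataclass
class JobSpec:
    """Hypothetical job specification: which artifact to run and where to start."""
    artifact_name: str
    entry_point: str
    checksum: str


def build(source_files, entry_point):
    """Toy build: bundle the sources into one payload and describe how to run it."""
    payload = json.dumps(source_files, sort_keys=True).encode("utf-8")
    artifact = Artifact(name="transform-bundle", payload=payload)
    spec = JobSpec(
        artifact_name=artifact.name,
        entry_point=entry_point,
        checksum=hashlib.sha256(payload).hexdigest(),
    )
    return artifact, spec


# A plain dict stands in for the source code repository (an assumption).
repository = {"clean.sql": "SELECT id, TRIM(name) AS name FROM raw_users"}
artifact, spec = build(repository, entry_point="clean.sql")
```

In this sketch the job specification only points at the artifact and names an entry point; a real system would presumably carry more (cluster settings, schedules), which the abstract does not detail.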

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: receiving user code from a computer data storage device or input from a computer input device, the user code comprising a function that comprises a sequence of computation instructions for a dataset transformation that generates an output dataset, and a decorator that corresponds to the function that identifies one or more dependent datasets that are necessary to execute the function and one or more expected input datasets for a lower-order function that defines a data transformation job; digitally storing the user code at a first code repository; building the user code into executable code comprising one or more machine executable computer program files; receiving a data transformation command that identifies the function; in response to receiving the data transformation command: based on the one or more expected input datasets for the lower-order function, using the executable code to invoke a higher-order function that corresponds to the function to identify and retrieve the one or more dependent datasets; executing the function, using the one or more dependent datasets and the lower-order function, to generate a particular output dataset.

2. The method of claim 1, further comprising, in response to receiving the data transformation command, using the decorator that corresponds to the function to generate a dataset dependency file that identifies the one or more dependent datasets.

3. The method of claim 1, wherein the decorator further identifies one or more dataset types for the one or more dependent datasets that are necessary to execute the function.

4. The method of claim 1, wherein the user code is written in a Python programming language and comprises at least a portion of a computer program.

5. The method of claim 4, wherein the user code uses one or more of Python libraries comprising Pandas, NumPy, SciPy, or IPython Notebook.

6. The method of claim 1, further comprising specifying, using the decorator, metaprogramming dependencies of the one or more dependent datasets that are necessary to execute the function.

7. The method of claim 1, wherein the decorator further defines an expected location of input datasets, expected content of the input datasets or an expected characteristic of the input datasets, and wherein the one or more dependent datasets comprise the input datasets.

8. One or more non-transitory computer readable storage media storing one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform: receiving user code from a computer data storage device or input from a computer input device, the user code comprising a function that comprises a sequence of computation instructions for a dataset transformation that generates an output dataset and a decorator that corresponds to the function that identifies one or more dependent datasets that are necessary to execute the function and one or more expected input datasets for a lower-order function that defines a data transformation job; digitally storing the user code at a first code repository; building the user code into executable code comprising one or more machine executable computer program files; receiving a data transformation command that identifies the function; in response to receiving the data transformation command: based on the one or more expected input datasets for the lower-order function, using the executable code to invoke a higher-order function that corresponds to the function to identify and retrieve the one or more dependent datasets; executing the function, using the one or more dependent datasets and the lower-order function, to generate a particular output dataset.

9. The one or more non-transitory computer readable media of claim 8, further comprising sequences of instructions which, when executed by the one or more processors, cause the one or more processors to perform: in response to receiving the data transformation command, using the decorator that corresponds to the function to generate a dataset dependency file that identifies the one or more dependent datasets.

10. The one or more non-transitory computer readable media of claim 8, wherein the decorator further identifies one or more dataset types for the one or more dependent datasets that are necessary to execute the function.

11. The one or more non-transitory computer readable media of claim 8, wherein the user code is written in a Python programming language and comprises at least a portion of a computer program.

12. The one or more non-transitory computer readable media of claim 11, wherein the user code uses one or more of Python libraries comprising Pandas, NumPy, SciPy, or IPython Notebook.

13. The one or more non-transitory computer readable media of claim 8, wherein the decorator specifies metaprogramming dependencies of the one or more dependent datasets that are necessary to execute the function.

14. The one or more non-transitory computer readable media of claim 8, wherein the decorator further defines an expected location of input datasets, an expected type of the input datasets, expected content of the input datasets or an expected characteristic of the input datasets, and wherein the one or more dependent datasets comprise the input datasets.

15. A computer system comprising: a processor; and a memory coupled to the processor and storing one or more sequences of instructions which, when executed by the processor, cause the processor to perform: receiving user code from a computer data storage device or input from a computer input device, the user code comprising a function that comprises a sequence of computation instructions for a dataset transformation that generates an output dataset and a decorator that corresponds to the function that identifies one or more dependent datasets that are necessary to execute the function and one or more expected input datasets for a lower-order function that defines a data transformation job; digitally storing the user code at a first code repository; building the user code into executable code comprising one or more machine executable computer program files; receiving a data transformation command that identifies the function; in response to receiving the data transformation command: based on the one or more expected input datasets for the lower-order function, using the executable code to invoke a higher-order function that corresponds to the function to identify and retrieve the one or more dependent datasets; executing the function, using the one or more dependent datasets and the lower-order function, to generate a particular output dataset.

16. The system of claim 15, wherein the user code is written in a Python programming language and comprises at least a portion of a computer program.
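
The claims describe a decorator that declares the datasets a transformation function depends on, plus a higher-order step that retrieves those datasets when a command names the function. The sketch below illustrates that general pattern in Python, the language named in claims 4, 11, and 16. It is not the patented implementation and says nothing about claim scope: `transform`, `DATASETS`, `REGISTRY`, `execute`, and `dependency_file` are hypothetical names, and an in-memory dict stands in for real data repositories.

```python
import functools

# Hypothetical in-memory store standing in for the data repositories.
DATASETS = {
    "raw_orders": [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 32.5}],
}

# Registry mapping function names to the metadata declared by the decorator.
REGISTRY = {}


def transform(*, inputs, output):
    """Decorator that records which datasets a transformation function needs."""
    def decorator(func):
        @functools.wraps(func)
        def runner():
            # Higher-order step: identify and retrieve the dependent datasets,
            # then execute the user's function on them and store the result.
            resolved = {name: DATASETS[name] for name in inputs}
            DATASETS[output] = func(**resolved)
            return DATASETS[output]

        REGISTRY[func.__name__] = {
            "inputs": list(inputs),
            "output": output,
            "run": runner,
        }
        return runner
    return decorator


@transform(inputs=["raw_orders"], output="order_totals")
def order_totals(raw_orders):
    """User code: computation steps that produce an output dataset."""
    return [{"total": sum(row["amount"] for row in raw_orders)}]


def execute(command):
    """Handle a data transformation command that identifies a function by name."""
    return REGISTRY[command]["run"]()


def dependency_file(command):
    """Dependency listing derived from the decorator metadata (compare claim 2)."""
    entry = REGISTRY[command]
    return {"function": command, "depends_on": entry["inputs"]}


result = execute("order_totals")
deps = dependency_file("order_totals")
```

Keeping the dependency declaration in the decorator, separate from the function body, is what lets a listing like `deps` be produced without running the transformation itself.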

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • Object-oriented · CPC title

  • G06F8/315 · Primary

    Object-oriented languages · CPC title

  • Version control (security arrangements therefor G06F21/57); Configuration management · CPC title

  • Functional or applicative languages; Rewrite languages · CPC title

  • G06F8/40 · Primary

    Transformation of program code · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10860299B2 cover?
Data transformation in a distributed system of applications and data repositories is described. The subsystems for the overall framework are distributed, thereby allowing for customization to require only isolated changes to one or more subsystems. In one embodiment, a source code repository is used to receive and store source code. A build subsystem can retrieve source code from the source cod…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06F8/315. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 8, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).