Data pipeline creation system and method

US10983988B2 · US · B2

Patent metadata
Publication number: US-10983988-B2
Application number: US-201916362113-A
Country: US
Kind code: B2
Filing date: Mar 22, 2019
Priority date: Dec 27, 2018
Publication date: Apr 20, 2021
Grant date: Apr 20, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method comprises receiving one or more data transformation commands through a console in a console session, the one or more data transformation commands relating to one or more initial datasets; executing the one or more data transformation commands using the one or more initial datasets to modify at least one of the one or more initial datasets to generate a modified dataset; generating a set of environment flags for the command to indicate that the one or more initial datasets has been accessed and the at least one dataset that has been modified; and updating a set of line dependencies based on the generated set of environmental flags and previously generated sets of environmental flags for one or more previously executed commands.
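The bookkeeping in the abstract, together with the backward search spelled out in claim 2, can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. `EnvFlags`, `LineTracker`, `record` and `pipeline_for` are hypothetical names. Each command is keyed by its console line number, and the caller supplies the flags rather than computing them (claim 1 computes the modified set by hashing). `pipeline_for` is one plausible reading of claim 5's "infer a data pipeline": it collects every line that a result transitively depends on.

```python
from dataclasses import dataclass, field


@dataclass
class EnvFlags:
    """Hypothetical flags for one console command: datasets it read and datasets it changed."""
    accessed: set[str]
    modified: set[str]


@dataclass
class LineTracker:
    """Hypothetical tracker for a console session, keyed by console line number."""
    history: dict[int, EnvFlags] = field(default_factory=dict)
    dependencies: dict[int, set[int]] = field(default_factory=dict)

    def record(self, line: int, flags: EnvFlags) -> set[int]:
        """Store a command's flags and return the earlier lines it depends on."""
        deps: set[int] = set()
        earlier = sorted((p for p in self.history if p < line), reverse=True)
        for name in flags.accessed:
            # Search backwards for the last earlier command that modified this dataset.
            for prev in earlier:
                if name in self.history[prev].modified:
                    deps.add(prev)
                    break
        self.history[line] = flags
        self.dependencies[line] = deps
        return deps

    def pipeline_for(self, line: int) -> list[int]:
        """Return `line` plus every line it transitively depends on, in execution order."""
        seen: set[int] = set()
        stack = [line]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.dependencies.get(current, ()))
        return sorted(seen)


# Example session (hypothetical commands shown in the comments).
tracker = LineTracker()
tracker.record(1, EnvFlags(accessed=set(), modified={"raw"}))                   # raw = load(...)
tracker.record(2, EnvFlags(accessed={"raw"}, modified={"clean"}))               # clean = filter(raw)
tracker.record(3, EnvFlags(accessed=set(), modified={"lookup"}))                # lookup = load(...)
tracker.record(4, EnvFlags(accessed={"clean", "lookup"}, modified={"joined"}))  # joined = join(clean, lookup)
tracker.record(5, EnvFlags(accessed={"raw"}, modified={"summary"}))             # summary = describe(raw)
print(sorted(tracker.dependencies[4]))  # [2, 3]
print(tracker.pipeline_for(4))          # [1, 2, 3, 4]
print(tracker.pipeline_for(5))          # [1, 5]
```

Because each accessed dataset is linked only to the most recent line that modified it, the inferred pipeline for line 5 is just lines 1 and 5. Lines 2 to 4 ran before it but are excluded.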

First claim

Opening claim text (preview).

The invention claimed is:

1. A method performed by one or more processors, the method comprising: receiving one or more data transformation commands through a console in a console session, the one or more data transformation commands relating to one or more initial datasets; executing the one or more data transformation commands using the one or more initial datasets to modify at least one of the one or more initial datasets to generate a modified dataset; hashing one or more of accessed datasets after executing the data transformation commands to generate current hashed values of the one or more accessed datasets; comparing the current hashed values of the accessed one or more datasets to hashed values of the one or more initial datasets to determine which of the one or more accessed datasets was modified by the data transformation commands; generating a set of environment flags for the data transformation commands to indicate that the one or more initial datasets has been accessed and the at least one dataset that has been modified; and updating a set of line dependencies based on the generated set of environmental flags and previously generated sets of environmental flags for one or more previously executed commands.

2. The method of claim 1, wherein updating the set of line dependencies comprises: determining which initial datasets were accessed during execution of the data transformation command; and for each accessed dataset, searching the previously generated sets of environmental flags to find a last previously executed command at which said accessed dataset was modified.

3. The method of claim 1, wherein the one or more accessed datasets comprises a plurality of data elements, and hashing the one or more accessed datasets comprises summing hashes of the plurality of data elements.

4. The method of claim 1, further comprising updating a variable access list based on the environmental flags, wherein the environmental access list comprises a set of dataset identities corresponding to one or more datasets, and an associated line number for each dataset indicating a line of the console session on which said dataset was last modified.

5. The method of claim 1, wherein the list of line dependencies is used to infer a data pipeline.

6. The method of claim 1, wherein executing the one or more data transformation commands comprises generating a data frame.

7. The method of claim 1, wherein the data transformation command comprises one or more of: a join operation, a filter operation, a more general column and/or row transformation; mathematical operations performed on numbers; and/or string operations performed on strings.

8. The method of claim 1, wherein the one or more initial datasets comprises: a list; a table; an object; a dictionary; a string; a number; or a file.

9. A non-transitory computer readable medium having computer readable code stored thereon, the computer readable code, when executed by at least one processor of a computing device, causing performance of the steps of: receiving one or more data transformation commands through a console in a console session, the one or more data transformation commands relating to one or more initial datasets; executing the one or more data transformation commands using the one or more initial datasets to modify at least one of the one or more initial datasets to generate a modified dataset; hashing one or more of accessed datasets after executing the data transformation commands to generate current hashed values of the one or more accessed datasets; comparing the current hashed values of the accessed one or more datasets to hashed values of the one or more initial datasets to determine which of the one or more accessed datasets was modified by the data transformation commands; generating a set of environment flags for the data transformation commands to indicate that the one or more initial datasets has been accessed and the at least one dataset that has been modified; and updating a set of line dependencies based on the generated set of environmental flags and previously generated sets of environmental flags for one or more previously executed commands.

10. The non-transitory computer readable medium of claim 9, wherein updating the set of line dependencies further causes performance of the steps of: determining which initial datasets were accessed during execution of the data transformation command; and for each accessed dataset, searching the previously generated sets of environmental flags to find a last previously executed command at which said accessed dataset was modified.

11. The non-transitory computer readable medium of claim 9, wherein the one or more accessed datasets comprises a plurality of data elements, and hashing the one or more accessed datasets comprises summing hashes of the plurality of data elements.

12. A system comprising: one or more processors; and a memory, the comprising computer readable instructions that, when executed by the one or more processors, cause the system to perform: receiving one or more data transformation commands through a console in a console session, the one or more data transformation commands relating to one or more initial datasets; executing the one or more data transformation commands using the one or more initial datasets to modify at least one of the one or more initial datasets to generate a modified dataset; hashing one or more of accessed datasets after executing the data transformation commands to generate current hashed values of the one or more accessed datasets; comparing the current hashed values of the accessed one or more datasets to hashed values of the one or more initial datasets to determine which of the one or more accessed datasets was modified by the data transformation commands; generating a set of environment flags for the data transformation commands to indicate that the one or more initial datasets has been accessed and the at least one dataset that has been modified; and updating a set of line dependencies based on the generated set of environmental flags and previously generated sets of environmental flags for one or more previously executed commands.

13. The system of claim 12, wherein updating the set of line dependencies further cause the system to perform: determining which initial datasets were accessed during execution of the data transformation command; and for each accessed dataset, searching the previously generated sets of environmental flags to find a last previously executed command at which said accessed dataset was modified.

14. The system of claim 12, wherein the one or more accessed datasets comprises a plurality of data elements, and hashing the one or more accessed datasets comprises summing hashes of the plurality of data elements.
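Claims 1 and 3 detect which datasets a command modified. They do this by hashing the accessed datasets after execution and comparing the result against the earlier hashes, where a dataset's hash is the sum of its elements' hashes. The sketch below assumes that datasets are Python lists held in a dictionary namespace and that the caller names the datasets to watch. `element_hash`, `dataset_hash`, `run_and_flag` and `drop_closed` are hypothetical names, and SHA-256 over `repr()` is a stand-in for whatever element hash a real implementation would use.

```python
import hashlib
from typing import Callable

MASK = (1 << 64) - 1  # keep the summed hash within 64 bits


def element_hash(element: object) -> int:
    """Hash one data element. SHA-256 over repr() is an assumed choice, not the patent's."""
    digest = hashlib.sha256(repr(element).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def dataset_hash(dataset: list) -> int:
    """Claim 3 style: the dataset hash is the sum of its element hashes."""
    return sum(element_hash(e) for e in dataset) & MASK


def run_and_flag(namespace: dict[str, list], watched: list[str],
                 command: Callable[[dict[str, list]], None]) -> set[str]:
    """Run `command`, then return the watched datasets whose hash changed or that were created."""
    before = {name: dataset_hash(namespace[name]) for name in watched if name in namespace}
    command(namespace)
    modified: set[str] = set()
    for name in watched:
        if name not in namespace:
            continue  # never existed, or was deleted; deletion is not handled in this sketch
        if name not in before or dataset_hash(namespace[name]) != before[name]:
            modified.add(name)
    return modified


def drop_closed(ns: dict[str, list]) -> None:
    """Hypothetical console command: filter one dataset and derive a new one from it."""
    ns["orders"] = [row for row in ns["orders"] if row[1] == "open"]
    ns["order_ids"] = [row[0] for row in ns["orders"]]


env = {"orders": [(1, "open"), (2, "closed")], "customers": [(1, "Ada")]}
modified = run_and_flag(env, ["orders", "customers", "order_ids"], drop_closed)
print(sorted(modified))  # ['order_ids', 'orders']
```

One property of summing element hashes is that the dataset hash does not depend on row order and can be updated incrementally when a single element changes. The trade-off is that a command that only reorders rows would not be flagged as modifying the dataset.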

Assignees

Palantir Technologies Inc

Classifications

  • Hash tables (CPC title)

  • G06F16/254 (primary): Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

  • Updates performed during online database operations; commit processing (CPC title)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10983988B2 cover?
A computer-implemented method comprises receiving one or more data transformation commands through a console in a console session, the one or more data transformation commands relating to one or more initial datasets; executing the one or more data transformation commands using the one or more initial datasets to modify at least one of the one or more initial datasets to generate a modified dat…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/254. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 20, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).