Universal data pipeline

US9946738B2 · US · B2

Patent metadata
Publication number: US-9946738-B2
Application number: US-201615287715-A
Country: US
Kind code: B2
Filing date: Oct 6, 2016
Priority date: Nov 5, 2014
Publication date: Apr 17, 2018
Grant date: Apr 17, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection. Read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A history preserving data pipeline computer system and method. In one aspect, the history preserving data pipeline system provides immutable and versioned datasets. Because datasets are immutable and versioned, the system makes it possible to determine the data in a dataset at a point in time in the past, even if that data is no longer in the current version of the dataset.
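The point-in-time property described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the class names (`DatasetVersion`, `VersionedDataset`), the integer commit times, and the `as_of` method are hypothetical, chosen only to show how an append-only history of immutable versions lets a reader recover data that the current version no longer contains.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable snapshot of a dataset (hypothetical structure)."""
    committed_at: int          # logical commit time (an assumption for this sketch)
    rows: Tuple[str, ...]      # stored as a tuple so the snapshot cannot be mutated


@dataclass
class VersionedDataset:
    """Append-only history: new versions are added, old ones are never edited."""
    versions: List[DatasetVersion] = field(default_factory=list)

    def commit(self, committed_at: int, rows: Iterable[str]) -> DatasetVersion:
        if self.versions and committed_at <= self.versions[-1].committed_at:
            raise ValueError("commit times must be strictly increasing")
        version = DatasetVersion(committed_at, tuple(rows))
        self.versions.append(version)
        return version

    def as_of(self, timestamp: int) -> Optional[DatasetVersion]:
        """Return the latest version committed at or before `timestamp`."""
        times = [v.committed_at for v in self.versions]
        index = bisect_right(times, timestamp)
        return self.versions[index - 1] if index else None


dataset = VersionedDataset()
dataset.commit(10, ["alice", "bob"])
dataset.commit(20, ["alice"])        # "bob" is absent from the current version

print(dataset.as_of(15).rows)        # the snapshot as of time 15 still holds "bob"
print(dataset.versions[-1].rows)     # the current version
```

Because `commit` only appends and each `DatasetVersion` is frozen, a lookup for an earlier time returns the earlier snapshot unchanged, which is the behavior the abstract attributes to immutable, versioned datasets.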

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising:
  at one or more computing devices comprising one or more processors and one or more storage media storing one or more computer programs executed by the one or more processors to perform the method, performing operations comprising:
  maintaining a build catalog comprising a plurality of build catalog entries;
  wherein each build catalog entry, of the plurality of build catalog entries, comprises:
    an identifier of a version of a derived dataset corresponding to the build catalog entry,
    one or more dataset build dependencies of the version of the derived dataset corresponding to the build catalog entry, each of the one or more dataset build dependencies comprising an identifier of a version of a child dataset from which the version of the derived dataset corresponding to the build catalog entry is derived, and
    a derivation program build dependency of the version of the derived dataset corresponding to the build catalog entry, the derivation program build dependency comprising an identifier of a version of a derivation program executed to generate the version of the derived dataset corresponding to the build catalog entry;
  creating a new version of a particular derived dataset in context of a successful transaction;
  adding a new build catalog entry to the build catalog, the new build catalog entry comprising an identifier of the new version of the particular derived dataset, the identifier of the new version of the particular derived dataset being a transaction commit identifier assigned to the successful transaction;
  wherein the creating the new version of the particular derived dataset is based on executing a particular version of a particular derivation program;
  wherein the new build catalog entry comprises an identifier of the particular version of the particular derivation program;
  wherein the creating the new version of the particular derived dataset is based on providing one or more particular child dataset versions as input to the executing the particular version of the particular derivation program; and
  wherein the new build catalog entry comprises an identifier of each of the one or more particular child dataset versions.

2. One or more non-transitory storage media storing one or more computer programs, the one or more computer programs comprising instructions for performing operations comprising:
  maintaining a build catalog comprising a plurality of build catalog entries;
  wherein each build catalog entry, of the plurality of build catalog entries, comprises:
    an identifier of a version of a derived dataset corresponding to the build catalog entry,
    one or more dataset build dependencies of the version of the derived dataset corresponding to the build catalog entry, each of the one or more dataset build dependencies comprising an identifier of a version of a child dataset from which the version of the derived dataset corresponding to the build catalog entry is derived, and
    a derivation program build dependency of the version of the derived dataset corresponding to the build catalog entry, the derivation program build dependency comprising an identifier of a version of a derivation program executed to generate the version of the derived dataset corresponding to the build catalog entry;
  creating a new version of a particular derived dataset in context of a successful transaction; and
  adding a new build catalog entry to the build catalog, the new build catalog entry comprising an identifier of the new version of the particular derived dataset, the identifier of the new version of the particular derived dataset being a transaction commit identifier assigned to the successful transaction;
  wherein the creating the new version of the particular derived dataset is based on executing a particular version of a particular derivation program;
  wherein the new build catalog entry comprises an identifier of the particular version of the particular derivation program;
  wherein the creating the new version of the particular derived dataset is based on providing one or more particular child dataset versions as input to the executing the particular version of the particular derivation program; and
  wherein the new build catalog entry comprises an identifier of each of the one or more particular child dataset versions.

3. A system comprising:
  one or more hardware processors;
  one or more computer programs; and
  one or more storage media storing the one or more computer programs for execution by the one or more hardware processors, the one or more computer programs comprising instructions for performing operations comprising:
  maintaining a build catalog comprising a plurality of build catalog entries;
  wherein each build catalog entry, of the plurality of build catalog entries, comprises:
    an identifier of a version of a derived dataset corresponding to the build catalog entry,
    one or more dataset build dependencies of the version of the derived dataset corresponding to the build catalog entry, each of the one or more dataset build dependencies comprising an identifier of a version of a child dataset from which the version of the derived dataset corresponding to the build catalog entry is derived, and
    a derivation program build dependency of the version of the derived dataset corresponding to the build catalog entry, the derivation program build dependency comprising an identifier of a version of a derivation program executed to generate the version of the derived dataset corresponding to the build catalog entry;
  creating a new version of a particular derived dataset in context of a successful transaction;
  adding a new build catalog entry to the build catalog, the new build catalog entry comprising an identifier of the new version of the particular derived dataset, the identifier of the new version of the particular derived dataset being a transaction commit identifier assigned to the successful transaction;
  wherein the creating the new version of the particular derived dataset is based on executing a particular version of a particular derivation program;
  wherein the new build catalog entry comprises an identifier of the particular version of the particular derivation program;
  wherein the creating the new version of the particular derived dataset is based on providing one or more particular child dataset versions as input to the executing the particular version of the particular derivation program; and
  wherein the new build catalog entry comprises an identifier of each of the one or more particular child dataset versions.
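The build catalog recited in the claims can be pictured with a short Python sketch. This is an illustration under stated assumptions, not the patented implementation: the names `BuildCatalogEntry`, `BuildCatalog`, and `build`, the `name@txn-N` identifier format, and the in-memory dictionary are all hypothetical. The sketch only shows the three fields each entry carries and that an entry is added solely when the derivation program finishes successfully.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class BuildCatalogEntry:
    derived_version_id: str               # transaction commit identifier of the new version
    child_version_ids: Tuple[str, ...]    # dataset build dependencies (input versions)
    program_version_id: str               # derivation program build dependency


class BuildCatalog:
    """In-memory stand-in for a build catalog (hypothetical storage)."""

    def __init__(self) -> None:
        self._entries: Dict[str, BuildCatalogEntry] = {}
        self._commit_counter = itertools.count(1)

    def __len__(self) -> int:
        return len(self._entries)

    def lookup(self, derived_version_id: str) -> BuildCatalogEntry:
        return self._entries[derived_version_id]

    def build(
        self,
        dataset_name: str,
        program_version_id: str,
        program: Callable[[List[Any]], Any],
        children: Sequence[Tuple[str, Any]],   # (child version id, child data) pairs
    ) -> Tuple[BuildCatalogEntry, Any]:
        # Run the derivation program over the child dataset versions.
        # If it raises, nothing is committed and no entry is recorded.
        output = program([data for _, data in children])

        # Toy "commit": the commit identifier doubles as the new version's identifier.
        commit_id = f"{dataset_name}@txn-{next(self._commit_counter)}"
        entry = BuildCatalogEntry(
            derived_version_id=commit_id,
            child_version_ids=tuple(version_id for version_id, _ in children),
            program_version_id=program_version_id,
        )
        self._entries[commit_id] = entry
        return entry, output


catalog = BuildCatalog()
entry, totals = catalog.build(
    dataset_name="daily_totals",
    program_version_id="sum_rows@v3",
    program=lambda inputs: [sum(rows) for rows in inputs],
    children=[("orders@txn-41", [1, 2, 3]), ("refunds@txn-17", [4])],
)
print(entry)
print(totals)
```

Given any derived version identifier, `catalog.lookup` returns which child dataset versions and which program version produced it, which is the provenance record the claims describe.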

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9946738B2 cover?
A history preserving data pipeline computer system and method. In one aspect, the history preserving data pipeline system provides immutable and versioned datasets. Because datasets are immutable and versioned, the system makes it possible to determine the data in a dataset at a point in time in the past, even if that data is no longer in the current version of the dataset.
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/30309. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 17, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).