What technology area does this patent fall under?

Primary CPC classification G06F16/219. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 1, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Universal data pipeline

US10853338B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10853338-B2
Application number	US-201916240507-A
Country	US
Kind code	B2
Filing date	Jan 4, 2019
Priority date	Nov 5, 2014
Publication date	Dec 1, 2020
Grant date	Dec 1, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A history preserving data pipeline computer system and method. In one aspect, the history preserving data pipeline system provides immutable and versioned datasets. Because datasets are immutable and versioned, the system makes it possible to determine the data in a dataset at a point in time in the past, even if that data is no longer in the current version of the dataset.
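The abstract's central idea, immutable versioned datasets that can be read as of a past point in time, can be illustrated with a minimal sketch. This is not the patented implementation; the names `DatasetVersion`, `VersionedDataset`, `commit`, and `as_of`, the integer version numbers, and the numeric timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable snapshot of a dataset (hypothetical structure)."""
    version_id: int        # assumed: sequential number assigned at commit
    committed_at: float    # assumed: commit time as a plain number
    rows: Tuple[str, ...]  # a tuple, so the snapshot cannot be edited in place


class VersionedDataset:
    """Append-only history: versions are added, never changed or removed."""

    def __init__(self) -> None:
        self._versions: List[DatasetVersion] = []

    def commit(self, rows: Iterable[str], committed_at: float) -> DatasetVersion:
        # Require time-ordered commits so the history stays sorted.
        if self._versions and committed_at < self._versions[-1].committed_at:
            raise ValueError("commits must be in time order")
        version = DatasetVersion(len(self._versions) + 1, committed_at, tuple(rows))
        self._versions.append(version)
        return version

    def as_of(self, timestamp: float) -> Optional[DatasetVersion]:
        """Return the latest version committed at or before `timestamp`."""
        result = None
        for version in self._versions:
            if version.committed_at <= timestamp:
                result = version
        return result


ds = VersionedDataset()
ds.commit(["alice", "bob"], committed_at=100.0)
ds.commit(["alice"], committed_at=200.0)  # "bob" is dropped in the new version
past = ds.as_of(150.0)                    # still resolves to the first version
```

Because the second commit adds a new snapshot instead of overwriting the first, the row removed from the current version remains readable through `as_of`.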

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: at one or more computing devices comprising one or more processors and one or more storage media storing one or more computer programs executed by the one or more processors to perform the method, performing operations comprising: maintaining a build catalog comprising a plurality of build catalog entries, each build catalog entry comprising: an identifier of a version of a derived dataset corresponding to a build catalog entry, one or more dataset build dependencies of the version of the derived dataset corresponding to the build catalog entry, each of the one or more dataset build dependencies comprising an identifier of a version of a child dataset from which the version of the derived dataset corresponding to the build catalog entry is derived, and a derivation program build dependency that is executable to generate the version of the derived dataset corresponding to the build catalog entry; creating a new version of a particular derived dataset based on providing one or more particular child dataset versions as input to executing a particular version of a particular derivation program; and adding a new build catalog entry to the build catalog, the new build catalog entry comprising an identifier of each of the one or more particular child dataset versions, and at least one identifier of one or more particular child dataset versions that were provided as input to the particular derivation program.
2. The method of claim 1, wherein the derivation program build dependency of a version of the derived dataset corresponding to the build catalog entry comprises an identifier of a version of a derivation program executed to generate the version of the derived dataset corresponding to the build catalog entry.
3. The method of claim 2, further comprising: storing a first version of the derived dataset using a data lake; updating another dataset to produce a second version of the derived dataset, storing the second version of the derived dataset in the data lake in context of a successful transaction; and wherein the data lake comprises a distributed file system.
4. The method of claim 3, wherein an identifier of the first version of the derived dataset is an identifier assigned to a commit of a transaction that stored the first version of the derived dataset.
5. The method of claim 3, wherein an identifier of the second version of the derived dataset is an identifier assigned to a commit of a transaction that stored the second version of the derived dataset.
6. The method of claim 3, wherein the first version of the derived dataset is stored in a first set of one or more data containers.
7. The method of claim 3, wherein the second version of the derived dataset is stored in a second set of one or more data containers.
8. The method of claim 7, wherein the second set of one or more data containers comprises delta encodings reflecting deltas between the first version of the derived dataset and the second version of the derived dataset.
9. The method of claim 3, wherein the first version of the derivation program is executed to produce the first version of the derived dataset.
10. The method of claim 3, wherein the first version of the derivation program is executed to produce the second version of the derived dataset.
11. A computer system comprising: one or more hardware processors; one or more computer programs; and one or more storage media storing the one or more computer programs for execution by the one or more hardware processors, the one or more computer programs comprising instructions for performing operations comprising: maintaining a build catalog comprising a plurality of build catalog entries, each build catalog entry comprising: an identifier of a version of a derived dataset corresponding to a build catalog entry, one or more dataset build dependencies of the version of the derived dataset corresponding to the build catalog entry, each of the one or more dataset build dependencies comprising an identifier of a version of a child dataset from which the version of the derived dataset corresponding to the build catalog entry is derived, and a derivation program build dependency that is executable to generate the version of the derived dataset corresponding to the build catalog entry; creating a new version of a particular derived dataset based on providing one or more particular child dataset versions as input to executing a particular version of a particular derivation program; and adding a new build catalog entry to the build catalog, the new build catalog entry comprising an identifier of each of the one or more particular child dataset versions, and at least one identifier of one or more particular child dataset versions that were provided as input to the particular derivation program.
12. The computer system of claim 11, wherein the derivation program build dependency of a version of the derived dataset corresponding to the build catalog entry comprises an identifier of a version of a derivation program executed to generate the version of the derived dataset corresponding to the build catalog entry.
13. The computer system of claim 12, wherein the one or more storage media store additional computer programs for performing operations comprising: storing a first version of the derived dataset using a data lake; updating another dataset to produce a second version of the derived dataset; storing the second version of the derived dataset in the data lake in context of a successful transaction; and wherein the data lake comprises a distributed file system.
14. The computer system of claim 13, wherein an identifier of the first version of the derived dataset is an identifier assigned to a commit of a transaction that stored the first version of the derived dataset.
15. The computer system of claim 13, wherein an identifier of the second version of the derived dataset is an identifier assigned to a commit of a transaction that stored the second version of the derived dataset.
16. The computer system of claim 13, wherein the first version of the derived dataset is stored in a first set of one or more data containers.
17. The computer system of claim 13, wherein the second version of the derived dataset is stored in a second set of one or more data containers.
18. The computer system of claim 17, wherein the second set of one or more data containers comprises delta encodings reflecting deltas between the first version of the derived dataset and the second version of the derived dataset.
19. The computer system of claim 13, wherein the first version of the derivation program is executed to produce the first version of the derived dataset.
20. The computer system of claim 13, wherein the first version of the derivation program is executed to produce the second version of the derived dataset.
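The build catalog recited in claims 1 and 2 can be pictured as a small data structure. The sketch below is an illustrative reading of the claim language, not the patented implementation and not a statement of claim scope. The names `BuildCatalogEntry`, `BuildCatalog`, and `build`, the `name@version` identifier strings, and the in-memory `store` dictionary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class BuildCatalogEntry:
    """Records what a derived dataset version was built from (hypothetical)."""
    derived_version_id: str             # identifier of the derived dataset version
    child_version_ids: Tuple[str, ...]  # dataset build dependencies (input versions)
    derivation_program_version: str     # derivation program build dependency


class BuildCatalog:
    """Keeps one entry per derived dataset version."""

    def __init__(self) -> None:
        self._entries: Dict[str, BuildCatalogEntry] = {}

    def add(self, entry: BuildCatalogEntry) -> None:
        # Versions are immutable, so an existing entry is never overwritten.
        if entry.derived_version_id in self._entries:
            raise ValueError(f"entry already exists: {entry.derived_version_id}")
        self._entries[entry.derived_version_id] = entry

    def lookup(self, derived_version_id: str) -> BuildCatalogEntry:
        return self._entries[derived_version_id]


def build(
    catalog: BuildCatalog,
    store: Dict[str, List[int]],  # assumed: version id -> rows, held in memory
    derived_version_id: str,
    child_version_ids: Sequence[str],
    program_version: str,
    program: Callable[[List[List[int]]], List[int]],
) -> BuildCatalogEntry:
    """Run a derivation program on specific input versions and record the build."""
    inputs = [store[version_id] for version_id in child_version_ids]
    store[derived_version_id] = program(inputs)
    entry = BuildCatalogEntry(
        derived_version_id, tuple(child_version_ids), program_version
    )
    catalog.add(entry)
    return entry


store = {"raw@v1": [1, 2, 3]}
catalog = BuildCatalog()
entry = build(
    catalog, store, "doubled@v1", ["raw@v1"], "double.py@v1",
    lambda inputs: [x * 2 for x in inputs[0]],
)
```

With entries like this, a later reader can ask which input versions and which program version produced `doubled@v1`, which is the kind of lineage question the catalog is described as answering.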

Assignees

Palantir Technologies Inc

Inventors

Classifications

G06F16/1865
Transactional file systems · CPC title
G06F16/2365
Ensuring data consistency and integrity · CPC title
G06F16/219 · Primary
Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477) · CPC title
G06F16/211
Schema design and management · CPC title
G06F16/2386
Bulk updating operations (data conversion details G06F16/258) · CPC title

Patent family

Related publications grouped by family.

View patent family 54540857

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10853338B2 cover?: A history preserving data pipeline computer system and method. In one aspect, the history preserving data pipeline system provides immutable and versioned datasets. Because datasets are immutable and versioned, the system makes it possible to determine the data in a dataset at a point in time in the past, even if that data is no longer in the current version of the dataset.
Who is the assignee on this patent?: Palantir Technologies Inc