Continuous builds of derived datasets in response to other dataset updates

US12229189B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12229189-B2
Application number  US-202217826099-A
Country             US
Kind code           B2
Filing date         May 26, 2022
Priority date       Nov 22, 2017
Publication date    Feb 18, 2025
Grant date          Feb 18, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data processing method comprises creating and storing a dependency graph representing at least one derived dataset and one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends; reading configuration data specifying one or more periods for one or more datasets in the dependency graph; detecting a first update to a first dataset; initiating a first build of a first intermediate derived dataset only when a then-current time is within a first period of the one or more periods or a previous build of the first intermediate derived dataset occurred earlier than a then-current time less a second period of the one or more periods; asynchronously detecting a second update to a second dataset; initiating, in response to the second update, a second build of a second intermediate derived dataset that depends on the second dataset.
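
The build condition in the abstract can be read as a simple predicate. The following is a minimal sketch in Python, not the patented implementation. The function name `should_build`, its parameter names, the reading of "within a first period" as a window that starts at a fixed time of day (the wording used in the first claim), and the treatment of a never-built dataset as stale are all assumptions made for illustration.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def should_build(
    now: datetime,
    last_build: Optional[datetime],
    window_start: time,
    first_period: timedelta,
    second_period: timedelta,
) -> bool:
    """Return True when a detected update should trigger a build right away.

    Hypothetical reading of the abstract: build if `now` falls within
    `first_period` after a fixed time of day (`window_start`), or if the
    previous build happened before `now - second_period`.
    """
    # Condition 1: now lies in [window_start, window_start + first_period).
    start_today = datetime.combine(now.date(), window_start)
    # Also check the window that opened yesterday, in case it runs past midnight.
    candidates = (start_today, start_today - timedelta(days=1))
    in_window = any(start <= now < start + first_period for start in candidates)

    # Condition 2: the previous build is older than the second period.
    # Assumption: a dataset that has never been built counts as stale.
    stale = last_build is None or last_build < now - second_period

    return in_window or stale
```

With a one-hour window starting at 02:00 and a six-hour second period, an update that arrives at noon, one hour after the last build, satisfies neither condition, so this sketch would not start a build for it.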

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising:
    creating and storing a dependency graph in memory, based on which a data pipeline is maintained, the dependency graph representing at least one derived dataset and one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends;
    reading configuration data specifying one or more periods for one or more datasets in the dependency graph;
    detecting, at an unscheduled time, a first update to a first dataset among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends;
    determining, in response to the first update, that a current time is within a first period of the one or more periods from a fixed time of a day or a previous build of a first intermediate derived dataset occurred earlier than the current time less a second period of the one or more periods;
    initiating, in response to the determining, at or near the current time, a first build of the first intermediate derived dataset that depends on the first dataset;
    detecting that a frequency of updates to a dataset on which the first intermediate derived dataset depends exceeds a threshold;
    in response to the detecting of the threshold being exceeded, updating the configuration data to revise the first period or the second period;
    asynchronously detecting a second update to a second dataset among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends;
    initiating, in response to the second update, a second build of a second intermediate derived dataset that depends on the second dataset without waiting for the first update to propagate through the dependency graph;
    detecting and initiating continuously as other updates to other datasets are received, wherein the method is performed using one or more processors.

2. The method of claim 1, the configuration data specifying the one or more periods respectively for one or more different datasets.

3. The method of claim 1, the first period or the second period being associated with the first intermediate derived dataset.

4. The method of claim 1, further comprising: updating the configuration data to specify a certain period for a pipeline that recursively applies to parent datasets; setting a period for a specific dataset having multiple children to a minimum of multiple periods applied to the multiple children.

5. The method of claim 1, further comprising: detecting that an amount of resource used in building the first intermediate derived dataset exceeds a second threshold; in response to the detecting, updating the configuration data to specify the first period or the second period.

6. The method of claim 1, the first period corresponding to an amount of resource usage below a certain threshold.

7. The method of claim 1, further comprising: detecting that a final dependency of a third intermediate derived dataset that depends on the first intermediate derived dataset is satisfied; initiating a third build of the third intermediate derived dataset.

8. The method of claim 1, the configuration data further specifying branch declarations related to logical branches of a build system.

9. The method of claim 8, further comprising determining that the first intermediate derived dataset is not associated with a logical branch of the logical branches for which the branch declarations are specified.

10. A computer-readable, non-transitory storage medium storing computer-executable instructions, which when executed implement a method, the method comprising:
    creating and storing a dependency graph in memory, based on which a data pipeline is maintained, the dependency graph representing at least one derived dataset and one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends;
    reading configuration data specifying one or more periods for one or more datasets in the dependency graph;
    detecting, at an unscheduled time, a first update to a first dataset among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends;
    determining, in response to the first update, that a current time is within a first period of the one or more periods from a fixed time of a day or a previous build of a first intermediate derived dataset occurred earlier than the current time less a second period of the one or more periods;
    initiating, in response to the determining, at or near the current time, a first build of the first intermediate derived dataset that depends on the first dataset;
    detecting that a frequency of updates to a dataset on which the first intermediate derived dataset depends exceeds a threshold;
    in response to the detecting of the threshold being exceeded, updating the configuration data to revise the first period or the second period;
    asynchronously detecting a second update to a second dataset among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends;
    initiating, in response to the second update, a second build of a second intermediate derived dataset that depends on the second dataset without waiting for the first update to propagate through the dependency graph;
    detecting and initiating continuously as other updates to other datasets are received, wherein the method is performed using one or more processors.

11. The computer-readable, non-transitory storage medium of claim 10, the configuration data specifying the one or more periods respectively for one or more different datasets.

12. The computer-readable, non-transitory storage medium of claim 10, the first period or the second period being associated with the first intermediate derived dataset.

13. The computer-readable, non-transitory storage medium of claim 10, the method further comprising: updating the configuration data to specify a certain period for a pipeline that recursively applies to parent datasets; setting a period for a specific dataset having multiple children to a minimum of multiple periods applied to the multiple children.

14. The computer-readable, non-transitory storage medium of claim 10, the method further comprising: detecting that an amount of resource used in building the first intermediate derived dataset exceeds a second threshold; in response to the detecting, updating the configuration data to specify the first period or the second period.

15. The computer-readable, non-transitory storage medium of claim 10, the first period corresponding to an amount of resource usage below a certain threshold.

16. The computer-readable, non-transitory storage medium of claim 10, the method further comprising: detecting that a final dependency of a third intermediate derived dataset that depends on the first intermediate derived dataset is satisfied; initiating a third build of second the third intermediate derived dataset.

17. The computer-readable, non-transitory storage medium of claim 10, the configuration data further specifying branch declarations related to logical branches of a build system.

18. The method of claim 8, further comprising determining that the first intermediate derived dataset is not associated with a logical branch of the logical branches for which the branch declarations are specified.
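
Claims 4 and 13 describe a period that applies recursively to parent datasets, with a dataset that has multiple children taking the minimum of the children's periods. One possible way to compute this is sketched below in Python. The function name `resolve_periods`, the dictionary-based graph representation, and the dataset names in the usage note are hypothetical and do not come from the patent.

```python
from datetime import timedelta
from typing import Dict, List


def resolve_periods(
    parents: Dict[str, List[str]],
    declared: Dict[str, timedelta],
) -> Dict[str, timedelta]:
    """Propagate declared periods upstream through a dependency graph.

    `parents` maps each dataset to the datasets it is built from.
    `declared` holds periods set directly in configuration.
    A dataset reached from several children keeps the smallest period.
    """
    effective: Dict[str, timedelta] = {}

    def visit(dataset: str, period: timedelta) -> None:
        current = effective.get(dataset)
        # Stop if an equal or shorter period is already recorded here.
        if current is not None and current <= period:
            return
        effective[dataset] = period
        # Apply the same period to every parent, recursively.
        for parent in parents.get(dataset, []):
            visit(parent, period)

    for dataset, period in declared.items():
        visit(dataset, period)
    return effective
```

As a usage illustration, suppose `report_a` (declared period of one hour) and `report_b` (declared period of 15 minutes) are both built from `clean`, which is built from `raw`. Calling `resolve_periods` on that graph assigns 15 minutes to both `clean` and `raw`, because the shorter of the two child periods wins.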

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • Updates performed during online database operations; commit processing

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

  • Graphs; Linked lists (G06F16/9027 takes precedence)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12229189B2 cover?
A data processing method comprises creating and storing a dependency graph representing at least one derived dataset and one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends; reading configuration data specifying one or more periods for one or more datasets in the dependency graph; detecting a first update to a first dataset; initiating a f…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2379. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 18, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).