What technology area does this patent fall under?

Primary CPC classification G06F16/215. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Feb 24 2026 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Automatic scaling for targeted sweep

US12561300B1 · US · B1

Patent metadata
Field	Value
Publication number	US-12561300-B1
Application number	US-202519033400-A
Country	US
Kind code	B1
Filing date	Jan 21, 2025
Priority date	Nov 22, 2024
Publication date	Feb 24, 2026
Grant date	Feb 24, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for scaling parallelism in sweeping stale data associated with multi-version data management is disclosed. The system is programmed to manage a targeted sweep of stale data stored on distributed storage associated with a set of computing nodes, where the stale data includes metadata for database transactions. The system is programmed to dynamically determine, on a computing node, how many threads to simultaneously perform deletion for the targeted sweep based on historical performance of the computing nodes, historical performance of the distributed storage associated with the computing nodes, or other factors. The system is programmed to then cause a certain number of threads to run on the computing node, where each thread identifies and deletes specific metadata for data transactions within the scope of the targeted sweep.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method of scaling parallelism in sweeping stale data associated with multi-version data management, comprising: managing versions of a cell of a collection of cells in a first database and metadata entries each indicating a timestamp of a write transaction leading to a new version of the cell in a second database, the metadata entries for the collection of cells being stored on a plurality of shards associated with one or more computing nodes; selecting a current sweep timestamp for a current sweep; determining a number of threads to run simultaneously on each computing node of the one or more computing nodes to perform deletion for the current sweep based on a plurality of factors, including historical performance of each computing node or historical performance of local storage associated with each computing node; and causing the number of threads to run simultaneously on each computing node of the one or more computing nodes, a first thread running on a specific computing node of the one or more computing nodes performing: identifying, from metadata entries on a shard associated with the specific computing node, a first set of cells of a certain size with versions written in the first database prior to the current sweep timestamp; and deleting metadata entries corresponding to the first set of cells and having timestamps prior to the current sweep timestamp from the second database in one database transaction, wherein the method is performed by one or more processors. 2 . The method of claim 1 , the current sweep timestamp being a lower of a start timestamp of a longest running write transaction and an earliest readable time. 3 . The method of claim 1 , the determining starting with dividing a given total number of threads by a number of live computing nodes. 4 . The method of claim 1 , the determining being further based on a quantity of resources currently accessible to each computing node for sweep purposes, a total number of live computing nodes, or a total amount of deletion yet to be performed for the current sweep. 5 . The method of claim 1 , the certain size being a predetermined number of metadata entries or a number of metadata entries having timestamps falling in a period of a predetermined length. 6 . The method of claim 1 , the identifying comprising counting a specific cell only once towards the first set of cells when at least one metadata entry for the specific cell each having a timestamp less than the current sweep timestamp is encountered. 7 . The method of claim 1 , a portion of the shard associated with the specific computing node being divided into one or more fine partitions, each fine partition being represented as a row in a table having a first column for an index of the fine partition and a second column for a list of metadata entries, the identifying comprising retrieving a specific row from the table and reviewing columns of the row. 8 . The method of claim 7 , further comprising: receiving information regarding the first set of cells provided by the first thread; recording coverage of a fine partition of the one or more fine partitions by the first set of cells based on the information, a second thread running on the specific computing node identifying, from the shard associated with the specific computing node, a second set of cells with versions written in the first database prior to the current sweep timestamp based on the coverage. 9 . The method of claim 1 , further comprising receiving a periodic performance update related to a computing node or an associated local storage or a periodic resource availability update indicating a quantity of resources accessible from the computing node. 10 . The method of claim 1 , further comprising: determining that a first computing node of the one or more computing nodes is no longer alive; adjusting the number of threads to run on a second computing node of the one or more computing nodes that is still alive to perform deletion for the current sweep without waiting for the first computing node to be alive again. 11 . The method of claim 1 , the deleting further comprising removing versions of a cell written prior to the current sweep timestamp from the first database before removing corresponding metadata entries from the second database. 12 . The method of claim 1 , a second thread running on a certain computing node of the one or more computing nodes performing identifying a second set of cells with versions written in the first database prior to the current sweep timestamp, the causing comprising: evaluating all sets of cells collected by the respective numbers of threads running on the one or more computing nodes; determining the first set of cells intersecting with the second set of cells; and instructing the second thread not to perform deletion based on the second set of cells. 13 . A system for scaling parallelism in sweeping stale data associated with multi-version data management, comprising: a memory; one or more processors coupled to the memory and configured to perform: managing versions of a cell of a collection of cells in a first database and metadata entries each indicating a timestamp of a write transaction leading to a new version of the cell in a second database, the metadata entries for the collection of cells being stored on a plurality of shards associated with one or more computing nodes; selecting a current sweep timestamp for a current sweep; determining a number of threads to run simultaneously on each computing node of the one or more computing nodes to perform deletion for the current sweep based on a plurality of factors, including historical performance of each computing node or historical performance of local storage associated with each computing node; and causing the number of threads to run simultaneously on each computing node of the one or more computing nodes, a first thread running on a specific computing node of the one or more computing nodes performing: identifying, from metadata entries on a shard associated with the specific computing node, a first set of cells of a certain size with versions written in the first database prior to the current sweep timestamp; and deleting metadata entries corresponding to the first set of cells and having timestamps prior to the current sweep timestamp from the second database in one database transaction. 14 . The system of claim 13 , the determining starting with dividing a given total number of threads by a number of live computing nodes. 15 . The system of claim 13 , the determining being further based on a quantity of resources currently accessible to each computing node for sweep purposes, a total number of live computing nodes, or a total amount of deletion yet to be performed for the current sweep. 16 . The system of claim 13 , the identifying comprising counting a specific cell only once towards the first set of cells when at least one metadata entry for the specific cell each having a timestamp less than the current sweep timestamp is encountered. 17 . The system of claim 13 , a portion of the shard associated with the specific computing node being divided into one or more fine partitions, each fine partition being represented as a row in a table having a first column for an index of the fine partition and a second column for a list of metadata entries, the identifying comprising retrieving a specific row from the table and reviewing columns of the row. 18 . The system of claim 13 , the one

Assignees

Palantir Technologies Inc

Inventors

Classifications

G06F16/215Primary
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
G06F16/2322Primary
using timestamps · CPC title

Patent family

Related publications grouped by family.

View patent family 98866393

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12561300B1 cover?: A system for scaling parallelism in sweeping stale data associated with multi-version data management is disclosed. The system is programmed to manage a targeted sweep of stale data stored on distributed storage associated with a set of computing nodes, where the stale data includes metadata for database transactions. The system is programmed to dynamically determine, on a computing node, how m…
Who is the assignee on this patent?: Palantir Technologies Inc
What technology area does this patent fall under?: Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Feb 24 2026 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).