Data migration in a distributed file system
US-12135695-B2 · Nov 5, 2024 · US
US2025363118A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025363118-A1 |
| Application number | US-202519289217-A |
| Country | US |
| Kind code | A1 |
| Filing date | Aug 4, 2025 |
| Priority date | Aug 5, 2020 |
| Publication date | Nov 27, 2025 |
| Grant date | — |
Official abstract text for this publication.
A database system includes a data input sub-system, a query and response sub-system, and a store and compute sub-system. The data input sub-system receives, over time, a dataset, temporarily stores it, and converts it into data segments. The query and response sub-system obtains a query regarding data of the dataset prior to the dataset being stored as data segments and provides a query operation regarding the query to the store and compute sub-system. The store and compute sub-system identifies data of the dataset for processing via the query operation and determines whether the identified data is stored as a set of data segments within the store and compute sub-system. When not, the store and compute sub-system sends a request for the identified data to the data input sub-system; receives a set of pages in response to the request; and executes the query operation on the received set of pages.
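The flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the publication: the class and method names (`DataInput`, `StoreAndCompute`, `receive`, `request_pages`, `execute`), the page and segment sizes, and the use of a row predicate to stand in for "identified data". It also handles only the all-or-nothing case, where the identified data is either entirely in stored segments or entirely in buffered pages.

```python
class DataInput:
    """Hypothetical data input sub-system: buffers rows as pages, emits segments."""

    def __init__(self, rows_per_page=2, pages_per_segment=2):
        self.rows_per_page = rows_per_page
        self.pages_per_segment = pages_per_segment
        self.pages = []        # complete pages not yet turned into a segment
        self._open_page = []   # rows of the page currently being filled

    def receive(self, row):
        """Store one row; return a finished segment, or None if none is ready."""
        self._open_page.append(row)
        if len(self._open_page) == self.rows_per_page:
            self.pages.append(self._open_page)
            self._open_page = []
        if len(self.pages) == self.pages_per_segment:
            # "Processing" is reduced to concatenating the buffered pages.
            segment = [r for page in self.pages for r in page]
            self.pages = []
            return segment
        return None

    def request_pages(self, predicate):
        """Return the buffered pages that hold at least one matching row."""
        return [p for p in self.pages if any(predicate(r) for r in p)]


class StoreAndCompute:
    """Hypothetical store and compute sub-system."""

    def __init__(self, data_input):
        self.data_input = data_input
        self.segments = []

    def store_segment(self, segment):
        self.segments.append(segment)

    def execute(self, predicate):
        """Run the query on stored segments, else on pages from the data input."""
        stored = [r for seg in self.segments for r in seg if predicate(r)]
        if stored:
            return stored, "segments"
        pages = self.data_input.request_pages(predicate)
        return [r for p in pages for r in p if predicate(r)], "pages"


data_input = DataInput()
store = StoreAndCompute(data_input)
for i in range(6):
    segment = data_input.receive({"id": i})
    if segment is not None:
        store.store_segment(segment)

print(store.execute(lambda r: r["id"] == 1))  # row already in a stored segment
print(store.execute(lambda r: r["id"] == 5))  # row still buffered as a page
```

With these sizes, the first four rows form one stored segment and the last two remain in a buffered page, so the second query is answered from pages rather than from segments.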
Opening claim text (preview).
What is claimed is:
1. A database system comprises: a data input sub-system is operable to: receive, over time, a dataset that includes a plurality of rows of columnar data; as the dataset is being received, store sets of rows of columnar data as pages of data, wherein a set of rows of columnar data corresponds to a page of data; when a data segment amount of pages of data have been stored, process the data segment amount of pages of data to produce a data segment; provide the data segment to a store and compute sub-system for storage therein; a query and response sub-system is operable to: obtain a query regarding data of the dataset prior to the dataset being stored in the store and compute sub-system as a plurality of data segments; and provide a query operation regarding the query to the store and compute sub-system; and the store and compute sub-system is, for the query operation, operable to: identify data of the dataset for processing via the query operation; determine whether the identified data is stored as a set of data segments within the store and compute sub-system; when the identified data is not stored within the store and compute sub-system: send a request for the identified data to the data input sub-system; receive a set of pages in response to the request; and execute the query operation on the received set of pages to produce a query result.
2. The database system of claim 1, wherein the data input sub-system is further operable to: process, in accordance with a set of segment factors, the data segment amount of pages of data to produce the data segment, wherein the set of segment factors includes one or more segment factors, wherein a segment factor of the set of segment factors includes one of: an indication of a number of records to include in the data segment, an indication of a number of segments to include in a segment group, an identification of how to segment a data partition based on storage capabilities of the store and compute sub-system, and an indication of a number of data segments for a data partition based on a redundancy storage encoding scheme.
3. The database system of claim 1, wherein the data input sub-system is further operable to process the data segment amount of pages of data to produce the data segment by one or more of: dividing the rows of columnar data of the data segment amount of pages of data into a plurality of data slabs; generating a primary index for the rows of columnar data of the data segment amount of pages of data and associating the primary index with the data segment; generating a secondary index for the rows of columnar data of the data segment amount of pages of data and associating the secondary index with the data segment; error encoding the rows of columnar data of the data segment amount of pages of data; and generating a manifest regarding metadata of the rows of columnar data of the data segment amount of pages of data and associating the manifest with the data segment.
4. The database system of claim 1, wherein the query and response sub-system is further operable to obtain the query by: receiving an initial query that includes a plurality of initial query operations, determining a set of optimizations for the initial query; and implementing the set of optimizations on the plurality of initial query operations to produce the query.
5. The database system of claim 1, wherein the store and compute sub-system comprises: a plurality of computing device clusters, wherein a computing device cluster of the plurality of computing device clusters includes a plurality of computing devices, wherein a computing device of the plurality of computing devices includes a plurality of computing nodes, and wherein a computing node of the plurality of computing nodes includes a plurality of processing core resources.
6. The database system of claim 5 further comprises: a lead computing device of a first computing device cluster of the plurality of computing device clusters is operable to: receive the data segment from the data input sub-system; identify a target computing device of the first computing device to store the data segment; and send the data segment to the target computing device; a lead computing node of the target computing device is operable to: divide the data segment into a plurality of sub-segments; and send the plurality of sub-segments to the plurality of computing nodes of the target computing device; a lead processing core resource of a first computing node of the plurality of computing nodes of the target computing device is operable to: receive a sub-segment of the plurality of sub-segments of the data segment; divide the sub-segment into a plurality of divisions of the sub-segment; and send the plurality of divisions of the sub-segment to the plurality of processing core resources of the first computing node.
7. The database system of claim 5 further comprises: a lead computing device of a first computing device cluster of the plurality of computing device clusters is operable to: receive the query operations that identify the data of the dataset; determine that the identified data is not stored as the set of data segments within the store and compute sub-system; receive the set of pages from the data input sub-system regarding the identified data; identify a target computing device of the first computing device to store the sets of pages; and send the sets of pages to the target computing device; a lead computing node of the target computing device is operable to: divide the sets of pages into a plurality of sub-sets of pages; and send the plurality of sub-sets of pages to the plurality of computing nodes of the target computing device; a lead processing core resource of a first computing node of the plurality of computing nodes of the target computing device is operable to: receive a sub-set of pages of the plurality of sub-sets of pages; divide the sub-set of pages into a plurality of divisions of the sub-set; and send the plurality of divisions of the sub-set to the plurality of processing core resources of the first computing node; and the plurality of processing core resources of the first computing node are operable to execute the query operation on the plurality of divisions of the sub-set to produce a first portion of the query result.
8. The database system of claim 1, wherein the store and compute sub-system is further operable to: when the identified data is stored within the store and compute sub-system: identify stored data segments that correspond to the identified data; and execute the query operation on the stored data segments to produce the query result.
9. The database system of claim 1 further comprises one or more of: the set of rows of columnar data including one or more rows of columnar data; the set of data segments including one or more data segments; the set of pages of data including a plurality of pages of data; the data of the dataset includes one or more rows of the plurality of rows of columnar data; the query operation includes one or more query operators, and a query operator is a specific function performed on data.
10. A computer readable memory comprises: a first memory that stores operational instructions that, when executed by a data input sub-system of a database system, causes the data input sub-system to: receive, over time, a dataset that includes a plurality of rows of columnar data; as the dataset is being received, store sets of rows of columnar data as pages of data, wherein a set of rows of columnar data corresponds to a page of data; when a dat
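Claims 6 and 7 describe a two-level fan-out inside a target computing device: a lead computing node divides the work among computing nodes, and a lead processing core resource on each node divides it again among that node's cores. The Python sketch below illustrates only that division pattern. The function names (`divide`, `execute_on_device`), the contiguous near-equal chunking rule, and the sequential loop standing in for parallel cores are all assumptions for illustration; the claim text does not specify how the divisions are sized or scheduled.

```python
def divide(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def execute_on_device(pages, num_nodes, cores_per_node, operation):
    """Fan pages out to nodes, then to cores, and collect per-core partial results.

    The loops run sequentially here; in a real system each division would be
    handled by a separate processing core resource.
    """
    partial_results = []
    for sub_set in divide(pages, num_nodes):               # lead computing node
        for division in divide(sub_set, cores_per_node):   # lead processing core resource
            partial_results.append(operation(division))    # one core's portion of the result
    return partial_results


# Ten single-value "pages", two nodes with two cores each, summing as the query operation.
partials = execute_on_device(list(range(10)), num_nodes=2, cores_per_node=2, operation=sum)
print(partials, sum(partials))
```

Each entry of `partials` corresponds to what claim 7 calls a portion of the query result; combining the portions (here with a final `sum`) is a further assumption about how the portions are merged.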
using data annotations, e.g. user-defined metadata · CPC title
Column-oriented storage; Management thereof · CPC title
of query operations · CPC title
of parallel queries · CPC title
Query execution · CPC title