Data re-sharding
US-11030169-B1 · Jun 8, 2021 · US
US12493601B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12493601-B2 |
| Application number | US-202217722754-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 18, 2022 |
| Priority date | Jan 31, 2019 |
| Publication date | Dec 9, 2025 |
| Grant date | Dec 9, 2025 |
Official abstract text for this publication.
Indexing in a low-latency data access and analysis system includes accessing, by an indexing unit of a low-latency data access and analysis system, constituent data from a data source of the low-latency data access and analysis system and indexing the constituent data in an index of the low-latency data access and analysis system by an indexing unit of the low-latency data access and analysis system. Indexing includes partitioning the constituent data based on a characteristic of the constituent data into at least a first partition and a second partition, segmenting the first partition into a first segment of the first partition, sharding the first segment into a first shard of the first segment of the first partition, segmenting, using hash-partitioning, the second partition into one or more segments of the second partition, and for respective segments of the second partition, sharding the respective segment into one or more respective shards.
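The hash-partitioning step in the abstract can be illustrated with a minimal sketch. The abstract does not say what is hashed or which hash function is used, so the key (a table name), the use of SHA-256, and the names `segment_for` and `hash_partition` are all assumptions made for illustration, not the patented implementation.

```python
import hashlib


def segment_for(key: str, num_segments: int) -> int:
    """Map a key to a segment index with a stable hash.

    Assumption: SHA-256 is used here only because Python's built-in
    hash() for strings varies between processes; the patent text does
    not name a hash function.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def hash_partition(keys, num_segments):
    """Distribute keys (hypothetically, table names) across segments."""
    segments = [[] for _ in range(num_segments)]
    for key in keys:
        segments[segment_for(key, num_segments)].append(key)
    return segments


if __name__ == "__main__":
    # Hypothetical table names assigned to the second partition.
    for index, segment in enumerate(
        hash_partition(["orders", "customers", "events", "line_items"], 3)
    ):
        print(index, segment)
```

A stable hash means the same key always lands in the same segment across runs, which is the property that makes hash partitioning repeatable.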
Opening claim text (preview).
What is claimed is:

1. A method comprising: accessing, by an indexing unit of a low-latency data access and analysis system, constituent data from a data source of the low-latency data access and analysis system; and indexing the constituent data in an index of the low-latency data access and analysis system by the indexing unit of the low-latency data access and analysis system, wherein indexing includes: partitioning the constituent data based on a characteristic of the constituent data into at least a first partition and a second partition; segmenting the first partition into a first segment of the first partition; sharding the first segment into a first shard of the first segment of the first partition; segmenting, using hash partitioning, the second partition into one or more segments of the second partition; and for respective segments of the second partition, sharding the respective segments into one or more respective shards.

2. The method of claim 1, wherein indexing includes: sending, from the indexing unit to the data source, a request to pin a portion of a database of the data source corresponding to the constituent data; in response to receiving, by the indexing unit, an indication that the portion of the database is pinned, sending, from the indexing unit to the data source, a sampling data request indicating a sampling data-query for the portion of the database; and accessing, by the indexing unit, sampling results responsive to the sampling data-query.

3. The method of claim 2, wherein the constituent data includes a plurality of logical tables.

4. The method of claim 3, wherein partitioning includes: identifying a smallest unpartitioned table from the plurality of logical tables; determining that a current size of the first partition is less than a defined maximum size for the first partition, and in response to determining that the current size of the first partition is less than the defined maximum size for the first partition: identifying a sum of the current size of the first partition and a size of the smallest unpartitioned table as the current size of the first partition; and assigning the smallest unpartitioned table to the first partition; determining that the current size of the first partition is at least the defined maximum size for the first partition, and in response to determining that the current size of the first partition is at least the defined maximum size for the first partition, assigning the smallest unpartitioned table to the second partition; and identifying the smallest unpartitioned table as a partitioned table.

5. The method of claim 3, wherein segmenting, using hash partitioning, the second partition includes: identifying, as a cardinality of the one or more segments of the second partition, a lesser of a defined maximum cardinality of segments of the second partition or a quotient of dividing a sum of sizes of tables from the plurality of logical tables assigned to the second partition by a defined maximum segment size.

6. The method of claim 2, wherein, for a respective segment, sharding includes: identifying, by a segment manager of the indexing unit, an indexing mode for indexing an object from the respective segment based on the sampling results; generating, by the segment manager, a shard specification for generating a shard of the respective segment based on the sampling results and the indexing mode; sending, from the indexing unit to the data source, a constituent data request indicating a constituent data-query for the respective segment; generating a shard assignment indicating the shard specification and an indexing operation unit; and generating, by the indexing operation unit, the shard based on the shard assignment, wherein generating the shard includes accessing the constituent data responsive to the constituent data request.

7. The method of claim 1, further comprising: receiving data expressing usage intent with respect to the constituent data; in response to receiving the data expressing usage intent, generating response data responsive to the data expressing usage intent, wherein generating the response data includes resolving at least a portion of the data expressing usage intent by traversing the index, wherein traversing the index includes traversing a shard from the index to identify a token corresponding to a portion of the data expressing usage intent; and outputting the response data.

8. The method of claim 1, wherein the characteristic is table size.

9. The method of claim 1, wherein indexing includes: obtaining, by the low-latency data access and analysis system, index configuration data for indexing the constituent data, wherein the index configuration data includes at least one of token type information, data source information, or index distribution coordination information.

10. A low-latency data access and analysis system comprising: a non-transitory computer-readable storage medium that stores instructions for operating the low-latency data access and analysis system; and a processor that executes the instructions to operate an indexing unit to index constituent data in an index of the low-latency data access and analysis system, wherein, to index the constituent data, the processor executes the instructions to: access, by the indexing unit, the constituent data from a data source; partition the constituent data based on a characteristic of the constituent data into at least a first partition and a second partition; segment the first partition into a first segment of the first partition; shard the first segment into a first shard of the first segment of the first partition; segment, using hash partitioning, the second partition into one or more segments of the second partition; and for respective segments of the second partition, shard the respective segments into one or more respective shards.

11. The low-latency data access and analysis system of claim 10, wherein, to index the constituent data, the processor executes the instructions to: send, from the indexing unit to the data source, a request to pin a portion of a database of the data source corresponding to the constituent data; receive, by the indexing unit, an indication that the portion of the database is pinned; in response to the indication that the portion of the database is pinned, send, from the indexing unit to the data source, a sampling data request indicating a sampling data-query for the portion of the database; and access, by the indexing unit, sampling results responsive to the sampling data-query.

12. The low-latency data access and analysis system of claim 11, wherein the constituent data includes a plurality of logical tables.

13. The low-latency data access and analysis system of claim 12, wherein, to partition the constituent data, the processor executes the instructions to: identify a smallest unpartitioned table from the plurality of logical tables; in response to a determination that a current size of the first partition is less than a defined maximum size for the first partition: identify a sum of the current size of the first partition and a size of the smallest unpartitioned table as the current size of the first partition; and assign the smallest unpartitioned table to the first partition; in response to a determination that the current size of the first partition is at least the defined maximum size for the first partition, assign the smallest unpartitioned table to the second partition; and identify the smallest unpartitioned table as a partitioned table.
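The size-based partitioning recited in claims 4 and 13, and the segment-count rule of claim 5, can be sketched as follows. This is an illustrative reading of the claim language, not the patentee's code: the `Table` type, the function names, and the size unit are hypothetical. The claims speak only of a "quotient", so rounding it up and using at least one segment are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    size: int  # hypothetical unit (rows or bytes); the claims do not specify


def partition_by_size(tables, max_first_size):
    """Assign tables smallest-first, following the order of claim 4.

    The size check happens before the table is added, so the first
    partition may end up larger than max_first_size.
    """
    first, second = [], []
    current_size = 0
    # Iterating in ascending size order is equivalent to repeatedly
    # picking the smallest unpartitioned table.
    for table in sorted(tables, key=lambda t: t.size):
        if current_size < max_first_size:
            current_size += table.size
            first.append(table)
        else:
            second.append(table)
    return first, second


def segment_count(second_partition, max_segment_size, max_segments):
    """Lesser of the segment cap and total size / max segment size.

    Assumptions: the quotient is rounded up, and at least one segment
    is always produced.
    """
    total = sum(t.size for t in second_partition)
    quotient = max(1, math.ceil(total / max_segment_size))
    return min(max_segments, quotient)
```

For example, with table sizes 10, 20, 30, 500, and 800 and a first-partition limit of 50, the first partition receives the tables of size 10, 20, and 30 (total 60), and the remaining two go to the second partition. With a maximum segment size of 400 and a cap of 8 segments, `segment_count` returns 4 for that second partition.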
Data partitioning, e.g. horizontal or vertical partitioning · CPC title
Hash tables · CPC title
Management thereof · CPC title