Index sharding

US11334548B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11334548-B2
Application number  US-201916556719-A
Country             US
Kind code           B2
Filing date         Aug 30, 2019
Priority date       Jan 31, 2019
Publication date    May 17, 2022
Grant date          May 17, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Index sharding in a low-latency database analysis system includes obtaining index configuration data for indexing constituent data, the constituent data including a plurality of logical tables, and indexing, by an indexing unit, the constituent data by partitioning the constituent data based on a characteristic of the constituent data into at least a first partition and a second partition, segmenting the first partition into a first segment of the first partition, sharding the first segment into a first shard of the first segment of the first partition, segmenting, using hash-partitioning, the second partition into one or more segments of the second partition, and for each segment of the second partition, sharding the segment into one or more respective shards.
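The three-level layout described in the abstract (partition, then segment, then shard) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation. It assumes that the partitioning characteristic is table size, that the hash key is the table name, and that a shard is simply a bounded list of table names. All function and parameter names (`stable_hash`, `hash_segment`, `shard`, `build_layout`, `size_threshold`, `max_tables_per_shard`) are hypothetical.

```python
import hashlib
from collections import defaultdict


def stable_hash(key: str) -> int:
    """Process-independent hash (the built-in hash() of str is salted per run)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash_segment(table_names, num_segments):
    """Assign each table to a segment by hashing its name (assumed hash key)."""
    segments = defaultdict(list)
    for name in table_names:
        segments[stable_hash(name) % num_segments].append(name)
    return dict(segments)


def shard(tables, max_tables_per_shard):
    """Split one segment into one or more shards of bounded length."""
    return [tables[i:i + max_tables_per_shard]
            for i in range(0, len(tables), max_tables_per_shard)]


def build_layout(table_sizes, size_threshold, num_segments, max_tables_per_shard):
    """Return {partition: {segment_id: [shard, ...]}} for the given tables."""
    # Assumed characteristic: tables below the threshold go to the first partition.
    small = sorted(n for n, s in table_sizes.items() if s < size_threshold)
    large = sorted(n for n, s in table_sizes.items() if s >= size_threshold)
    return {
        # First partition: a single segment holding a single shard.
        "first": {0: [small]},
        # Second partition: hash-partitioned segments, each split into shards.
        "second": {seg: shard(names, max_tables_per_shard)
                   for seg, names in hash_segment(large, num_segments).items()},
    }


table_sizes = {"users": 10, "orders": 500, "events": 900, "regions": 5, "items": 700}
layout = build_layout(table_sizes, size_threshold=100,
                      num_segments=2, max_tables_per_shard=1)
```

A digest-based hash is used here so that a table maps to the same segment across runs. The abstract itself does not say which hash function or key is used.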

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining constituent data; in response to a determination that a previously generated index portion representing the constituent data is unavailable in a primary index of a low-latency database analysis system, generating, by an indexing unit of the low-latency database analysis system, an index portion representing the constituent data in an auxiliary index, wherein generating the index portion representing the constituent data in the auxiliary index includes: partitioning the constituent data based on a characteristic of the constituent data into at least a first partition and a second partition; segmenting the first partition into a first segment of the first partition; sharding the first segment of the first partition into a first shard of the first segment of the first partition; segmenting, using hash-partitioning, the second partition into one or more segments of the second partition; and for each segment of the second partition, sharding the segment into one or more respective shards; in response to a determination that the previously generated index portion representing the constituent data is available in the primary index, indexing, by the indexing unit, the constituent data in the primary index, wherein indexing the constituent data in the primary index includes: partitioning the constituent data based on the characteristic of the constituent data into at least a first partition and a second partition; segmenting the first partition into a first segment of the first partition; sharding the first segment of the first partition into a first shard of the first segment of the first partition; segmenting, using hash-partitioning, the second partition into one or more segments of the second partition; and for each segment of the second partition, sharding the segment into one or more respective shards; and in response to a defined event, compacting the auxiliary index into the primary index.

2. The method of claim 1, wherein indexing the constituent data includes: receiving, at the indexing unit, information indicating a change of at least a portion of the constituent data; and identifying the constituent data in response to receiving the information indicating the change.

3. The method of claim 1, wherein indexing the constituent data includes: sending, from the indexing unit to a database unit, a request to pin a portion of a database corresponding to the constituent data; in response to receiving, by the indexing unit, an indication that the portion of the database is pinned, sending, from the indexing unit to the database unit, a sampling data request indicating a sampling data-query for the portion of the database; and accessing, by the indexing unit, sampling results responsive to the sampling data-query.

4. The method of claim 3, wherein indexing the constituent data includes: in response to accessing, by the indexing unit, the sampling results, sending, from the indexing unit to a segmentation assigner, a segmentation assignment request; and in response to obtaining, by the indexing unit, a segmentation assignment, partitioning the constituent data in accordance with the segmentation assignment.

5. The method of claim 3, wherein indexing the constituent data includes obtaining information representing the constituent data as a plurality of logical tables.

6. The method of claim 5, wherein partitioning the constituent data includes: identifying a smallest unpartitioned table from the plurality of logical tables; in response to a determination that a current size of the first partition is less than a defined maximum size for the first partition: identifying a sum of the current size of the first partition and a size of the smallest unpartitioned table as the current size of the first partition; and assigning the smallest unpartitioned table to the first partition; in response to a determination that the current size of the first partition is at least the defined maximum size for the first partition, assigning the smallest unpartitioned table to the second partition; and identifying the smallest unpartitioned table as a partitioned table.

7. The method of claim 5, wherein segmenting, using hash-partitioning, the second partition includes: identifying, as a cardinality of the one or more segments of the second partition, the lesser of a defined maximum cardinality of segments of the second partition or a quotient of dividing a sum of the sizes of the tables from the plurality of logical tables assigned to the second partition by a defined maximum segment size.

8. The method of claim 3, wherein, for a respective segment, sharding includes: identifying, by a segment manager of the indexing unit, an indexing mode for indexing an object from the respective segment based on the sampling results; generating, by the segment manager, a shard specification for generating a shard of the respective segment based on the sampling results and the indexing mode; sending, from the indexing unit to the database unit, a constituent data request indicating a constituent data-query for the respective segment; generating a shard assignment indicating the shard specification and an indexing operation unit; and generating, by the indexing operation unit, the shard based on the shard assignment, wherein generating the shard includes accessing the constituent data responsive to the constituent data request.

9. The method of claim 1, further comprising: receiving data expressing a usage intent with respect to the low-latency database analysis system; in response to receiving the data expressing the usage intent, generating response data responsive to the data expressing the usage intent, wherein generating the response data includes: generating a resolved-request representing the data expressing the usage intent by resolving at least a portion of the data expressing the usage intent by traversing a unified index, wherein the unified index includes the primary index and the auxiliary index; generating a data-query representing the resolved-request; and sending the data-query to a database for execution to obtain the response data; and outputting the response data.

10. The method of claim 9, wherein traversing the unified index includes: traversing a shard from the primary index to identify a token corresponding to a portion of the data expressing the usage intent.

11. The method of claim 9, wherein traversing the unified index includes: traversing a shard from the auxiliary index to identify a token corresponding to a portion of the data expressing the usage intent.

12. A method comprising: obtaining index configuration data for indexing constituent data, the constituent data including a plurality of logical tables; and indexing in an index, by an indexing unit, the constituent data by: partitioning the constituent data based on a characteristic of the constituent data into at least a first partition and a second partition, wherein partitioning the constituent data includes: identifying a smallest unpartitioned table from the plurality of logical tables: in response to a determination that a current size of the first partition is less than a defined maximum size for the first partition: identifying a sum of the current size of the first partition and a size of the smallest unpartitioned table as the current size of the first partition; and assigning the smallest unpartitioned table to the first partition; in response to a determination that the current size of the first partition is at least the defined maximum size for the first partition, assigning the smallest
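As a reading aid for claims 6 and 7, the table-assignment loop and the segment-count rule might be sketched as below. This is a hedged illustration under stated assumptions, not the patented implementation. Table sizes are plain numbers in arbitrary units, ties between equal-size tables are broken by name, the quotient is rounded up, and at least one segment is always produced. The claims do not specify any of these details, and the names `partition_tables` and `segment_cardinality` are hypothetical.

```python
import math


def partition_tables(table_sizes, max_first_size):
    """Greedy assignment in the order of steps recited in claim 6."""
    first, second = [], []
    current_first_size = 0
    # Visiting tables in ascending size order is equivalent to repeatedly
    # picking the smallest unpartitioned table (name breaks ties: assumption).
    for name, size in sorted(table_sizes.items(), key=lambda kv: (kv[1], kv[0])):
        if current_first_size < max_first_size:
            # The size check happens before the table is added.
            current_first_size += size
            first.append(name)
        else:
            second.append(name)
    return first, second


def segment_cardinality(second_sizes, max_segments, max_segment_size):
    """Lesser of the maximum segment count and total size / max segment size."""
    quotient = math.ceil(sum(second_sizes) / max_segment_size)  # rounding up: assumption
    return max(1, min(max_segments, quotient))  # minimum of one segment: assumption


table_sizes = {"regions": 5, "users": 10, "orders": 500, "items": 700, "events": 900}
first, second = partition_tables(table_sizes, max_first_size=12)
count = segment_cardinality([table_sizes[t] for t in second],
                            max_segments=4, max_segment_size=1000)
print(first, second, count)
```

Because the size check precedes the addition, the first partition in this example ends at size 15 even though the defined maximum is 12, which matches the order of steps as worded in claim 6.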

Assignees

Thoughtspot Inc

Inventors

Classifications

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • Hash tables · CPC title

  • Management thereof · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11334548B2 cover?
Index sharding in a low-latency database analysis system includes obtaining index configuration data for indexing constituent data, the constituent data including a plurality of logical tables, and indexing, by an indexing unit, the constituent data by partitioning the constituent data based on a characteristic of the constituent data into at least a first partition and a second partition, segm…
Who is the assignee on this patent?
Thoughtspot Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2272. Mapped technology areas include Physics.
When was this patent published?
Publication date May 17, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).