Systems and methods for sharding based on distributed inverted indexes

US11599500B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11599500-B2
Application numberUS-201916600106-A
CountryUS
Kind codeB2
Filing dateOct 11, 2019
Priority dateOct 11, 2018
Publication dateMar 7, 2023
Grant dateMar 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to one embodiment, distributing data across a plurality of storage shards can comprise generating a file key for each file of a plurality of files stored in a plurality of physical shards, each physical shard maintained by a node of a plurality of nodes in one or more clusters. The file key can comprise a hash of an enterprise identifier for an entity to which the creator of the file is a member, a hash of a folder identifier for a location in which the file is stored, and a hash of a file identifier uniquely identifying the file. The generated file keys can be sorted into an ordered list and the ordered list can be logically partitioning into a plurality of logical shards. Each logical shard of the plurality of logical shards can then be mapped to one of the plurality of physical shards.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for distributing data across a plurality of storage shards, the method comprising: generating, by a server of a cloud-based storage system, a file key for each file of a plurality of files stored in a plurality of physical shards, each physical shard maintained by a node of a plurality of nodes in one or more clusters, wherein the file key comprises a hash of an enterprise identifier (enterprise ID) for an entity to which the creator of the file is a member, a hash of a folder identifier (folder ID) for a location in which the file is stored, and a hash of a file identifier (file ID) uniquely identifying the file; sorting, by the server of the cloud-based storage system, the generated file keys for each file of the plurality of files into an ordered list; logically partitioning, by the server of the cloud-based storage system, the ordered list into a plurality of logical shards; mapping, by the server of the cloud-based storage system, each logical shard of the plurality of logical shards to one of the plurality of physical shards; identifying, by the server of the cloud-based storage system, a last key value for each logical shard in the partitioned ordered list; saving, by the server of the cloud-based storage system, the identified last key value for each logical shard in a meta-store for the physical shard to which the logical shard is mapped; and partially re-indexing, by the server of the cloud-based storage system, the plurality of shards of the one or more clusters based on addition of a node to the plurality of node in the one or more clusters, wherein partially re-indexing the plurality of shards comprises selecting a set of largest shards in the one or more clusters based on an index size of each shard, redirecting traffic for each shard of the selected set of shards to a different cluster of the one or more clusters while the plurality of shards are being partially reindexed, and re-directing traffic back to each shard of the selected set of shards upon completion of partially reindexing the plurality of shards. 2. The method of claim 1 , further comprising indexing a new file in the plurality of physical shards, wherein the indexing comprises: receiving, by the server of the cloud-based storage system, the new file; generating, by the server of the cloud-based storage system, a file key for the new file; identifying, by the server of the cloud-based storage system, a target shard for the new file based on the generated file key for the new file and the saved identified last key value for each logical shard saved in the meta-store for each physical shard; and routing, by the server of the cloud-based storage system, the new file to the identified target shard. 3. The method of claim 1 , further comprising processing a query from a user, wherein processing the query comprises: receiving, by the server of the cloud-based storage system, an enterprise ID associated with the user; identifying, by the server of the cloud-based storage system, one or more shards based on the received enterprise ID associated with the user and the saved identified last key value for each logical shard saved in the meta-store for each physical shard; and routing, by the server of the cloud-based storage system, the query to the identified one or more shards. 4. The method of claim 1 , wherein mapping each logical shard of the plurality of logical shards to one of the plurality of physical shards further comprises: sorting, by the server of the cloud-based storage system, the plurality of logical shards in descending order based on a load of each logical shard; assigning, by the server of the cloud-based storage system, one logical shard beginning from a top of the sorted ordered list to each physical shard of the plurality of physical shards; sorting, by the server of the cloud-based storage system, remaining unassigned logical shards in ascending order based on the load of each remaining unassigned logical shard; assigning, by the server of the cloud-based storage system, one logical shard beginning from the top of the sorted ordered list to each physical shard of the plurality of physical shards; determining, by the server of the cloud-based storage system, whether all logical shards have been assigned to a physical shard; and in response to determining not all logical shards have been assigned to a physical shard, repeating, by the server of the cloud-based storage system, said sorting of the plurality of logical shards in descending order based on a load of each logical shard, assigning one logical shard beginning from a top of the sorted ordered list to each physical shard of the plurality of physical shards, sorting of remaining unassigned logical shards in ascending order based on the load of each remaining unassigned logical shard, and assigning one logical shard beginning from the top of the sorted ordered list to each physical shard of the plurality of physical shards until all logical shards have been assigned to a physical shard. 5. The method of claim 1 , further comprising spilling over a source shard of the plurality of physical shards to a target shard of the plurality of target shards, wherein spilling over the source shard to the target shard comprises: identifying, by the server of the cloud-based storage system, the source shard based on available storage capacity of each physical shard of the plurality of physical shards; identifying, by the server of the cloud-based storage system, the target shard based on the available storage capacity of each physical shard of the plurality of physical shards; spilling, by the server of the cloud-based storage system, content from the source shard to the target shard; updating, by the server of the cloud-based storage system, an override map of the meta-store for the source shard and the target shard; and federating, by the server of the cloud-based storage system, the update to the override map of the meta-store for the source shard and the target shard to one or more components of the cloud-based storage system. 6. The method of claim 1 , further comprising spilling over a source shard of the plurality of physical shards to a target shard of the plurality of target shards, wherein spilling over the source shard to the target shard comprises: identifying, by the server of the cloud-based storage system, available space for each node of the plurality of node in a cluster of the one or more clusters; selecting, by the server of the cloud-based storage system, a plurality of target shards based on an overhead capacity of each shard of the plurality of shards on a node of the plurality of node having a most available space; identifying, by the server of the cloud-based storage system, one or more source shards based on one or more overhead criteria; selecting, by the server of the cloud-based storage system, one of the identified one or more source shards; spilling, by the server of the cloud-based storage system, the selected one of the identified one or more source shards to the selected plurality of target shards; updating, by the server of the cloud-based storage system, an override map of the meta-store for the source shard and each target shard; federating, by the server of the cloud-based storage system, the update to the override map of the meta-store for the source shard and each target shard to one or more components of the cloud-based storage system; determining, by the server of the cloud-based storage system, whether all of the identified one or more source shards have been spilled to the plurality of target shards; and in response to determining not all of the identified one or more source shards have been spilled to the plurality of target shards, repeating, by the server

Assignees

Inventors

Classifications

  • G06F16/152Primary

    using file content signatures, e.g. hash values · CPC title

  • Provision of network file services by network file servers, e.g. by using NFS, CIFS (network file access protocols H04L67/1097) · CPC title

  • G06F16/137Primary

    Hash-based (content-based indexing of textual data G06F16/31) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11599500B2 cover?
According to one embodiment, distributing data across a plurality of storage shards can comprise generating a file key for each file of a plurality of files stored in a plurality of physical shards, each physical shard maintained by a node of a plurality of nodes in one or more clusters. The file key can comprise a hash of an enterprise identifier for an entity to which the creator of the file …
Who is the assignee on this patent?
Box Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/152. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Mar 07 2023 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).