Fast lookup of related data partitioned across a distributed key-value store

US10452610B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10452610-B2
Application numberUS-201715475720-A
CountryUS
Kind codeB2
Filing dateMar 31, 2017
Priority dateMar 31, 2017
Publication dateOct 22, 2019
Grant dateOct 22, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A storage cluster includes a plurality of key-value storage nodes categorized into sub-groups of data associated with a first value identifying the sub-group and second values identifying respective subsets of data. A key-value processing system receives at least one of a first request to retrieve a selected one of the sub-groups of data, the first request including a plurality of keys, each of the plurality of keys including the first value and a respective one of the second values, and a second request to retrieve a selected one of the subsets of data. The second request includes a key having the first value and a selected one of the second values. The selected one of the second values corresponds to a hash value. The storage cluster selectively provides at least one of the selected one of the sub-groups of data and the selected one of the subsets of data.

First claim

Opening claim text (preview).

What is claimed is: 1. A storage cluster for a distributed network system, the storage cluster being configured to evenly distribute a total amount of data across multiple nodes without regard to how many individual data entries are stored across said multiple nodes, the storage cluster comprising: a plurality of key-value storage nodes, wherein each key-value storage node included in the plurality of key-value storage nodes stores one or more sub-groups of data, wherein each sub-group of data in the sub-groups of data is associated with a first value identifying said sub-group of data and second values identifying respective subsets of data included within said sub-group of data, and wherein an amount of data distributed among the plurality of key-value storage nodes is evenly distributed without regard to a number of individual data entries stored at each individual key-value storage node of the plurality of key-value storage nodes, and a key-value processing system configured to receive, from a client device, at least one of: (i) a first request to retrieve an entire selected first one of the sub-groups of data, wherein the first request includes a plurality of keys, and wherein each key included in the plurality of keys includes: a same first value, which corresponds to the selected first one of the sub-groups of data, and one respective second value selected from among a plurality of corresponding second values that all correspond to the selected first one of the sub-groups of data, whereby retrieving a combination of the plurality of keys causes an entirety of the selected first one of the sub-groups of data to be retrieved; or, alternatively, (ii) a second request to retrieve only a selected subset of data included within a selected second one of the sub-groups of data, wherein the second request includes a particular key that includes a particular first value corresponding to the selected second one of the sub-groups of data and a particular second value corresponding only to the selected subset of data, and wherein the particular second value is a hash value identifying the selected subset of data, wherein the storage cluster is to selectively respond to either one of the first request or the second request. 2. The storage cluster of claim 1 , wherein the first value identifies a tenant associated with the storage cluster and the second values each identify a respective user within the tenant. 3. The storage cluster of claim 2 , wherein each of the second values correspond to a hashed username of the respective user. 4. The storage cluster of claim 1 , wherein each of the plurality of keys corresponds to x|h(y), wherein x is the first value, h(y) is the second value, y is a username associated with one of the subsets of data, and h(y) is a hash value resulting from applying a hash function to the username. 5. The storage cluster of claim 1 , wherein a first subset of data within a first sub-group of data is stored in a first one of the key-value storage nodes and a second subset of data within the first sub-group of data is stored in a second one of the key-value storage nodes. 6. The storage cluster of claim 1 , wherein, to generate the first request to retrieve the selected first one of the sub-groups of data, the client device generate the plurality of keys such that each key in the plurality of keys includes a different second value included among the plurality of corresponding second values. 7. The storage cluster of claim 6 , wherein the plurality of keys include p of the plurality of corresponding second values, wherein p is an integer greater than one, and wherein p corresponds to a hash range. 8. The storage cluster of claim 7 , wherein p is less than a quantity of users associated with a respective one of the sub-groups of data. 9. A method for evenly distributing a total amount of data across multiple nodes without regard to how many individual data entries are stored across said multiple nodes, the method comprising: storing, in a storage cluster that includes a plurality of key-value storage nodes, data categorized into sub-groups of data, wherein each sub-group of data in the sub-groups of data is associated with a first value identifying said sub-group of data and second values identifying respective subsets of data included within said sub-group of data, and wherein an amount of data distributed among the plurality of key-value storage nodes is evenly distributed without regard to a number of individual data entries stored at each individual key-value storage node of the plurality of key-value storage nodes; receiving, from a client device, at least one of: (i) a first request to retrieve an entire selected first one of the sub-groups of data, wherein the first request includes a plurality of keys, and wherein each key included in the plurality of keys includes: a same first value, which corresponds to the selected first one of the sub-groups of data, and one respective second value selected from among a plurality of corresponding second values that all correspond to the selected first one of the sub-groups of data, whereby retrieving a combination of the plurality of keys causes an entirety of the selected first one of the sub-groups of data to be retrieved; or, alternative, (ii) a second request to retrieve only a selected subset of data included within a selected second one of the sub-groups of data, wherein the second request includes a particular key that includes a particular first value corresponding to the selected second one of the sub-groups of data and a particular second value corresponding only to the selected subset of data, and wherein the particular second value is a hash value identifying the selected subset of data; and selectively providing a response to either one of the first request or the second request. 10. The method of claim 9 , wherein the first value identifies a tenant associated with the storage cluster and the second values each identify a respective user within the tenant. 11. The method of claim 10 , wherein each of the second values correspond to a hashed username of the respective user. 12. The method of claim 9 , wherein each of the plurality of keys corresponds to x|h(y), wherein x is the first value, h(y) is the second value, y is a username associated with one of the subsets of data, and h(y) is a hash value resulting from applying a hash function to the username. 13. The method of claim 9 , further comprising storing a first subset of data within a first sub-group of data in a first one of the key-value storage nodes and storing a second subset of data within the first sub-group of data in a second one of the key-value storage nodes. 14. The method of claim 9 , wherein generating the first request to retrieve the selected first one of the sub-groups of data includes generating the plurality of keys such that each key in the plurality of keys includes a different second value included among the plurality of corresponding second values. 15. The method of claim 14 , wherein the plurality of keys include p of the plurality of corresponding second values, wherein p is an integer greater than one, and wherein p corresponds to a hash range. 16. The method of claim 15 , wherein p is less than a quantity of users associated with a respective one of the sub-groups of data. 17. A key-value processing system for a key-value server including a plurality of key-value storage nodes, the key-value processing system being configured to evenly distribute a total amount of data across the plurality of key-value storage nodes wit

Assignees

Inventors

Classifications

  • Distributed file systems · CPC title

  • G06F16/137Primary

    Hash-based (content-based indexing of textual data G06F16/31) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10452610B2 cover?
A storage cluster includes a plurality of key-value storage nodes categorized into sub-groups of data associated with a first value identifying the sub-group and second values identifying respective subsets of data. A key-value processing system receives at least one of a first request to retrieve a selected one of the sub-groups of data, the first request including a plurality of keys, each of…
Who is the assignee on this patent?
Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/137. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Oct 22 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).