Throughput-based fan-out control in scalable distributed data stores

US10037376B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10037376-B2
Application numberUS-201615096067-A
CountryUS
Kind codeB2
Filing dateApr 11, 2016
Priority dateMar 11, 2016
Publication dateJul 31, 2018
Grant dateJul 31, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosed embodiments provide a system for processing data. During operation, the system determines a current incoming queries per second (QPS) to one or more components for processing queries of a graph database, wherein the graph database is replicated across multiple clusters and distributed among a set of storage nodes in each of the clusters. Next, the system uses the current incoming QPS to estimate, for the one or more components, an expected QPS associated with fanning out of the queries to the clusters. The system then selects a number of clusters in the multiple clusters for fanning out of a query based on the expected QPS and one or more throughput limits for the one or more components. Finally, the system transmits the query to one or more of the storage nodes in the selected number of clusters.

First claim

Opening claim text (preview).

What is claimed is: 1. A method, comprising: determining a current incoming queries per second (QPS) to one or more components for processing queries of a graph database, wherein the graph database is replicated across multiple clusters and distributed among a set of storage nodes in each of the clusters; using the current incoming QPS to estimate, for the one or more components, an expected QPS associated with fanning out of the queries to the clusters; and when a query of the graph database is received, processing the query on a computer system by: selecting a number of clusters in the multiple clusters for fanning out of the query, based on the expected QPS and one or more throughput limits for the one or more components; and transmitting the query to one or more storage nodes in the selected number of clusters. 2. The method of claim 1 , wherein the one or more components comprise: a client node that receives incoming queries to the graph database; and a storage node in the set of storage nodes. 3. The method of claim 2 , wherein using the current incoming QPS to estimate the expected QPS for the client node comprises: multiplying the current incoming QPS to the client node by a candidate number of clusters for fanning out of the query and a number of storage nodes in each of the clusters to obtain an expected outgoing QPS from the client node. 4. The method of claim 3 , wherein selecting the number of clusters for fanning out of the query based on the expected QPS and the one or more throughput limits comprises: selecting the number of clusters to not exceed a product of a throughput limit of the client node and the candidate number of clusters divided by the expected outgoing QPS from the client node. 5. The method of claim 2 , wherein using the current incoming QPS to estimate the expected QPS for the storage node comprises: multiplying the current incoming QPS to the client node by a number of instances of the client node to obtain an expected incoming QPS to the storage node. 6. The method of claim 5 , wherein selecting the number of clusters for fanning out of the query based on the expected QPS and the one or more throughput limits comprises: selecting the number of clusters to not exceed a product of a throughput limit of the storage node and a candidate number of clusters for fanning out of the query divided by a product of the number of instances of the client node and the current incoming QPS to the client node. 7. The method of claim 2 , wherein the client node comprises at least one of: a query processor; and a caching service. 8. The method of claim 1 , wherein selecting the number of clusters for fanning out of the query based on the expected QPS and the one or more throughput limits comprises: limiting the selected number of clusters to a value that does not cause the expected QPS to exceed the one or more throughput limits. 9. The method of claim 1 , further comprising: selecting the number of clusters for fanning out of the query based on a query type of the query. 10. The method of claim 1 , wherein the graph database comprises a set of partitions in a first distribution across the storage nodes in a first cluster and a second distribution that is different from the first distribution across the storage nodes in a second cluster. 11. The method of claim 1 , wherein transmitting the query to the one or more storage nodes in the selected number of clusters comprises: randomly selecting the determined number of clusters as a subset of the multiple clusters; identifying a partition of the graph database storing data associated with a key in the query; selecting a cluster from the subset of clusters; identifying a storage node containing the partition in the selected cluster; and transmitting a portion of the query comprising the key to the identified storage node. 12. An apparatus, comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the apparatus to: determine a current incoming queries per second (QPS) to one or more components for processing queries of a graph database, wherein the graph database is replicated across multiple clusters and distributed among a set of storage nodes in each of the clusters; use the current incoming QPS to estimate, for the one or more components, an expected QPS associated with fanning out of the queries to the clusters; select a number of clusters in the multiple clusters for fanning out of a query based on the expected QPS and one or more throughput limits for the one or more components; and transmit the query to one or more of the storage nodes in the selected number of clusters. 13. The apparatus of claim 12 , wherein the one or more components comprise: a client node that receives incoming queries to the graph database; and a storage node in the set of storage nodes. 14. The apparatus of claim 13 , wherein using the current incoming QPS to estimate the expected QPS for the client node comprises: multiplying the current incoming QPS to the client node by a candidate number of clusters for fanning out of the query and a number of storage nodes in each of the clusters to obtain an expected outgoing QPS from the client node. 15. The apparatus of claim 14 , wherein selecting the number of clusters for fanning out of the query based on the expected QPS and the one or more throughput limits comprises: selecting the number of clusters to not exceed a product of a throughput limit of the client node and the candidate number of clusters divided by the expected outgoing QPS from the client node. 16. The apparatus of claim 13 , wherein using the current incoming QPS to estimate the expected QPS for the storage node comprises: multiplying the current incoming QPS to the client node by a number of instances of the client node to obtain an expected incoming QPS to the storage node. 17. The apparatus of claim 16 , wherein selecting the number of clusters for fanning out of the query based on the expected QPS and the one or more throughput limits comprises: selecting the number of clusters to not exceed a product of a throughput limit of the storage node and a candidate number of clusters for fanning out of the query divided by a product of the number of instances of the client node and the current incoming QPS to the client node. 18. The apparatus of claim 12 , wherein selecting the number of clusters for fanning out of the query based on the expected QPS and the one or more throughput limits comprises: limiting the selected number of clusters to a value that does not cause the expected QPS to exceed the one or more throughput limits. 19. A system, comprising: a measurement mechanism comprising a non-transitory computer-readable medium comprising instructions that, when executed, cause the system to determine a current incoming queries per second (QPS) to one or more components for processing queries of a graph database, wherein the graph database is replicated across multiple clusters and distributed among a set of storage nodes in each of the clusters; and a client node comprising a non-transitory computer-readable medium comprising instructions that, when executed, cause the system to: use the current incoming QPS to estimate, for the one or more components, an expected QPS associated with fanning out of the queries to the clusters; select a number of clusters in the multiple clusters for fanning out of a query based on the expected QPS and one or more throughput

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10037376B2 cover?
The disclosed embodiments provide a system for processing data. During operation, the system determines a current incoming queries per second (QPS) to one or more components for processing queries of a graph database, wherein the graph database is replicated across multiple clusters and distributed among a set of storage nodes in each of the clusters. Next, the system uses the current incoming …
Who is the assignee on this patent?
Linkedln Corp, Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06F17/30598. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jul 31 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).