Using hierarchical reservoir sampling to compute percentiles at scale

US9756122B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9756122-B2
Application numberUS-201514664043-A
CountryUS
Kind codeB2
Filing dateMar 20, 2015
Priority dateMar 20, 2015
Publication dateSep 5, 2017
Grant dateSep 5, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, in a hierarchy of nodes, a master node having two or more child nodes obtains from the two or more child nodes two or more sets of data samples or summaries associated therewith, the two or more sets of data samples being representative of traffic processed via two or more sets of servers corresponding to the two or more child nodes, wherein a size of each of the two or more sets of data samples is proportional to an allocation of traffic among the two or more sets of servers corresponding to the two or more child nodes. Each of the two or more sets of data samples is obtained from a different one of the two or more child nodes and represents traffic processed by a corresponding one of the two or more sets of servers. The master node combines the two or more sets of data samples or summaries associated therewith such that a combined set of data is generated. The master node ascertains a numerical value from the combined set of data.

First claim

Opening claim text (preview).

What is claimed is: 1. A method, comprising: at a master node in a hierarchy of nodes, the master node having two or more child nodes, obtaining from the two or more child nodes two or more sets of data samples or summaries associated therewith, the two or more sets of data samples being representative of traffic processed via two or more sets of servers corresponding to the two or more child nodes, wherein a size of each of the two or more sets of data samples is proportional to an allocation of traffic among the two or more sets of servers corresponding to the two or more child nodes, wherein each of the two or more sets of data samples is obtained from a different one of the two or more child nodes and represents traffic processed by a corresponding one of the two or more sets of servers; at the master node, combining the two or more sets of data samples or summaries associated therewith such that a combined set of data is generated; and at the master node, ascertaining a numerical value from the combined set of data. 2. The method as recited in claim 1 , wherein ascertaining a numerical value from the combined set of data comprises ascertaining an N-th percentile from the combined set of data. 3. The method as recited in claim 1 , further comprising: transmitting, by the master node, a request to the two or more child nodes for data samples according to the allocation of the traffic among the two or more sets of servers corresponding to the two or more child nodes; wherein obtaining from the two or more child nodes two or more sets of data samples representative of traffic processed via the two or more sets of servers corresponding to the two or more child nodes includes receiving the two or more sets of data samples from the two or more child nodes in response to the request. 4. The method as recited in claim 1 , further comprising: at the master node, ascertaining the allocation of traffic among the two or more sets of servers corresponding to the two or more child nodes. 5. The method as recited in claim 4 , further comprising: at the master node, obtaining from the two or more child nodes, two or more total data counts representative of an amount of traffic processed via the two or more sets of servers corresponding to the two or more child nodes, wherein each of the two or more total data counts is obtained from a different one of the two or more child nodes; at the master node, ascertaining a total amount of traffic processed via the two or more sets of servers corresponding to the two or more child nodes based, at least in part, upon the two or more total data counts; wherein ascertaining the allocation of the traffic among the two or more sets of servers corresponding to the two or more child nodes is performed based, at least in part, upon the two or more total data counts. 6. The method as recited in claim 1 , wherein the two or more child nodes are associated with one or more data centers, locations, operating systems, carriers, sources of content, time periods, types of media content, subject matter categories of content, or languages in which content is provided. 7. The method as recited in claim 1 , wherein each of the two or more child nodes performs reservoir sampling to generate a corresponding one of the two or more sets of data samples. 8. The method as recited in claim 1 , wherein the two or more sets of servers comprise the two or more child nodes, wherein each one of the two or more child nodes performs sampling of data representative of the traffic processed via the one of the two or more child nodes such that a set of data samples is obtained in association with the corresponding one of the two or more child nodes. 9. A non-transitory computer-readable storage medium storing thereon computer-readable instructions, comprising: instructions for obtaining, from each child node of a master node in a hierarchy of nodes, a set of data samples or summary associated therewith, the set of data samples being representative of traffic processed via a set of servers corresponding to the child node, wherein a size of the set of data samples is proportional to a distribution of total traffic among the set of servers and other sets of servers corresponding to other child nodes of the master node; instructions for generating, at the master node, a combined set of data from the set of data samples or summary obtained from each child node of the master node; and instructions for ascertaining, at the master node, a numerical value from the combined set of data. 10. The non-transitory computer-readable storage medium as recited in claim 9 , wherein the numerical value comprises a sum, an N-th percentile, or an average. 11. The non-transitory computer-readable storage medium as recited in claim 9 , further comprising: instructions for transmitting, by the master node, a request to the two or more child nodes for data samples according to the distribution of total traffic among the two or more sets of servers corresponding to the two or more child nodes; wherein obtaining from the two or more child nodes two or more sets of data samples representative of traffic processed via the two or more sets of servers corresponding to the two or more child nodes includes receiving the two or more sets of data samples from the two or more child nodes in response to the request. 12. The non-transitory computer-readable storage medium as recited in claim 11 , further comprising: instructions for obtaining by the master node from the two or more child nodes, two or more total data counts representative of an amount of traffic processed via the two or more sets of servers corresponding to the two or more child nodes, wherein each of the two or more total data counts is obtained from a different one of the two or more child nodes; instructions for ascertaining, at the master node, a total amount of traffic processed via the two or more sets of servers corresponding to the two or more child nodes based, at least in part, upon the two or more total data counts; wherein ascertaining the allocation of the traffic among the two or more sets of servers corresponding to the two or more child nodes is performed based, at least in part, upon the two or more total data counts. 13. The non-transitory computer-readable storage medium as recited in claim 9 , further comprising: instructions for obtaining, by the master node from each of the two or more child nodes, an N-th percentile of a corresponding segment of the traffic; and instructions for sending a notification or modifying operations of at least one server of the two or more sets of servers corresponding to the two or more child nodes, wherein sending a notification or modifying operations is performed by the master node based, at least in part, upon the N-th percentile received from each of the two or more child nodes. 14. An apparatus, comprising: a processor; and a memory storing thereon computer-readable instructions, the computer-readable instructions being configured to: at a master node in a hierarchy of nodes, the master node having two or more child nodes, obtain from the two or more child nodes two or more sets of data samples or summaries associated therewith, the two or more sets of data samples being representative of traffic processed via two or more sets of servers corresponding to the two or more child nodes, wherein a size of each of the two or more sets of data samples is proportional to an allocation of traffic among the two or more sets of servers corresponding to the two or more child nodes, wherein each of the two or more sets of data samples is obtained from a

Assignees

Inventors

Classifications

  • Network utilisation, e.g. volume of load or congestion level · CPC title

  • by sampling · CPC title

  • H04L41/044Primary

    comprising hierarchical management structures · CPC title

  • using data related to the state of servers by a load balancer · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9756122B2 cover?
In one embodiment, in a hierarchy of nodes, a master node having two or more child nodes obtains from the two or more child nodes two or more sets of data samples or summaries associated therewith, the two or more sets of data samples being representative of traffic processed via two or more sets of servers corresponding to the two or more child nodes, wherein a size of each of the two or more …
Who is the assignee on this patent?
Yahoo Inc, Yahoo Holdings Inc
What technology area does this patent fall under?
Primary CPC classification H04L41/044. Mapped technology areas include Electricity.
When was this patent published?
Publication date Tue Sep 05 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).