Methods and systems for calculating statistical quantities in a computing environment

US9830188B2 · US · B2

Patent metadata
Publication number: US-9830188-B2
Application number: US-201414303449-A
Country: US
Kind code: B2
Filing date: Jun 12, 2014
Priority date: Jun 12, 2014
Publication date: Nov 28, 2017
Grant date: Nov 28, 2017

Abstract

Official abstract text for this publication.

This disclosure is directed to methods and systems for calculating statistical quantities of computational resources used by distributed data sources in a computing environment. In one aspect, a master node receives a query regarding use of computational resources used by distributed data sources of a computing environment. The data sources generate metric data that represents use of the computational resources and distribute the metric data to two or more worker nodes. The master node directs each worker node to generate worker-node data that represents the metric data received by each of the worker nodes and each worker node sends worker-node data to the master node. The master node receives the worker-node data and calculates a master-data structure based on the worker-node data, which may be used to estimate percentiles of the metric data in response to the query.
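
The flow in the abstract can be illustrated with a minimal sketch, assuming every node shares the same fixed interval edges and that a percentile is read off the merged counts by linear interpolation inside an interval. The function names, the `edges` list, and the interpolation rule are hypothetical choices for illustration; the abstract does not specify them.

```python
from bisect import bisect_right


def worker_histogram(values, edges):
    """Count how many metric values fall into each interval of `edges`.

    Assumption: interval i covers [edges[i], edges[i + 1]); values outside
    the range are clamped into the first or last interval.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        i = max(0, min(i, len(counts) - 1))
        counts[i] += 1
    return counts


def merge_histograms(histograms):
    """Master-side step: add the per-interval counts from all workers."""
    return [sum(column) for column in zip(*histograms)]


def estimate_percentile(counts, edges, p):
    """Estimate the p-th percentile (0 < p <= 100) from interval counts.

    Assumption: values are spread uniformly inside each interval, so the
    estimate is a linear interpolation between the interval's edges.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no metric data")
    target = p / 100 * total
    cumulative = 0
    for i, c in enumerate(counts):
        if c and cumulative + c >= target:
            fraction = (target - cumulative) / c
            return edges[i] + fraction * (edges[i + 1] - edges[i])
        cumulative += c
    return edges[-1]


# Two hypothetical workers, each holding a unique subset of the metric data.
edges = [0, 25, 50, 75, 100]
worker_a = worker_histogram([10, 20, 30, 40], edges)
worker_b = worker_histogram([60, 70, 80, 90], edges)
master = merge_histograms([worker_a, worker_b])
median_estimate = estimate_percentile(master, edges, 50)
```

Because only per-interval counts travel to the master in this sketch, the data sent by each worker has a fixed size regardless of how many metric values it received. The trade-off is that the percentile is an estimate whose precision depends on the interval width.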

First claim

Opening claim text (preview).

The invention claimed is:

1. A method stored in one or more data-storage devices and executed using one or more processors of a computing environment, the method comprising: receiving a query at a master node about use of computational resources of the computing environment; distributing metric data generated by data sources in the computing environment to one or more worker nodes, the metric data represents use of the computational resources; each worker node performs the following: receiving the query from the master node, generating worker-node data that represents the metric data received by the worker node in response to the query, and sending the worker-node data to the master node; and calculating at the master node a master-data structure that represents a distribution of the metric data over data-structure intervals based on the worker-node data sent from the worker nodes.

2. The method of claim 1 further comprises calculating estimated percentiles from the master-data structure based on the query.

3. The method of claim 1, wherein distributing the metric data further comprises: generating the metric data at the data sources of the computing environment that use the computational resources; partitioning the metric data into two or more unique subsets of the metric data; and sending a unique subset of the metric data to each of the two or more worker nodes.

4. The method of claim 1, wherein generating the worker-node data that represents the metric data received by the worker node further comprises when data size of the metric data received by the worker node is less than or equal to a memory bound of the worker node, the worker-node data is an array of the metric data.

5. The method of claim 1, wherein generating the worker-node data that represents the metric data received by the worker node further comprises when the data size of the metric data is greater than the memory bound of the worker node, generating a data structure that represents a distribution of the metric data received by the worker, the worker-node data is the data structure.

6. The method of claim 5, wherein generating the data structure that represents the distribution of the metric data further comprises: estimating a minimum-metric value and a maximum-metric value for the subset of metric data received by the worker node; forming data-structure intervals that combined cover values of the metric data; identifying data-structure intervals the estimated minimum-metric value and the estimated maximum-metric value are in; calculating an interval degree D based on the number of data-structure intervals and a number of intervals between and including the data-structure intervals that contain the estimated minimum-metric value and the estimated maximum-metric value; and counting each metric value of the metric data that lies in the data-structure intervals to form a frequency distribution the metric data over the data-structure intervals.

7. The method of claim 6, further comprises: when the interval degree D is greater than or equal to one, splitting each of the data-structure intervals into 2^D data-structure subintervals; and counting each metric value of the metric data that lies in the data-structure subintervals to form a frequency distribution the metric data over the data-structure subintervals.

8. The method of claim 1, wherein calculating the master-data structure further comprises: initializing current data as first worker-node data received; and for each worker-node data received after the first worker-node data, combining the worker-node data with the current data to update current data, the current data being the master-data structure when worker nodes have finished.

9. The method of claim 1, wherein combining the worker-node data with the current data to update the current data further comprises: when current data and the worker-node data are metric data, combining current data and worker-node data to update current data; when current data is metric data and the worker-node data is a data structure, converting the current data to a data structure and aggregating the current data with the worker-node data; when current data is a data structure and the worker-node data is metric data, converting the worker-node data to a data structure and aggregating the current data with the worker-node data; and when current data and the worker-node data are data structures, aggregating the current data with the worker-node data.

10. The method of claim 1, wherein a single worker node operates at the master node when a volume of the metric data output from the data sources is below a threshold.

11. A system for generating a data structure of metric data generated in a computing environment comprising: one or more processors; one or more data-storage devices; and a routine stored in the data-storage devices and executed using the one or more processors, the routine receiving a query at a master node about use of computational resources of the computing environment; distributing metric data generated by data sources in the computing environment to one or more worker nodes, the metric data represents use of the computational resources; each worker node performs the following: receiving the query from the master node, generating worker-node data that represents the metric data received by the worker node in response to the query, and sending the worker-node data to the master node; and calculating at the master node a master-data structure that represents a distribution of the metric data over data-structure intervals based on the worker-node data sent from the worker nodes.

12. The system of claim 11 further comprises calculating estimated percentiles from the master-data structure based on the query.

13. The system of claim 11, wherein distributing the metric data further comprises: generating the metric data at data sources of the computing environment that use the computational resources; partitioning the metric data into two or more unique subsets of the metric data; and sending a unique subset of the metric data to each of the two or more worker nodes.

14. The system of claim 11, wherein generating the worker-node data that represents the metric data received by the worker node further comprises when data size of the metric data received by the worker node is less than or equal to a memory bound of the worker node, the worker-node data is an array of the metric data.

15. The system of claim 11, wherein generating the worker-node data that represents the metric data received by the worker node further comprises when the data size of the metric data is greater than the memory bound of the worker node, generating a data structure that represents a distribution of the metric data received by the worker, the worker-node data is the data structure.

16. The system of claim 15, wherein generating the data structure that represents the distribution of the metric data further comprises: estimating a minimum-metric value and a maximum-metric value for the subset of metric data received by the worker node; forming data-structure intervals that combined cover values of the metric data; identifying data-structure intervals the estimated minimum-metric value and the estimated maximum-metric value are in; calculating an interval degree D based on the number of data-structure intervals and a number of intervals between and including the data-structure intervals that contain the estimated minimum-metric value and the estimated maximum-metric value; and counting ea
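
The worker-side choice in claims 4 and 5 and the master-side combining in claims 8 and 9 can be sketched as follows. This is a rough illustration under stated assumptions, not the patented implementation: the memory bound is modeled as a simple count of values, all nodes share the hypothetical fixed `EDGES`, and the interval-degree refinement of claims 6 and 7 is left out because the preview text does not give its formula.

```python
from bisect import bisect_right

# Hypothetical interval edges shared by every node (an assumption).
EDGES = [0, 25, 50, 75, 100]


def to_histogram(values, edges=EDGES):
    """Convert raw metric values into per-interval counts."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        counts[max(0, min(i, len(counts) - 1))] += 1
    return counts


def worker_node_data(values, memory_bound):
    """Claims 4 and 5: send the raw array if it fits, else a histogram.

    Assumption: `memory_bound` is a maximum number of values.
    """
    if len(values) <= memory_bound:
        return list(values)                   # array of metric data
    return {"counts": to_histogram(values)}   # data structure


def combine(current, incoming):
    """Claim 9: the four cases of raw data versus data structure."""
    current_is_raw = isinstance(current, list)
    incoming_is_raw = isinstance(incoming, list)
    if current_is_raw and incoming_is_raw:
        return current + incoming             # both are metric data
    # Otherwise convert whichever side is still raw, then add counts.
    a = to_histogram(current) if current_is_raw else current["counts"]
    b = to_histogram(incoming) if incoming_is_raw else incoming["counts"]
    return {"counts": [x + y for x, y in zip(a, b)]}


def master_data(worker_outputs):
    """Claim 8: start from the first worker-node data, fold in the rest."""
    current = worker_outputs[0]
    for data in worker_outputs[1:]:
        current = combine(current, data)
    return current


# Three hypothetical workers with a memory bound of four values each.
w1 = worker_node_data([10, 20], memory_bound=4)
w2 = worker_node_data([30, 40, 60, 70, 80], memory_bound=4)
w3 = worker_node_data([90], memory_bound=4)
result = master_data([w1, w2, w3])
```

In this sketch the master keeps raw values only while every input is raw. Once any worker sends a histogram, the running result becomes a histogram and stays one, which mirrors the conversion cases listed in claim 9.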

Classifications

  • Performance evaluation by statistical analysis · CPC title

  • Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

  • where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting · CPC title

  • Monitoring · CPC title

  • for performance assessment · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9830188B2 cover?
This disclosure is directed to methods and systems for calculating statistical quantities of computational resources used by distributed data sources in a computing environment. In one aspect, a master node receives a query regarding use of computational resources used by distributed data sources of a computing environment. The data sources generate metric data that represents use of the comput…
Who is the assignee on this patent?
VMware Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/5011. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 28, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).