Approximate order statistics of real numbers in generic data

US9645975B2 · US · B2

Patent metadata
Publication number: US-9645975-B2
Application number: US-201414255981-A
Country: US
Kind code: B2
Filing date: Apr 18, 2014
Priority date: Mar 1, 2011
Publication date: May 9, 2017
Grant date: May 9, 2017

Abstract

Official abstract text for this publication.

A method, system, and processor-readable storage medium are directed towards calculating approximate order statistics on a collection of real numbers. In one embodiment, the collection of real numbers is processed to create a digest comprising a hierarchy of buckets. Each bucket is assigned a real number N having P digits of precision and ordinality O. The hierarchy is defined by grouping buckets into levels, where each level contains all buckets of a given ordinality. Each individual bucket in the hierarchy defines a range of numbers: all numbers that, after being truncated to that bucket's P digits of precision, are equal to that bucket's N. Each bucket additionally maintains a count of how many numbers have fallen within that bucket's range. Approximate order statistics may then be calculated by traversing the hierarchy and performing an operation on some or all of the ranges and counts associated with each bucket.
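
As a rough illustration of the bucket ranges and counts described in the abstract, the following Python sketch groups numbers by truncating them to a fixed number of digits and then walks the counts to estimate a quantile. It is only a sketch under stated assumptions: it uses a single flat level rather than a hierarchy, it reads "P digits of precision" as P significant digits, and it truncates toward zero. The names `truncate`, `build_counts`, and `approx_quantile` are hypothetical and do not come from the patent.

```python
from collections import Counter
from decimal import Decimal, ROUND_DOWN


def truncate(x: Decimal, digits: int) -> Decimal:
    """Truncate x toward zero to `digits` significant digits (assumed meaning of P)."""
    if x == 0:
        return Decimal(0)
    # adjusted() is the power of ten of the most significant digit.
    quantum = Decimal(1).scaleb(x.adjusted() - digits + 1)
    return x.quantize(quantum, rounding=ROUND_DOWN)


def build_counts(values, digits: int) -> Counter:
    """Map each bucket value N to the count of inputs that truncate to N."""
    counts = Counter()
    for v in values:
        counts[truncate(Decimal(str(v)), digits)] += 1
    return counts


def approx_quantile(counts: Counter, q: float) -> Decimal:
    """Return the bucket value N at which the running count first reaches q * total."""
    total = sum(counts.values())
    target = q * total
    running = 0
    for n in sorted(counts):
        running += counts[n]
        if running >= target:
            return n
    raise ValueError("empty digest")


counts = build_counts([1.23, 1.27, 1.91, 2.5, 3.14, 9.99], digits=2)
median_bucket = approx_quantile(counts, 0.5)
```

The result is a bucket value rather than an exact element, so the error of the estimate is bounded by the width of the bucket it lands in.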

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for computing an order statistic for a particular number in a set of numbers, the method comprising: determining an ordinality for each number in the set of numbers by performing a computation comprising at least: determining a representation for each number in scientific notation, wherein the representation includes a mantissa and an exponent; and computing an ordinality for each number by subtracting from the exponent a count of significant digits, including significant zeros, in the mantissa that appear to the right of any decimal point in the mantissa; creating, in at least one storage device, a digest comprising one or more buckets, wherein each bucket is associated with an ordinality, a range of numerical values, and a count of any numbers contained in the bucket; storing each number in a matching bucket of the digest using the determined ordinality of the number; computing by one or more computing devices the order statistic for the particular number by performing computations involving counts for buckets in the digest thereby improving the efficiency of the computations performed by the one or more computing devices.

2. The computer-implemented method of claim 1, wherein the one or more buckets in the digest are hierarchically organized into one or more levels.

3. The computer-implemented method of claim 1, wherein the one or more buckets in the digest are hierarchically organized into one or more levels; wherein each level in the digest contains all buckets of a given ordinality; and wherein storing the number into the digest includes, searching a matching level of the digest associated with the ordinality of the number for the matching bucket associated with a range containing the number, if the matching bucket is found, incrementing a count for the matching bucket, and if the matching bucket is not found, creating a new bucket for the number at the matching level.

4. The computer-implemented method of claim 1, wherein the digest is hierarchically structured as a tree; and wherein the method further comprises compressing the digest by collapsing one or more child buckets into an associated parent bucket if a sum of the counts of the one or more child buckets and the parent bucket falls below a threshold.

5. The computer-implemented method of claim 1, wherein the digest is hierarchically structured as a tree; wherein the method further comprises compressing the digest by collapsing one or more child buckets into an associated parent bucket if a sum of the counts of the one or more child buckets and the parent bucket falls below a threshold; and wherein collapsing the one or more child buckets into the associated parent bucket includes adding counts of the one or more child buckets to the count of the parent bucket and deleting the child buckets.

6. The computer-implemented method of claim 1, wherein the method further comprises merging the digest with another digest.

7. The computer-implemented method of claim 1, wherein each bucket in the digest is also associated with a real number and a number of digits of precision; and wherein a given bucket in the digest includes all numbers that when truncated to the number of digits of precision equal the real number.

8. The computer-implemented method of claim 1, wherein each bucket in the digest is represented by a data structure that specifies an ordinality and a range for the bucket.

9. The computer-implemented method of claim 1, wherein the digest comprises a data structure that is isomorphic to a set of buckets.

10. The computer-implemented method of claim 1, wherein the order statistic includes an approximate percentile.

11. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for computing an order statistic for a particular number in a set of numbers, the method comprising: determining an ordinality for each number in the set of numbers by performing a computation comprising at least: determining a representation for each number in scientific notation, wherein the representation includes a mantissa and an exponent; and computing an ordinality for each number by subtracting from the exponent a count of significant digits, including significant zeros, in the mantissa that appear to the right of any decimal point in the mantissa; creating, in at least one storage device, a digest comprising one or more buckets, wherein each bucket is associated with an ordinality, a range of numerical values, and a count of any numbers contained in the bucket; and storing each number in a matching bucket of the digest using the determined ordinality of the number; computing by one or more computing devices the order statistic for the particular number by performing computations involving counts for buckets in the digest thereby improving the efficiency of the computations performed by the one or more computing devices.

12. The non-transitory computer-readable storage medium of claim 11, wherein the one or more buckets in the digest are hierarchically organized into one or more levels.

13. The non-transitory computer-readable storage medium of claim 11, wherein the one or more buckets in the digest are hierarchically organized into one or more levels; wherein each level in the digest contains all buckets of a given ordinality; and wherein storing the number into the digest includes, searching a matching level of the digest associated with the ordinality of the number for the matching bucket associated with a range containing the number, if the matching bucket is found, incrementing a count for the matching bucket, and if the matching bucket is not found, creating a new bucket for the number at the matching level.

14. The non-transitory computer-readable storage medium of claim 11, wherein the digest is hierarchically structured as a tree; and wherein the method further comprises compressing the digest by collapsing one or more child buckets into an associated parent bucket if a sum of the counts of the one or more child buckets and the parent bucket falls below a threshold.

15. The non-transitory computer-readable storage medium of claim 11, wherein the digest is hierarchically structured as a tree; wherein the method further comprises compressing the digest by collapsing one or more child buckets into an associated parent bucket if a sum of the counts of the one or more child buckets and the parent bucket falls below a threshold; and wherein collapsing the one or more child buckets into the associated parent bucket includes adding counts of the one or more child buckets to the count of the parent bucket and deleting the child buckets.

16. The non-transitory computer-readable storage medium of claim 11, wherein the method further comprises merging the digest with another digest.

17. The non-transitory computer-readable storage medium of claim 11, wherein each bucket in the digest is also associated with a real number and a number of digits of precision; and wherein a given bucket in the digest includes all numbers that when truncated to the number of digits of precision equal the real number.

18. The non-transitory computer-readable storage medium of claim 11, wherein each bucket in the digest is represented by a data structure that specifies an ordinality and a range for the bucket.

19. The non-transitory computer-readable storage medium of claim 11, wherein the diges
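
The ordinality computation of claim 1 and the insert, compress, and merge steps of claims 3 to 6 can be sketched in Python as follows. This is a minimal sketch under assumptions, not the patented implementation: numbers arrive as decimal strings so that significant zeros survive, only finite values are handled, a bucket's parent is taken to be its value truncated toward zero by one digit, and `compress` makes a single bottom-up pass over the levels that exist when it starts. The names `ordinality`, `parent_of`, and `Digest` are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_DOWN


def ordinality(x: Decimal) -> int:
    """Exponent minus the count of mantissa digits right of the decimal point."""
    digits = x.as_tuple().digits         # significant digits, zeros included
    exponent = x.adjusted()              # exponent in scientific notation
    return exponent - (len(digits) - 1)  # mantissa is d.ddd, so len - 1 fractional digits


def parent_of(n: Decimal, o: int) -> Decimal:
    """Assumed parent bucket: n truncated so its last digit sits at 10**(o + 1)."""
    return n.quantize(Decimal(1).scaleb(o + 1), rounding=ROUND_DOWN)


class Digest:
    def __init__(self):
        self.levels = defaultdict(dict)  # ordinality -> {bucket value N: count}

    def add(self, text: str, count: int = 1) -> None:
        """Find the level for the number's ordinality, then bump or create its bucket."""
        x = Decimal(text)
        level = self.levels[ordinality(x)]
        level[x] = level.get(x, 0) + count

    def merge(self, other: "Digest") -> None:
        """Add the other digest's counts bucket by bucket."""
        for o, level in other.levels.items():
            for n, c in level.items():
                self.levels[o][n] = self.levels[o].get(n, 0) + c

    def compress(self, threshold: int) -> None:
        """Collapse children into their parent when the combined count is below threshold."""
        for o in sorted(self.levels):    # snapshot of levels, finest first
            groups = defaultdict(list)
            for n in self.levels[o]:
                groups[parent_of(n, o)].append(n)
            for p, children in groups.items():
                parent_count = self.levels.get(o + 1, {}).get(p, 0)
                total = parent_count + sum(self.levels[o][n] for n in children)
                if total < threshold:
                    self.levels[o + 1][p] = total   # add child counts to the parent
                    for n in children:
                        del self.levels[o][n]       # delete the collapsed children
        # Drop levels that no longer hold any bucket.
        self.levels = defaultdict(dict, {o: l for o, l in self.levels.items() if l})

    def total(self) -> int:
        return sum(c for level in self.levels.values() for c in level.values())


d = Digest()
for text in ["3.14", "3.14", "3.17", "2.50", "1200"]:
    d.add(text)
d.compress(threshold=4)
```

In this sketch the ordinality of "3.14" is 0 - 2 = -2, and the ordinality of "2.50" is also -2 because the trailing zero counts as significant. After `compress(threshold=4)`, the three values near 3.1 share one bucket at ordinality -1 with a count of 3, and the total count across the digest is unchanged.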

Assignees

Splunk Inc

Inventors

Classifications

  • with adaptive number of clusters · CPC title

  • G06F17/18 (Primary)

    for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title

  • for evaluating functions by calculation {(G06F7/4824 takes precedence)} · CPC title

  • Approximate or statistical queries · CPC title

  • Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9645975B2 cover?
A method, system, and processor-readable storage medium are directed towards calculating approximate order statistics on a collection of real numbers. In one embodiment, the collection of real numbers is processed to create a digest comprising a hierarchy of buckets. Each bucket is assigned a real number N having P digits of precision and ordinality O. The hierarchy is defined by grouping buckets…
Who is the assignee on this patent?
Splunk Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/18. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 9, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).