Approximate unique count

US11487668B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11487668-B2
Application numberUS-202117224005-A
CountryUS
Kind codeB2
Filing dateApr 6, 2021
Priority dateApr 6, 2021
Publication dateNov 1, 2022
Grant dateNov 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described are methods and systems for improved cardinality estimation. A method may include obtaining a data-query, obtaining a row, generating a hash value, determining a cardinality of leading zeros in the hash value, identifying a bucket with respect to the hash value, including a bucket identifier and the cardinality of leading zeros in a representation, determining the approximate unique count, and outputting the approximate unique count as results data responsive to the portion of the data-query.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: in response to receiving data expressing a usage intent with respect to a low-latency database analysis system: obtaining, at a distributed database, a portion of a data-query responsive to the data expressing the usage intent, wherein the portion of the data-query indicates: an approximate unique count clause with respect to a column from a table from the distributed database; determining an approximate unique count for the column by: for an unevaluated row: generating a hash value for a value of the column from the unevaluated row, wherein the hash value is in a defined range of hash values; determining a cardinality of leading zeros in the hash value; identifying a bucket with respect to the hash value, wherein identifying the bucket includes identifying the bucket from a plurality of buckets corresponding to the defined range of hash values, wherein the buckets from the plurality of buckets correspond with respective non-overlapping portions of the defined range of hash values, such that the hash value is in the portion of the defined range of hash values corresponding to the bucket; and appending to an unsorted sparse representation:  a bucket identifier for the bucket; and  the cardinality of the leading zeros; in response to a determination that the table omits unevaluated rows, determining the approximate unique count for the column using the unsorted sparse representation, wherein the approximate unique count is computed using the cardinalities of leading zeros in the unsorted sparse representation; and outputting the approximate unique count as results data responsive to the portion of the data-query. 2. The method of claim 1 , wherein the appending further comprises: in response to a determination that utilization for a current memory allocation for the unsorted sparse representation is above a defined threshold: obtaining an expanded memory allocation for the unsorted sparse representation such that the expanded memory allocation is a multiple of the current memory allocation; and storing the unsorted sparse representation in the expanded memory allocation. 3. The method of claim 1 , further comprising: in response to a determination that a memory allocation for the unsorted sparse representation is greater than or equal to a conversion threshold, converting the unsorted sparse representation to a dense representation. 4. The method of claim 3 , wherein the dense representation includes memory allocated for each bucket. 5. The method of claim 3 , further comprising: in response to conversion of the unsorted sparse representation to the dense representation, the determining the approximate unique count for the column using the unsorted sparse representation uses the dense representation in lieu of the unsorted sparse representation, wherein the approximate unique count is computed using the cardinalities of leading zeros in the dense representation. 6. The method of claim 1 , further comprising: determining if the hash value is in the unsorted sparse representation using the bucket identifier as an index to a map; updating a current cardinality of leading zeros for a matched bucket identifier; and adding the bucket identifier as a new index in the map for an unmatched bucket identifier. 7. The method of claim 6 , further comprising: in response to a determination that a memory allocation for the unsorted sparse representation and the map is greater than or equal to a conversion threshold, converting the unsorted sparse representation to a dense representation. 8. The method of claim 1 , further comprising: partitioning the table into a plurality of regions, wherein a region includes a non-overlapping set of rows from the table and is associated with a database instance, wherein processing is done in parallel for each region; and in response to a determination that the table omits unevaluated rows, merging unsorted sparse representations. 9. The method of claim 8 , wherein unsorted sparse representation is done in parallel for each region. 10. The method of claim 1 , wherein the unsorted sparse representation omits data corresponding to at least one bucket. 11. A system, comprising: a processor, and a memory, wherein the memory stores instructions executable by the processor to: in response to receiving data expressing a usage intent: obtain a portion of a data-query responsive to the data expressing the usage intent, wherein the portion of the data-query indicates: an approximate unique count clause with respect to a column from a table from a distributed database; determine an approximate unique count for the column by: for an unevaluated row:  generate a hash value for a value of the column from the unevaluated row, wherein the hash value is in a defined range of hash values;  determine a cardinality of leading zeros in the hash value;  identify a bucket with respect to the hash value, wherein identifying the bucket includes identifying the bucket from a plurality of buckets corresponding to the defined range of hash values, wherein the buckets from the plurality of buckets correspond with respective non-overlapping portions of the defined range of hash values, such that the hash value is in the portion of the defined range of hash values corresponding to the bucket;  update for a bucket identifier which matches an existing bucket identifier in a sorted sparse representation:  a value of the cardinality of the leading zeros;  positionally insert for an unmatched bucket identifier in the sorted sparse representation:  a bucket identifier for the bucket; and  the cardinality of the leading zeros; and in response to a determination that the table omits unevaluated rows, determine the approximate unique count for the column using the sorted sparse representation, wherein the approximate unique count is computed using the cardinalities of leading zeros in the sorted sparse representation; and output the approximate unique count as results data responsive to the portion of the data-query. 12. The system of claim 11 , wherein the memory stores instructions executable by the processor to: in response to a determination that utilization for a current memory allocation for the sorted sparse representation is above a defined threshold: obtain an expanded memory allocation for the sorted sparse representation such that the expanded memory allocation is a multiple of the current memory allocation; and store the sorted sparse representation in the expanded memory allocation. 13. The system of claim 11 , wherein the memory stores instructions executable by the processor to: in response to a determination that a memory allocation for the sorted sparse representation is greater than or equal to a conversion threshold, convert the sorted sparse representation to a dense representation. 14. The system of claim 13 , wherein the memory stores instructions executable by the processor to: in response to conversion of the sorted sparse representation to the dense representation, the approximate unique count for the column uses the dense representation in lieu of the sorted sparse representation, wherein the approximate unique count is computed using the cardinalities of leading zeros in the dense representation. 15. The system of claim 11 , wherein the memory stores instructions executable by the processor to: determine a position of the unmatched bucket identifier in the sorted sparse representation. 16. The system of claim 11 , wherein the memory stores instructions executable b

Assignees

Inventors

Classifications

  • Free address space management · CPC title

  • using pseudo-associative means, e.g. set-associative or hashing · CPC title

  • the resource being the memory · CPC title

  • Design, administration or maintenance of databases · CPC title

  • hash tables · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11487668B2 cover?
Described are methods and systems for improved cardinality estimation. A method may include obtaining a data-query, obtaining a row, generating a hash value, determining a cardinality of leading zeros in the hash value, identifying a bucket with respect to the hash value, including a bucket identifier and the cardinality of leading zeros in a representation, determining the approximate unique c…
Who is the assignee on this patent?
Thoughtspot Inc
What technology area does this patent fall under?
Primary CPC classification G06F12/0864. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 01 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).