What technology area does this patent fall under?

Primary CPC classification G06F12/0864. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Nov 01 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Approximate unique count

US11487668B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11487668-B2
Application number	US-202117224005-A
Country	US
Kind code	B2
Filing date	Apr 6, 2021
Priority date	Apr 6, 2021
Publication date	Nov 1, 2022
Grant date	Nov 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described are methods and systems for improved cardinality estimation. A method may include obtaining a data-query, obtaining a row, generating a hash value, determining a cardinality of leading zeros in the hash value, identifying a bucket with respect to the hash value, including a bucket identifier and the cardinality of leading zeros in a representation, determining the approximate unique count, and outputting the approximate unique count as results data responsive to the portion of the data-query.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: in response to receiving data expressing a usage intent with respect to a low-latency database analysis system: obtaining, at a distributed database, a portion of a data-query responsive to the data expressing the usage intent, wherein the portion of the data-query indicates: an approximate unique count clause with respect to a column from a table from the distributed database; determining an approximate unique count for the column by: for an unevaluated row: generating a hash value for a value of the column from the unevaluated row, wherein the hash value is in a defined range of hash values; determining a cardinality of leading zeros in the hash value; identifying a bucket with respect to the hash value, wherein identifying the bucket includes identifying the bucket from a plurality of buckets corresponding to the defined range of hash values, wherein the buckets from the plurality of buckets correspond with respective non-overlapping portions of the defined range of hash values, such that the hash value is in the portion of the defined range of hash values corresponding to the bucket; and appending to an unsorted sparse representation: a bucket identifier for the bucket; and the cardinality of the leading zeros; in response to a determination that the table omits unevaluated rows, determining the approximate unique count for the column using the unsorted sparse representation, wherein the approximate unique count is computed using the cardinalities of leading zeros in the unsorted sparse representation; and outputting the approximate unique count as results data responsive to the portion of the data-query. 2. The method of claim 1 , wherein the appending further comprises: in response to a determination that utilization for a current memory allocation for the unsorted sparse representation is above a defined threshold: obtaining an expanded memory allocation for the unsorted sparse representation such that the expanded memory allocation is a multiple of the current memory allocation; and storing the unsorted sparse representation in the expanded memory allocation. 3. The method of claim 1 , further comprising: in response to a determination that a memory allocation for the unsorted sparse representation is greater than or equal to a conversion threshold, converting the unsorted sparse representation to a dense representation. 4. The method of claim 3 , wherein the dense representation includes memory allocated for each bucket. 5. The method of claim 3 , further comprising: in response to conversion of the unsorted sparse representation to the dense representation, the determining the approximate unique count for the column using the unsorted sparse representation uses the dense representation in lieu of the unsorted sparse representation, wherein the approximate unique count is computed using the cardinalities of leading zeros in the dense representation. 6. The method of claim 1 , further comprising: determining if the hash value is in the unsorted sparse representation using the bucket identifier as an index to a map; updating a current cardinality of leading zeros for a matched bucket identifier; and adding the bucket identifier as a new index in the map for an unmatched bucket identifier. 7. The method of claim 6 , further comprising: in response to a determination that a memory allocation for the unsorted sparse representation and the map is greater than or equal to a conversion threshold, converting the unsorted sparse representation to a dense representation. 8. The method of claim 1 , further comprising: partitioning the table into a plurality of regions, wherein a region includes a non-overlapping set of rows from the table and is associated with a database instance, wherein processing is done in parallel for each region; and in response to a determination that the table omits unevaluated rows, merging unsorted sparse representations. 9. The method of claim 8 , wherein unsorted sparse representation is done in parallel for each region. 10. The method of claim 1 , wherein the unsorted sparse representation omits data corresponding to at least one bucket. 11. A system, comprising: a processor, and a memory, wherein the memory stores instructions executable by the processor to: in response to receiving data expressing a usage intent: obtain a portion of a data-query responsive to the data expressing the usage intent, wherein the portion of the data-query indicates: an approximate unique count clause with respect to a column from a table from a distributed database; determine an approximate unique count for the column by: for an unevaluated row: generate a hash value for a value of the column from the unevaluated row, wherein the hash value is in a defined range of hash values; determine a cardinality of leading zeros in the hash value; identify a bucket with respect to the hash value, wherein identifying the bucket includes identifying the bucket from a plurality of buckets corresponding to the defined range of hash values, wherein the buckets from the plurality of buckets correspond with respective non-overlapping portions of the defined range of hash values, such that the hash value is in the portion of the defined range of hash values corresponding to the bucket; update for a bucket identifier which matches an existing bucket identifier in a sorted sparse representation: a value of the cardinality of the leading zeros; positionally insert for an unmatched bucket identifier in the sorted sparse representation: a bucket identifier for the bucket; and the cardinality of the leading zeros; and in response to a determination that the table omits unevaluated rows, determine the approximate unique count for the column using the sorted sparse representation, wherein the approximate unique count is computed using the cardinalities of leading zeros in the sorted sparse representation; and output the approximate unique count as results data responsive to the portion of the data-query. 12. The system of claim 11 , wherein the memory stores instructions executable by the processor to: in response to a determination that utilization for a current memory allocation for the sorted sparse representation is above a defined threshold: obtain an expanded memory allocation for the sorted sparse representation such that the expanded memory allocation is a multiple of the current memory allocation; and store the sorted sparse representation in the expanded memory allocation. 13. The system of claim 11 , wherein the memory stores instructions executable by the processor to: in response to a determination that a memory allocation for the sorted sparse representation is greater than or equal to a conversion threshold, convert the sorted sparse representation to a dense representation. 14. The system of claim 13 , wherein the memory stores instructions executable by the processor to: in response to conversion of the sorted sparse representation to the dense representation, the approximate unique count for the column uses the dense representation in lieu of the sorted sparse representation, wherein the approximate unique count is computed using the cardinalities of leading zeros in the dense representation. 15. The system of claim 11 , wherein the memory stores instructions executable by the processor to: determine a position of the unmatched bucket identifier in the sorted sparse representation. 16. The system of claim 11 , wherein the memory stores instructions executable b

Assignees

Thoughtspot Inc

Inventors

Classifications

G06F12/023
Free address space management · CPC title
G06F12/0864Primary
using pseudo-associative means, e.g. set-associative or hashing · CPC title
G06F9/5016
the resource being the memory · CPC title
G06F16/21
Design, administration or maintenance of databases · CPC title
G06F16/9014
hash tables · CPC title

Patent family

Related publications grouped by family.

View patent family 80628492

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11487668B2 cover?: Described are methods and systems for improved cardinality estimation. A method may include obtaining a data-query, obtaining a row, generating a hash value, determining a cardinality of leading zeros in the hash value, identifying a bucket with respect to the hash value, including a bucket identifier and the cardinality of leading zeros in a representation, determining the approximate unique c…
Who is the assignee on this patent?: Thoughtspot Inc
What technology area does this patent fall under?: Primary CPC classification G06F12/0864. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Nov 01 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

How to read this patent

Abstract

First claim

Assignees

Inventors

Classifications

Patent family

External sources

Related patents

Systems and methods for cross media reporting by fast merging of data sources

Distinct value estimation for query planning

Method And System To Estimate The Cardinality Of Sets And Set Operation Results From Single And Multiple HyperLogLog Sketches

Efficient determination of join paths via cardinality estimation

Frequently asked questions