Systems and methods for cross media reporting by fast merging of data sources
US-2022091873-A1 · Mar 24, 2022 · US
US11487668B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11487668-B2 |
| Application number | US-202117224005-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 6, 2021 |
| Priority date | Apr 6, 2021 |
| Publication date | Nov 1, 2022 |
| Grant date | Nov 1, 2022 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Described are methods and systems for improved cardinality estimation. A method may include obtaining a data-query, obtaining a row, generating a hash value, determining a cardinality of leading zeros in the hash value, identifying a bucket with respect to the hash value, including a bucket identifier and the cardinality of leading zeros in a representation, determining the approximate unique count, and outputting the approximate unique count as results data responsive to the portion of the data-query.
Opening claim text (preview).
What is claimed is: 1. A method comprising: in response to receiving data expressing a usage intent with respect to a low-latency database analysis system: obtaining, at a distributed database, a portion of a data-query responsive to the data expressing the usage intent, wherein the portion of the data-query indicates: an approximate unique count clause with respect to a column from a table from the distributed database; determining an approximate unique count for the column by: for an unevaluated row: generating a hash value for a value of the column from the unevaluated row, wherein the hash value is in a defined range of hash values; determining a cardinality of leading zeros in the hash value; identifying a bucket with respect to the hash value, wherein identifying the bucket includes identifying the bucket from a plurality of buckets corresponding to the defined range of hash values, wherein the buckets from the plurality of buckets correspond with respective non-overlapping portions of the defined range of hash values, such that the hash value is in the portion of the defined range of hash values corresponding to the bucket; and appending to an unsorted sparse representation: a bucket identifier for the bucket; and the cardinality of the leading zeros; in response to a determination that the table omits unevaluated rows, determining the approximate unique count for the column using the unsorted sparse representation, wherein the approximate unique count is computed using the cardinalities of leading zeros in the unsorted sparse representation; and outputting the approximate unique count as results data responsive to the portion of the data-query. 2. The method of claim 1 , wherein the appending further comprises: in response to a determination that utilization for a current memory allocation for the unsorted sparse representation is above a defined threshold: obtaining an expanded memory allocation for the unsorted sparse representation such that the expanded memory allocation is a multiple of the current memory allocation; and storing the unsorted sparse representation in the expanded memory allocation. 3. The method of claim 1 , further comprising: in response to a determination that a memory allocation for the unsorted sparse representation is greater than or equal to a conversion threshold, converting the unsorted sparse representation to a dense representation. 4. The method of claim 3 , wherein the dense representation includes memory allocated for each bucket. 5. The method of claim 3 , further comprising: in response to conversion of the unsorted sparse representation to the dense representation, the determining the approximate unique count for the column using the unsorted sparse representation uses the dense representation in lieu of the unsorted sparse representation, wherein the approximate unique count is computed using the cardinalities of leading zeros in the dense representation. 6. The method of claim 1 , further comprising: determining if the hash value is in the unsorted sparse representation using the bucket identifier as an index to a map; updating a current cardinality of leading zeros for a matched bucket identifier; and adding the bucket identifier as a new index in the map for an unmatched bucket identifier. 7. The method of claim 6 , further comprising: in response to a determination that a memory allocation for the unsorted sparse representation and the map is greater than or equal to a conversion threshold, converting the unsorted sparse representation to a dense representation. 8. The method of claim 1 , further comprising: partitioning the table into a plurality of regions, wherein a region includes a non-overlapping set of rows from the table and is associated with a database instance, wherein processing is done in parallel for each region; and in response to a determination that the table omits unevaluated rows, merging unsorted sparse representations. 9. The method of claim 8 , wherein unsorted sparse representation is done in parallel for each region. 10. The method of claim 1 , wherein the unsorted sparse representation omits data corresponding to at least one bucket. 11. A system, comprising: a processor, and a memory, wherein the memory stores instructions executable by the processor to: in response to receiving data expressing a usage intent: obtain a portion of a data-query responsive to the data expressing the usage intent, wherein the portion of the data-query indicates: an approximate unique count clause with respect to a column from a table from a distributed database; determine an approximate unique count for the column by: for an unevaluated row: generate a hash value for a value of the column from the unevaluated row, wherein the hash value is in a defined range of hash values; determine a cardinality of leading zeros in the hash value; identify a bucket with respect to the hash value, wherein identifying the bucket includes identifying the bucket from a plurality of buckets corresponding to the defined range of hash values, wherein the buckets from the plurality of buckets correspond with respective non-overlapping portions of the defined range of hash values, such that the hash value is in the portion of the defined range of hash values corresponding to the bucket; update for a bucket identifier which matches an existing bucket identifier in a sorted sparse representation: a value of the cardinality of the leading zeros; positionally insert for an unmatched bucket identifier in the sorted sparse representation: a bucket identifier for the bucket; and the cardinality of the leading zeros; and in response to a determination that the table omits unevaluated rows, determine the approximate unique count for the column using the sorted sparse representation, wherein the approximate unique count is computed using the cardinalities of leading zeros in the sorted sparse representation; and output the approximate unique count as results data responsive to the portion of the data-query. 12. The system of claim 11 , wherein the memory stores instructions executable by the processor to: in response to a determination that utilization for a current memory allocation for the sorted sparse representation is above a defined threshold: obtain an expanded memory allocation for the sorted sparse representation such that the expanded memory allocation is a multiple of the current memory allocation; and store the sorted sparse representation in the expanded memory allocation. 13. The system of claim 11 , wherein the memory stores instructions executable by the processor to: in response to a determination that a memory allocation for the sorted sparse representation is greater than or equal to a conversion threshold, convert the sorted sparse representation to a dense representation. 14. The system of claim 13 , wherein the memory stores instructions executable by the processor to: in response to conversion of the sorted sparse representation to the dense representation, the approximate unique count for the column uses the dense representation in lieu of the sorted sparse representation, wherein the approximate unique count is computed using the cardinalities of leading zeros in the dense representation. 15. The system of claim 11 , wherein the memory stores instructions executable by the processor to: determine a position of the unmatched bucket identifier in the sorted sparse representation. 16. The system of claim 11 , wherein the memory stores instructions executable b
Free address space management · CPC title
using pseudo-associative means, e.g. set-associative or hashing · CPC title
the resource being the memory · CPC title
Design, administration or maintenance of databases · CPC title
hash tables · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.