US12554693B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12554693-B2 |
| Application number | US-202519022160-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 15, 2025 |
| Priority date | Oct 28, 2022 |
| Publication date | Feb 17, 2026 |
| Grant date | Feb 17, 2026 |
Official abstract text for this publication.
Methods, apparatus, systems, and articles of manufacture to virtually estimate cardinality with global registers are disclosed. An example apparatus includes processor circuitry to assign subsets of a sample dataset to a shared global register array, the shared global register array having a first number of registers, the sample dataset selected from a reference dataset of media assets; identify a virtual register array from the shared global register array that includes data elements associated with a label value, the virtual register array including a second number of registers less than the first number of registers; determine a maximum rank value of the label value across the virtual register array; and calculate a cardinality estimate of the label value across the virtual register array based on the second number of registers and the maximum rank value.
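The abstract does not give formulas, so the following is only a minimal sketch of how a shared register array with per-label virtual registers could work. It assumes a HyperLogLog-style scheme: each label hashes to `s` of the `m` shared registers, each register keeps the maximum rank seen, and the estimate uses the standard HyperLogLog raw estimator with a small-range correction. The class `SharedRegisters`, the helper names, and the use of BLAKE2b hashing are hypothetical choices for illustration and are not taken from the patent.

```python
import hashlib
import math


def _hash64(*parts) -> int:
    """Deterministic 64-bit hash of the given parts (illustrative choice)."""
    data = "\x1f".join(str(p) for p in parts).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def rank(h: int, bits: int = 64) -> int:
    """1-based position of the leftmost 1 bit in a `bits`-wide word."""
    return bits - h.bit_length() + 1


def _alpha(s: int) -> float:
    """Standard HyperLogLog bias constant (assumes s >= 16)."""
    return {16: 0.673, 32: 0.697, 64: 0.709}.get(s, 0.7213 / (1 + 1.079 / s))


class SharedRegisters:
    """m shared registers; each label sees a virtual array of s < m of them."""

    def __init__(self, m: int, s: int):
        if not 0 < s < m:
            raise ValueError("need 0 < s < m")
        self.m, self.s = m, s
        self.registers = [0] * m  # shared global register array

    def virtual_indices(self, label):
        """Assumption: a label's virtual registers are chosen by hashing."""
        return [_hash64("slot", label, j) % self.m for j in range(self.s)]

    def add(self, label, element) -> None:
        """Record one element under a label, keeping the maximum rank."""
        j = _hash64("pick", element) % self.s          # virtual register index
        i = _hash64("slot", label, j) % self.m         # physical register index
        r = rank(_hash64("rank", element))
        if r > self.registers[i]:
            self.registers[i] = r

    def estimate(self, label) -> float:
        """HyperLogLog-style estimate over the label's virtual registers."""
        vals = [self.registers[i] for i in self.virtual_indices(label)]
        s = self.s
        raw = _alpha(s) * s * s / sum(2.0 ** -v for v in vals)
        zeros = vals.count(0)
        if raw <= 2.5 * s and zeros:
            return s * math.log(s / zeros)  # linear-counting correction
        return raw


if __name__ == "__main__":
    sr = SharedRegisters(m=4096, s=128)
    for i in range(5000):
        sr.add("label-a", f"item-{i}")
    print(round(sr.estimate("label-a")))
```

Because registers are shared, a label's estimate in this sketch also absorbs ranks written by other labels. Claim 2 describes a second estimate that accounts for both register counts; that adjustment is not implemented here.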
Opening claim text (preview).
The invention claimed is:

1. A computing system comprising a processor and a memory, the computing system configured to perform a set of operations comprising: assigning subsets of a sample dataset to a shared global register array, the shared global register array having a first number of registers, the sample dataset selected from a reference dataset; identifying a virtual register array from the shared global register array that includes data elements associated with a label value, the virtual register array including a second number of registers less than the first number of registers; determining a maximum rank value of the label value across the virtual register array; and calculating a cardinality estimate of the label value across the virtual register array based on the second number of registers and the maximum rank value.

2. The computing system of claim 1, wherein the cardinality estimate is a first cardinality estimate, and the set of operations further comprises calculating a second cardinality estimate of the label value across the shared global register array based on the first cardinality estimate, the first number of registers, and the second number of registers.

3. The computing system of claim 1, wherein the set of operations further comprises generating a first rank distribution array for the shared global register array.

4. The computing system of claim 3, wherein the set of operations further comprises generating a second rank distribution array for the virtual register array.

5. The computing system of claim 4, wherein the set of operations further comprises generating an estimated recovered rank distribution array for the label value based on the first rank distribution array and the second rank distribution array.

6. The computing system of claim 5, wherein the set of operations further comprises determining an estimated cumulative distribution function for the label value based on the estimated recovered rank distribution array.

7. The computing system of claim 6, wherein the maximum rank value for the label value is based on the estimated cumulative distribution function.

8. A non-transitory computer-readable medium having stored therein instructions that when executed by a computing system cause the computing system to perform a set of operations comprising: assigning subsets of a sample dataset to a shared global register array, the shared global register array having a first number of registers, the sample dataset selected from a reference dataset; identifying a virtual register array from the shared global register array that includes data elements associated with a label value, the virtual register array including a second number of registers less than the first number of registers; determining a maximum rank value of the label value across the virtual register array; and calculating a cardinality estimate of the label value across the virtual register array based on the second number of registers and the maximum rank value.

9. The non-transitory computer-readable medium of claim 8, wherein the cardinality estimate is a first cardinality estimate, and the set of operations further comprises calculating a second cardinality estimate of the label value across the shared global register array based on the first cardinality estimate, the first number of registers, and the second number of registers.

10. The non-transitory computer-readable medium of claim 8, wherein the set of operations further comprises generating a first rank distribution array for the shared global register array.

11. The non-transitory computer-readable medium of claim 10, wherein the set of operations further comprises generating a second rank distribution array for the virtual register array.

12. The non-transitory computer-readable medium of claim 11, wherein the set of operations further comprises generating an estimated recovered rank distribution array for the label value based on the first rank distribution array and the second rank distribution array.

13. The non-transitory computer-readable medium of claim 12, wherein the set of operations further comprises determining an estimated cumulative distribution function for the label value based on the estimated recovered rank distribution array.

14. The non-transitory computer-readable medium of claim 13, wherein the maximum rank value for the label value is based on the estimated cumulative distribution function.

15. A method comprising: assigning subsets of a sample dataset to a shared global register array, the shared global register array having a first number of registers, the sample dataset selected from a reference dataset; identifying a virtual register array from the shared global register array that includes data elements associated with a label value, the virtual register array including a second number of registers less than the first number of registers; determining a maximum rank value of the label value across the virtual register array; and calculating a cardinality estimate of the label value across the virtual register array based on the second number of registers and the maximum rank value.

16. The method of claim 15, wherein the cardinality estimate is a first cardinality estimate, and the method further comprises calculating a second cardinality estimate of the label value across the shared global register array based on the first cardinality estimate, the first number of registers, and the second number of registers.

17. The method of claim 15, further comprising generating a first rank distribution array for the shared global register array.

18. The method of claim 17, further comprising generating a second rank distribution array for the virtual register array.

19. The method of claim 18, further comprising generating an estimated recovered rank distribution array for the label value based on the first rank distribution array and the second rank distribution array.

20. The method of claim 19, further comprising determining an estimated cumulative distribution function for the label value based on the estimated recovered rank distribution array.
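The dependent claims refer to rank distribution arrays and a cumulative distribution function without defining how they are computed. As a rough illustration only, the sketch below assumes a rank distribution array is the normalized histogram of register values and that the cumulative distribution function is its running sum. The function names are hypothetical, and the "estimated recovered" distribution that combines the global and virtual arrays is not specified in this text, so it is not sketched.

```python
from collections import Counter
from itertools import accumulate


def rank_distribution(registers, max_rank=None):
    """Fraction of registers holding each rank value 0..max_rank.

    Assumption: a "rank distribution array" is a normalized histogram
    of the values stored in a register array.
    """
    if max_rank is None:
        max_rank = max(registers)
    counts = Counter(registers)
    n = len(registers)
    return [counts.get(r, 0) / n for r in range(max_rank + 1)]


def cumulative(dist):
    """Running sum of a distribution array (an empirical CDF)."""
    return list(accumulate(dist))


if __name__ == "__main__":
    global_registers = [0, 1, 1, 3]    # toy shared global register array
    virtual_registers = [1, 3]         # toy virtual register array
    print(rank_distribution(global_registers))
    print(rank_distribution(virtual_registers, max_rank=3))
    print(cumulative(rank_distribution(global_registers)))
```

The same function applies to both the shared global array and a label's virtual array, which keeps the two distributions directly comparable index by index.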
Aggregation; Duplicate elimination · CPC title
Indexing structures · CPC title