Methods and apparatus to estimate cardinality of users represented in arbitrarily distributed bloom filters

US2025363509A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2025363509-A1
Application numberUS-202519295063-A
CountryUS
Kind codeA1
Filing dateAug 8, 2025
Priority dateFeb 11, 2020
Publication dateNov 27, 2025
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, apparatus, systems, and articles of manufacture to estimate cardinality of users represented in arbitrarily distributed bloom filter arrays are disclosed. A system includes a communication interface to: access a first Bloom filter array representative of first entries in a first database, the first entries allocated to ones of first elements in the first Bloom filter array based on a non-uniform distribution of outputs of a hash function applied to the first entries, and access a second Bloom filter array representative of second entries in a second database. The system also includes machine readable instructions to cause one or more processors to estimate a cardinality of a union of the first and second entries based on the non-uniform distribution of the outputs of the hash function.

First claim

Opening claim text (preview).

1 . A computing system comprising a processor and a memory, the computing system configured to perform a set of acts comprising: accessing a first Bloom filter array generated by a first computing system of a first database proprietor, wherein the first Bloom filter array is representative of first entries in a first database of the first database proprietor, wherein the first entries are allocated to respective elements in the first Bloom filter array using a hash function, and wherein a mapping of an output of the hash function to an element in the first Bloom filter array is based on a non-uniform distribution across different elements in the first Bloom filter array; accessing a second Bloom filter array generated by a second computing system, wherein the second Bloom filter array is representative of second entries in a second database, wherein the second entries are allocated to respective elements in the second Bloom filter using the hash function, and wherein a mapping of an output of the hash function to an element in the second Bloom filter array is based on the non-uniform distribution; and estimating a cardinality of a union of the first and second entries based on the non-uniform distribution, wherein the first Bloom filter array represents at least a same amount of data with a smaller array length as compared to a length of a traditional Bloom filter array that is populated using a uniform distribution, because the non-uniform distribution reduces a likelihood of the first Bloom filter array becoming saturated as compared to a likelihood of the traditional Bloom filter array becoming saturated. 2 . The computing system of claim 1 , wherein the first and second entries correspond to users who accessed media, a smallest one of the different sized proportions being greater than or equal to a threshold defined based on a universe estimate of a population of possible audience members of the media. 3 . The computing system of claim 1 , wherein estimating the cardinality comprises causing a numerical solver to solve for a number of entries that maximizes a likelihood of producing the union of the first and second entries. 4 . The computing system of claim 1 , wherein the estimate of the cardinality has an error, for a given amount of noise in ones of the first and second Bloom filter arrays, that has an absolute value that varies by less than 1% across a range of different values of a ratio of the cardinality to a length of the first and second Bloom filter arrays, the different values ranging from 0.125 to 8. 5 . The computing system of claim 1 , wherein the cardinality is a first cardinality and the union is a first union, and wherein the set of acts further comprises estimating a second cardinality of a second union of entries in the first and second Bloom filter arrays and at least one other Bloom filter array. 6 . The computing system of claim 1 , wherein the non-uniform distribution is a geometric distribution. 7 . A method comprising: accessing, by a computing system, a first Bloom filter array generated by a first computing system of a first database proprietor, wherein the first Bloom filter array is representative of first entries in a first database of the first database proprietor, wherein the first entries are allocated to respective elements in the first Bloom filter array using a hash function, and wherein a mapping of an output of the hash function to an element in the first Bloom filter array is based on a non-uniform distribution across different elements in the first Bloom filter array; accessing, by the computing system, a second Bloom filter array generated by a second computing system, wherein the second Bloom filter array is representative of second entries in a second database, wherein the second entries are allocated to respective elements in the second Bloom filter using the hash function, and wherein a mapping of an output of the hash function to an element in the second Bloom filter array is based on the non-uniform distribution; and estimating, by the computing system, a cardinality of a union of the first and second entries based on the non-uniform distribution, wherein the first Bloom filter array represents at least a same amount of data with a smaller array length as compared to a length of a traditional Bloom filter array that is populated using a uniform distribution, because the non-uniform distribution reduces a likelihood of the first Bloom filter array becoming saturated as compared to a likelihood of the traditional Bloom filter array becoming saturated. 8 . The method of claim 7 , wherein the first and second entries correspond to users who accessed media, a smallest one of the different sized proportions being greater than or equal to a threshold defined based on a universe estimate of a population of possible audience members of the media. 9 . The method of claim 7 , wherein estimating the cardinality comprises causing a numerical solver to solve for a number of entries that maximizes a likelihood of producing the union of the first and second entries. 10 . The method of claim 7 , wherein the estimate of the cardinality has an error, for a given amount of noise in ones of the first and second Bloom filter arrays, that has an absolute value that varies by less than 1% across a range of different values of a ratio of the cardinality to a length of the first and second Bloom filter arrays, the different values ranging from 0.125 to 8. 11 . The method of claim 7 , wherein the cardinality is a first cardinality and the union is a first union, and wherein the method further comprises estimating a second cardinality of a second union of entries in the first and second Bloom filter arrays and at least one other Bloom filter array. 12 . The method of claim 7 , wherein the non-uniform distribution is a geometric distribution. 13 . A non-transitory computer-readable medium having stored thereon instructions that when executed by a computing system cause the computing system to perform a set of acts comprising: accessing a first Bloom filter array generated by a first computing system of a first database proprietor, wherein the first Bloom filter array is representative of first entries in a first database of the first database proprietor, wherein the first entries are allocated to respective elements in the first Bloom filter array using a hash function, and wherein a mapping of an output of the hash function to an element in the first Bloom filter array is based on a non-uniform distribution across different elements in the first Bloom filter array; accessing a second Bloom filter array generated by a second computing system, wherein the second Bloom filter array is representative of second entries in a second database, wherein the second entries are allocated to respective elements in the second Bloom filter using the hash function, and wherein a mapping of an output of the hash function to an element in the second Bloom filter array is based on the non-uniform distribution; and estimating a cardinality of a union of the first and second entries based on the non-uniform distribution, wherein the first Bloom filter array represents at least a same amount of data with a smaller array length as compared to a length of a traditional Bloom filter array that is populated using a uniform distribution, because the non-uniform distribution reduces a likelihood of the first Bloom filter array becoming saturated as compared to a likelihood of the traditional Bloom filter array becoming saturated. 14 . The non-transitory computer-readable medium of claim 13 , wherein the first and second entries corres

Assignees

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Market modelling; Market analysis; Collecting market data · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025363509A1 cover?
Methods, apparatus, systems, and articles of manufacture to estimate cardinality of users represented in arbitrarily distributed bloom filter arrays are disclosed. A system includes a communication interface to: access a first Bloom filter array representative of first entries in a first database, the first entries allocated to ones of first elements in the first Bloom filter array based on a n…
Who is the assignee on this patent?
Nielsen Co Us Llc
What technology area does this patent fall under?
Primary CPC classification G06Q30/0201. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Nov 27 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).