Methods and apparatus to estimate cardinality of users represented in arbitrarily distributed bloom filters

US12387227B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-12387227-B2
Application numberUS-202318298814-A
CountryUS
Kind codeB2
Filing dateApr 11, 2023
Priority dateFeb 11, 2020
Publication dateAug 12, 2025
Grant dateAug 12, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, apparatus, systems, and articles of manufacture to estimate cardinality of users represented in arbitrarily distributed bloom filter arrays are disclosed. A system includes a communication interface to: access a first Bloom filter array representative of first entries in a first database, the first entries allocated to ones of first elements in the first Bloom filter array based on a non-uniform distribution of outputs of a hash function applied to the first entries, and access a second Bloom filter array representative of second entries in a second database. The system also includes machine readable instructions to cause one or more processors to estimate a cardinality of a union of the first and second entries based on the non-uniform distribution of the outputs of the hash function.

First claim

Opening claim text (preview).

What is claimed is: 1. A system comprising: a communication interface to: access a first Bloom filter array generated by a first computer of a first database proprietor, the first Bloom filter array representative of first entries in a first database of the first database proprietor, the first entries allocated to ones of first elements in the first Bloom filter array based on a non-uniform distribution of outputs of a hash function applied to the first entries, and access a second Bloom filter array generated by a second computer of a second database proprietor, the second Bloom filter array representative of second entries in a second database of the second database proprietor, the second entries allocated to ones of second elements in the second Bloom filter array based on the non-uniform distribution of the outputs of the hash function applied to the second entries; one or more processors; and machine readable instructions to cause the one or more processors to estimate a cardinality of a union of the first and second entries based on the non-uniform distribution of the outputs of the hash function, wherein the non-uniform distribution is a geometric distribution in which a leftmost bit of a Bloom filter array has a highest probability of mapping a hash function output, with the probability decreasing for subsequent bits of the Bloom filter array, such that the Bloom filter array represents at least a same amount of data with a smaller array length as compared to a length of traditional Bloom filter array that is populated using a uniform distribution, and a likelihood of the Bloom filter array becoming saturated is reduced as compared to a likelihood of the traditional Bloom filter array becoming saturated. 2. The system of claim 1 , wherein the first and second entries correspond to users who accessed media. 3. The system of claim 1 , wherein the one or more processors are to estimate the cardinality by causing a numerical solver to solve for a number of entries that maximizes a likelihood of producing the union of the first and second entries. 4. The system of claim 1 , wherein the estimate of the cardinality has an error, for a given amount of noise in ones of the first and second Bloom filter arrays, that has an absolute value that varies by less than 1% across a range of different values of a ratio of the cardinality to a length of the first and second Bloom filter arrays, the different values ranging from 0.125 to 8. 5. The system of claim 1 , wherein the cardinality is a first cardinality and the union is a first union, the one or more processors to estimate a second cardinality of a second union of entries in the first and second Bloom filter arrays and at least one other Bloom filter array. 6. A method comprising: accessing a first Bloom filter array generated by a first computer of a first database proprietor, the first Bloom filter array representative of first entries in a first database of the first database proprietor, the first entries allocated to ones of first elements in the first Bloom filter array based on a non-uniform distribution of outputs of a hash function applied to the first entries; accessing a second Bloom filter array generated by a second computer of a second database proprietor, the second Bloom filter array representative of second entries in a second database of the second database proprietor, the second entries allocated to ones of second elements in the second Bloom filter array based on the non-uniform distribution of the outputs of the hash function applied to the second entries; and estimating a cardinality of a union of the first and second entries based on the non-uniform distribution of the outputs of the hash function, wherein the non-uniform distribution is a geometric distribution in which a leftmost bit of a Bloom filter array has a highest probability of mapping a hash function output, with the probability decreasing for subsequent bits of the Bloom filter array, such that the Bloom filter array represents at least a same amount of data with a smaller array length as compared to a length of traditional Bloom filter array that is populated using a uniform distribution, and a likelihood of the Bloom filter array becoming saturated is reduced as compared to a likelihood of the traditional Bloom filter array becoming saturated. 7. The method of claim 6 , wherein the first and second entries correspond to users who accessed media. 8. The method of claim 6 , further including estimating the cardinality by causing a numerical solver to solve for a few entries that maximizes a likelihood of producing the union of the first and second entries. 9. The method of claim 6 , wherein the estimate of the cardinality has an error that remains substantially consistent, for a given amount of noise in ones of the first and second Bloom filter arrays, across a range of different values of a ratio of the cardinality to a length of the first and second Bloom filter arrays, the different values ranging from 0.125 to 8. 10. The method of claim 6 , wherein the cardinality is a first cardinality, and the union is a first union, the method further including estimating a second cardinality of a second union of entries in the first and second Bloom filter arrays and at least one other Bloom filter array. 11. A system comprising: a communication interface to: access a first Bloom filter array generated by a first computer of a first database proprietor, the first Bloom filter array representative of first users who accessed media, the first users allocated to ones of first elements in the first Bloom filter array based on a non-uniform distribution of outputs of a hash function applied to the first users, and access a second Bloom filter array generated by a second computer of a second database proprietor, the second Bloom filter array representative of second users who accessed media, the second users allocated to ones of second elements in the second Bloom filter array based on the non-uniform distribution of the outputs of the hash function applied to the second users; one or more processors; and machine readable instructions to cause the one or more processors to estimate a cardinality of a union of the first and second users based on the non-uniform distribution of the outputs of the hash function, wherein the non-uniform distribution is a geometric distribution in which a leftmost bit of a Bloom filter array has a highest probability of mapping a hash function output, with the probability decreasing for subsequent bits of the Bloom filter array, such that the Bloom filter array represents at least a same amount of data with a smaller array length as compared to a length of traditional Bloom filter array that is populated using a uniform distribution, and a likelihood of the Bloom filter array becoming saturated is reduced as compared to a likelihood of the traditional Bloom filter array becoming saturated. 12. The system of claim 11 , wherein a smallest one of the different sized proportions is greater than or equal to a threshold defined based on a universe estimate of a population of possible audience members of the media. 13. The system of claim 11 , wherein the one or more processors are to estimate the cardinality by causing a numerical solver to solve for several users that maximizes a likelihood of producing the union of the first and second users. 14. The system of claim 11 , wherein the estimate of the cardinality has an error, for a given amount of noise in ones of the first and second Bloom filter arrays, that has an absolute value that varies by less than 1% across a range of different values

Assignees

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Market modelling; Market analysis; Collecting market data · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12387227B2 cover?
Methods, apparatus, systems, and articles of manufacture to estimate cardinality of users represented in arbitrarily distributed bloom filter arrays are disclosed. A system includes a communication interface to: access a first Bloom filter array representative of first entries in a first database, the first entries allocated to ones of first elements in the first Bloom filter array based on a n…
Who is the assignee on this patent?
Nielsen Co Us Llc
What technology area does this patent fall under?
Primary CPC classification G06Q30/0201. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Aug 12 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).