Methods and apparatus to estimate total audience population distributions
US-2019147461-A1 · May 16, 2019 · US
US2025061101A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025061101-A1 |
| Application number | US-202418939035-A |
| Country | US |
| Kind code | A1 |
| Filing date | Nov 6, 2024 |
| Priority date | Jul 5, 2019 |
| Publication date | Feb 20, 2025 |
| Grant date | — |
Abstract
Methods and apparatus to estimate audience sizes using deduplication based on binomial sketch data are disclosed. An apparatus includes processor circuitry to instantiate coefficient analyzer circuitry to determine coefficient values of a polynomial based on (i) variances in values in first sketch data and second sketch data, (ii) a first cardinality of the first sketch data, and (iii) a second cardinality of the second sketch data. The processor circuitry also instantiates overlap analyzer circuitry to determine a real root of the polynomial, the real root corresponding to an estimate of the quantity of second subscribers that are duplicates of first subscribers. The processor circuitry further instantiates report generator circuitry to estimate a deduplicated audience size based on that estimate and on the first and second cardinalities.
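The abstract names three computational steps (polynomial coefficients, a real root, a deduplicated size) but does not state the polynomial itself. The Python sketch below illustrates only the last two steps, assuming the coefficients have already been computed. It looks for real roots solely inside the feasible range 0 ≤ x ≤ min(n1, n2), using a grid scan plus bisection, and then applies inclusion-exclusion, n1 + n2 − x, which is one natural way to combine the overlap with the two cardinalities as the abstract describes. The coefficient values in the usage lines are invented, taking the smallest feasible root is an assumed selection rule, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence


def eval_poly(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial whose coefficients run from highest degree to constant (Horner's rule)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def real_roots_in_range(coeffs: Sequence[float], lo: float, hi: float,
                        steps: int = 10_000) -> List[float]:
    """Return real roots in [lo, hi]: scan a grid for sign changes, then bisect each bracket.

    A root where the polynomial touches zero without changing sign is only found
    if it falls exactly on a grid point.
    """
    roots: List[float] = []
    width = (hi - lo) / steps
    prev_x, prev_y = lo, eval_poly(coeffs, lo)
    if prev_y == 0.0:
        roots.append(prev_x)
    for i in range(1, steps + 1):
        x = lo + i * width
        y = eval_poly(coeffs, x)
        if y == 0.0:
            roots.append(x)  # grid point is an exact root
        elif prev_y != 0.0 and (prev_y < 0) != (y < 0):
            a, b, fa = prev_x, x, prev_y  # sign change: a root lies in (a, b)
            for _ in range(100):  # fixed iteration count avoids stalling at float resolution
                mid = (a + b) / 2
                fm = eval_poly(coeffs, mid)
                if (fm < 0) == (fa < 0):
                    a, fa = mid, fm
                else:
                    b = mid
            roots.append((a + b) / 2)
        prev_x, prev_y = x, y
    return roots


def estimate_overlap(coeffs: Sequence[float], n1: float, n2: float) -> Optional[float]:
    """Assumed rule: the smallest real root inside the feasible range [0, min(n1, n2)]."""
    candidates = real_roots_in_range(coeffs, 0.0, float(min(n1, n2)))
    return min(candidates) if candidates else None


def deduplicated_audience(n1: float, n2: float, overlap: float) -> float:
    """Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return n1 + n2 - overlap


# Usage with made-up numbers: (x - 300)(x - 5000) = x^2 - 5300x + 1,500,000
n1, n2 = 1000, 2000
overlap = estimate_overlap([1.0, -5300.0, 1_500_000.0], n1, n2)
if overlap is not None:
    print(overlap, deduplicated_audience(n1, n2, overlap))  # 300.0 2700.0
```

A grid scan is used instead of a closed-form solver so the same function works for any polynomial degree; its weakness is that two roots falling inside one grid cell, or a root that only touches zero, can be missed.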
Claims
1. A computing system comprising a processor and a memory, the computing system configured to perform a set of acts comprising: receiving a first network communication from a server of a database proprietor, wherein the first network communication includes first sketch data representative of first individuals exposed to media; determining coefficient values of a polynomial based on (i) variances in the first sketch data and second sketch data, (ii) a first cardinality of the first sketch data, and (iii) a second cardinality of the second sketch data, wherein the second sketch data is representative of second individuals exposed to the media, wherein the first sketch data and the second sketch data are generated to maintain a privacy of the first individuals and the second individuals; determining a root of the polynomial, wherein the root is an estimate of an overlap between the first individuals and the second individuals; estimating a deduplicated audience size for the media based on the overlap, the first cardinality, and the second cardinality; and transmitting a second network communication to a third-party entity, wherein the second network communication includes a report based on the deduplicated audience size.

2. The computing system of claim 1, wherein the first individuals are subscribers of the database proprietor.

3. The computing system of claim 2, wherein the first individuals are subscribers of the database proprietor that are exposed to the media via a platform operated by the database proprietor.

4. The computing system of claim 2, wherein: the second individuals are subscribers of another database proprietor, the set of acts further comprises receiving a third network communication from a server of the other database proprietor, and the third network communication includes the second sketch data.

5. The computing system of claim 1, wherein the first sketch data and the second sketch data are generated using a Bernoulli hash.

6. The computing system of claim 5, wherein the first sketch data is generated by applying a Bernoulli hash to personally identifiable information of the first individuals.

7. The computing system of claim 1, wherein: the first sketch data is first binomial sketch data, and the second sketch data is second binomial sketch data.

8. A method comprising: receiving, by a computing system, a first network communication from a server of a database proprietor, wherein the first network communication includes first sketch data representative of first individuals exposed to media; determining, by the computing system, coefficient values of a polynomial based on (i) variances in the first sketch data and second sketch data, (ii) a first cardinality of the first sketch data, and (iii) a second cardinality of the second sketch data, wherein the second sketch data is representative of second individuals exposed to the media, wherein the first sketch data and the second sketch data are generated to maintain a privacy of the first individuals and the second individuals; determining, by the computing system, a root of the polynomial, wherein the root is an estimate of an overlap between the first individuals and the second individuals; estimating, by the computing system, a deduplicated audience size for the media based on the overlap, the first cardinality, and the second cardinality; and transmitting, by the computing system, a second network communication to a third-party entity, wherein the second network communication includes a report based on the deduplicated audience size.

9. The method of claim 8, wherein the first individuals are subscribers of the database proprietor.

10. The method of claim 9, wherein the first individuals are subscribers of the database proprietor that are exposed to the media via a platform operated by the database proprietor.

11. The method of claim 9, wherein: the second individuals are subscribers of another database proprietor, the method further comprises receiving a third network communication from a server of the other database proprietor, and the third network communication includes the second sketch data.

12. The method of claim 8, wherein the first sketch data and the second sketch data are generated using a Bernoulli hash.

13. The method of claim 12, wherein the first sketch data is generated by applying a Bernoulli hash to personally identifiable information of the first individuals.

14. The method of claim 8, wherein: the first sketch data is first binomial sketch data, and the second sketch data is second binomial sketch data.

15. A non-transitory computer-readable medium having stored therein instructions that, when executed by a computing system, cause the computing system to perform a set of acts comprising: receiving a first network communication from a server of a database proprietor, wherein the first network communication includes first sketch data representative of first individuals exposed to media; determining coefficient values of a polynomial based on (i) variances in the first sketch data and second sketch data, (ii) a first cardinality of the first sketch data, and (iii) a second cardinality of the second sketch data, wherein the second sketch data is representative of second individuals exposed to the media, wherein the first sketch data and the second sketch data are generated to maintain a privacy of the first individuals and the second individuals; determining a root of the polynomial, wherein the root is an estimate of an overlap between the first individuals and the second individuals; estimating a deduplicated audience size for the media based on the overlap, the first cardinality, and the second cardinality; and transmitting a second network communication to a third-party entity, wherein the second network communication includes a report based on the deduplicated audience size.

16. The non-transitory computer-readable medium of claim 15, wherein the first individuals are subscribers of the database proprietor.

17. The non-transitory computer-readable medium of claim 16, wherein: the second individuals are subscribers of another database proprietor, the set of acts further comprises receiving a third network communication from a server of the other database proprietor, and the third network communication includes the second sketch data.

18. The non-transitory computer-readable medium of claim 15, wherein the first sketch data and the second sketch data are generated using a Bernoulli hash.

19. The non-transitory computer-readable medium of claim 18, wherein the first sketch data is generated by applying a Bernoulli hash to personally identifiable information of the first individuals.

20. The non-transitory computer-readable medium of claim 15, wherein: the first sketch data is first binomial sketch data, and the second sketch data is second binomial sketch data.
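The dependent claims say the sketches come from applying a Bernoulli hash to personally identifiable information and are binomial sketch data, but they do not define either construction. The Python sketch below is one assumed reading, not the filing's method. Each person's PII is expanded with `hashlib.shake_256` (salted by a hypothetical shared seed) into a vector of Bernoulli(p) bits, and a proprietor's sketch is the element-wise sum of those vectors, so every entry is Binomial(n, p). A person present at both proprietors contributes identical bits to both sketches, so in the merged sketch the n1 − x and n2 − x unshared people each add p(1 − p) to the per-entry variance while each of the x shared people adds 4p(1 − p), giving Var = (n1 + n2 + 2x) p(1 − p). In this toy model the overlap therefore enters linearly and the "polynomial" reduces to degree one; the filing's own polynomial may be of higher degree. `SKETCH_LEN`, `P_THRESHOLD`, `SEED`, and all function names are hypothetical.

```python
import hashlib
from statistics import mean, variance
from typing import Iterable, List

SKETCH_LEN = 4096               # hypothetical sketch length m
P_THRESHOLD = 128               # hypothetical: a position is 1 when its hash byte < 128
P = P_THRESHOLD / 256           # so each position is Bernoulli(P) with P = 0.5
SEED = b"shared-campaign-seed"  # hypothetical seed shared by all proprietors


def bernoulli_vector(pii: str) -> List[int]:
    """Map one person's PII to SKETCH_LEN pseudo-random Bernoulli(P) bits.

    The same PII and seed always yield the same bits, at every proprietor.
    """
    digest = hashlib.shake_256(SEED + pii.encode("utf-8")).digest(SKETCH_LEN)
    return [1 if byte < P_THRESHOLD else 0 for byte in digest]


def binomial_sketch(pii_list: Iterable[str]) -> List[int]:
    """Element-wise sum of per-person Bernoulli vectors; each entry is Binomial(n, P)."""
    sketch = [0] * SKETCH_LEN
    for pii in pii_list:
        sketch = [s + b for s, b in zip(sketch, bernoulli_vector(pii))]
    return sketch


def cardinality(sketch: List[int]) -> float:
    """Each entry has mean n * P, so n is estimated as mean(sketch) / P."""
    return mean(sketch) / P


def overlap_estimate(s1: List[int], s2: List[int]) -> float:
    """Solve the degree-one polynomial c1 * x + c0 = 0 for the overlap x.

    Under this model Var(s1[k] + s2[k]) = (n1 + n2 + 2x) * P * (1 - P).
    """
    n1, n2 = cardinality(s1), cardinality(s2)
    merged_var = variance([a + b for a, b in zip(s1, s2)])
    c1 = 2 * P * (1 - P)
    c0 = (n1 + n2) * P * (1 - P) - merged_var
    x = -c0 / c1
    return max(0.0, min(x, n1, n2))  # clamp to the feasible range


# Usage with synthetic identifiers: users 300..599 appear in both audiences.
audience_a = [f"user{i}@example.com" for i in range(0, 600)]
audience_b = [f"user{i}@example.com" for i in range(300, 1100)]
s1, s2 = binomial_sketch(audience_a), binomial_sketch(audience_b)
n1, n2 = cardinality(s1), cardinality(s2)
x = overlap_estimate(s1, s2)
print(round(n1), round(n2), round(x), round(n1 + n2 - x))  # roughly 600 800 300 1100
```

The overlap estimate rests on a sample variance, whose standard error shrinks roughly as 1/√m, so the sketch length trades bandwidth against accuracy; with m = 4096 the overlap in this example should typically land within a few dozen of the true value of 300.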
CPC titles:
- Hash tables
- Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- of multimedia data, e.g. slideshows comprising image and additional audio data (retrieval of still image data G06F16/50; retrieval of audio data G06F16/60; retrieval of video data G06F16/70)