Methods and apparatus to estimate audience sizes of media using deduplication based on binomial sketch data

US2025061101A1 · US · A1

Patent metadata
Publication number: US-2025061101-A1
Application number: US-202418939035-A
Country: US
Kind code: A1
Filing date: Nov 6, 2024
Priority date: Jul 5, 2019
Publication date: Feb 20, 2025
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus to estimate audience sizes using deduplication based on binomial sketch data are disclosed. An apparatus includes processor circuitry to instantiate coefficient analyzer circuitry to determine coefficient values of a polynomial based on (i) variances in values in the first sketch data and second sketch data, (ii) a first cardinality of the first sketch data, and (iii) a second cardinality of the second sketch data. The processor circuitry to instantiate overlap analyzer circuitry to determine a real root of the polynomial, the real root corresponding to the quantity of the second subscribers that are duplicates of the first subscribers. The processor circuitry to instantiate report generator circuitry to estimate a deduplicated audience size based on the estimate of the quantity of the second subscribers that are duplicates of the first subscribers and the first and second cardinalities.
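The last two steps of the abstract (pick a real root of the polynomial as the overlap, then combine it with the two cardinalities) can be illustrated with a minimal sketch. The coefficients below are hypothetical and chosen only so the root is easy to check; this page does not reproduce the actual polynomial from the description. The sketch also assumes that the deduplicated size follows the standard inclusion-exclusion identity (first cardinality plus second cardinality minus overlap) and that a usable root lies between 0 and the smaller cardinality. The function names are illustrative, not taken from the patent.

```python
from typing import Sequence


def poly_eval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial (highest-degree coefficient first) with Horner's rule."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def real_root_by_bisection(coeffs: Sequence[float], lo: float, hi: float,
                           tol: float = 1e-9, max_iter: int = 200) -> float:
    """Find a real root in [lo, hi]; requires a sign change across the interval."""
    f_lo, f_hi = poly_eval(coeffs, lo), poly_eval(coeffs, hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on [lo, hi]; cannot bracket a root")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = poly_eval(coeffs, mid)
        if f_mid == 0.0 or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid              # root is in the lower half
        else:
            lo, f_lo = mid, f_mid  # root is in the upper half
    return (lo + hi) / 2


def deduplicated_audience(card_a: float, card_b: float, overlap: float) -> float:
    """Inclusion-exclusion (assumed): |A or B| = |A| + |B| - |A and B|."""
    return card_a + card_b - overlap


# Hypothetical inputs: cardinalities 100 and 40, and the illustrative
# polynomial (x - 30)(x^2 + 1) = x^3 - 30x^2 + x - 30, whose only real root is 30.
card_a, card_b = 100, 40
coeffs = [1.0, -30.0, 1.0, -30.0]
overlap = real_root_by_bisection(coeffs, 0.0, float(min(card_a, card_b)))
audience = deduplicated_audience(card_a, card_b, overlap)
```

Restricting the search to the interval from 0 to the smaller cardinality is a design choice of this sketch: an overlap outside that range cannot be a valid count, so any other real roots are ignored.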

First claim

Opening claim text (preview).

1. A computing system comprising a processor and a memory, the computing system configured to perform a set of acts comprising: receiving a first network communication from a server of a database proprietor, wherein the first network communication includes first sketch data representative of first individuals exposed to media; determining coefficient values of a polynomial based on (i) variances in the first sketch data and second sketch data, (ii) a first cardinality of the first sketch data, and (iii) a second cardinality of the second sketch data, wherein the second sketch data is representative of second individuals exposed to the media, wherein the first sketch data and the second sketch data are generated to maintain a privacy of the first individuals and the second individuals; determining a root of the polynomial, wherein the root is an estimate of an overlap between the first individuals and the second individuals; estimating a deduplicated audience size for the media based on the overlap, the first cardinality, and the second cardinality; and transmitting a second network communication to a third-party entity, wherein the second network communication includes a report based on the deduplicated audience size.

2. The computing system of claim 1, wherein the first individuals are subscribers of the database proprietor.

3. The computing system of claim 2, wherein the first individuals are subscribers of the database proprietor that are exposed to the media via a platform operated by the database proprietor.

4. The computing system of claim 2, wherein: the second individuals are subscribers of another database proprietor, the set of acts further comprises receiving a third network communication from a server of the other database proprietor, and the third network communication includes the second sketch data.

5. The computing system of claim 1, wherein the first sketch data and the second sketch data are generated using a Bernoulli hash.

6. The computing system of claim 5, wherein the first sketch data is generated by applying a Bernoulli hash to personally identifiable information of the first individuals.

7. The computing system of claim 1, wherein: the first sketch data is first binomial sketch data, and the second sketch data is second binomial sketch data.

8. A method comprising: receiving, by a computing system, a first network communication from a server of a database proprietor, wherein the first network communication includes first sketch data representative of first individuals exposed to media; determining, by the computing system, coefficient values of a polynomial based on (i) variances in the first sketch data and second sketch data, (ii) a first cardinality of the first sketch data, and (iii) a second cardinality of the second sketch data, wherein the second sketch data is representative of second individuals exposed to the media, wherein the first sketch data and the second sketch data are generated to maintain a privacy of the first individuals and the second individuals; determining, by the computing system, a root of the polynomial, wherein the root is an estimate of an overlap between the first individuals and the second individuals; estimating, by the computing system, a deduplicated audience size for the media based on the overlap, the first cardinality, and the second cardinality; and transmitting, by the computing system, a second network communication to a third-party entity, wherein the second network communication includes a report based on the deduplicated audience size.

9. The method of claim 8, wherein the first individuals are subscribers of the database proprietor.

10. The method of claim 9, wherein the first individuals are subscribers of the database proprietor that are exposed to the media via a platform operated by the database proprietor.

11. The method of claim 9, wherein: the second individuals are subscribers of another database proprietor, the method further comprises receiving a third network communication from a server of the other database proprietor, and the third network communication includes the second sketch data.

12. The method of claim 8, wherein the first sketch data and the second sketch data are generated using a Bernoulli hash.

13. The method of claim 12, wherein the first sketch data is generated by applying a Bernoulli hash to personally identifiable information of the first individuals.

14. The method of claim 8, wherein: the first sketch data is first binomial sketch data, and the second sketch data is second binomial sketch data.

15. A non-transitory computer-readable medium having stored therein instructions that, when executed by a computing system, cause the computing system to perform a set of acts comprising: receiving a first network communication from a server of a database proprietor, wherein the first network communication includes first sketch data representative of first individuals exposed to media; determining coefficient values of a polynomial based on (i) variances in the first sketch data and second sketch data, (ii) a first cardinality of the first sketch data, and (iii) a second cardinality of the second sketch data, wherein the second sketch data is representative of second individuals exposed to the media, wherein the first sketch data and the second sketch data are generated to maintain a privacy of the first individuals and the second individuals; determining a root of the polynomial, wherein the root is an estimate of an overlap between the first individuals and the second individuals; estimating a deduplicated audience size for the media based on the overlap, the first cardinality, and the second cardinality; and transmitting a second network communication to a third-party entity, wherein the second network communication includes a report based on the deduplicated audience size.

16. The non-transitory computer-readable medium of claim 15, wherein the first individuals are subscribers of the database proprietor.

17. The non-transitory computer-readable medium of claim 16, wherein: the second individuals are subscribers of another database proprietor, the set of acts further comprises receiving a third network communication from a server of the other database proprietor, and the third network communication includes the second sketch data.

18. The non-transitory computer-readable medium of claim 15, wherein the first sketch data and the second sketch data are generated using a Bernoulli hash.

19. The non-transitory computer-readable medium of claim 18, wherein the first sketch data is generated by applying a Bernoulli hash to personally identifiable information of the first individuals.

20. The non-transitory computer-readable medium of claim 15, wherein: the first sketch data is first binomial sketch data, and the second sketch data is second binomial sketch data.
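Claims 5 to 7 mention a Bernoulli hash and binomial sketch data without defining either in this preview. The sketch below shows one plausible reading, stated as an assumption: a keyed hash deterministically maps each (identifier, position) pair to a bit that is 1 with probability p, and a sketch is the per-position sum of those bits over all individuals, so each entry behaves like a binomial count. The salt, the function names, and the use of SHA-256 are hypothetical. The estimator at the end is a simple moment-based stand-in that shows why variances carry information about the overlap; it is not the polynomial recited in the claims.

```python
import hashlib
from statistics import variance
from typing import Iterable, List, Sequence

SALT = "demo-salt"  # hypothetical shared key; both parties must use the same one


def bernoulli_hash(identifier: str, position: int, p: float = 0.5) -> int:
    """Deterministically map (identifier, position) to 1 with probability about p."""
    digest = hashlib.sha256(f"{SALT}|{position}|{identifier}".encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # approximately uniform on [0, 1]
    return 1 if u < p else 0


def binomial_sketch(identifiers: Iterable[str], length: int, p: float = 0.5) -> List[int]:
    """Sum the Bernoulli bits of all distinct identifiers at each position."""
    sketch = [0] * length
    for ident in set(identifiers):
        for i in range(length):
            sketch[i] += bernoulli_hash(ident, i, p)
    return sketch


def overlap_from_variance(sketch_a: Sequence[int], sketch_b: Sequence[int],
                          card_a: int, card_b: int, p: float = 0.5) -> float:
    """Moment-based stand-in estimator (assumption, not the claimed polynomial).

    Shared individuals contribute the same bit to both sketches and cancel in
    A[i] - B[i], so Var(A[i] - B[i]) = (card_a + card_b - 2 * overlap) * p * (1 - p).
    """
    diffs = [a - b for a, b in zip(sketch_a, sketch_b)]
    return (card_a + card_b - variance(diffs) / (p * (1 - p))) / 2


# Hypothetical data: 200 and 150 individuals, 50 of whom appear in both groups.
group_a = [f"user{i}@example.com" for i in range(200)]
group_b = [f"user{i}@example.com" for i in range(150, 300)]
sketch_a = binomial_sketch(group_a, length=1024)
sketch_b = binomial_sketch(group_b, length=1024)
estimate = overlap_from_variance(sketch_a, sketch_b, len(group_a), len(group_b))
```

Under these assumptions neither party shares raw identifiers, only per-position counts, which is consistent with the privacy language in the independent claims. The estimate is noisy, and its spread shrinks as the sketch length grows.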

Assignees

Inventors

Classifications

  • Hash tables · CPC title

  • G06F16/215 · Primary

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • G06F16/40 · Primary

    of multimedia data, e.g. slideshows comprising image and additional audio data (retrieval of still image data G06F16/50; retrieval of audio data G06F16/60; retrieval of video data G06F16/70) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025061101A1 cover?
Methods and apparatus to estimate audience sizes using deduplication based on binomial sketch data are disclosed. An apparatus includes processor circuitry to instantiate coefficient analyzer circuitry to determine coefficient values of a polynomial based on (i) variances in values in the first sketch data and second sketch data, (ii) a first cardinality of the first sketch data, and (iii) a se…
Who is the assignee on this patent?
Nielsen Co US LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 20, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).