Methods and apparatus to estimate audience sizes of media using deduplication based on binomial sketch data

US12153553B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12153553-B2
Application number  US-202217886216-A
Country             US
Kind code           B2
Filing date         Aug 11, 2022
Priority date       Jul 5, 2019
Publication date    Nov 26, 2024
Grant date          Nov 26, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus to estimate audience sizes using deduplication based on binomial sketch data are disclosed. An apparatus includes processor circuitry to instantiate coefficient analyzer circuitry to determine coefficient values of a polynomial based on (i) variances in values in the first sketch data and second sketch data, (ii) a first cardinality of the first sketch data, and (iii) a second cardinality of the second sketch data. The processor circuitry to instantiate overlap analyzer circuitry to determine a real root of the polynomial, the real root corresponding to the quantity of the second subscribers that are duplicates of the first subscribers. The processor circuitry to instantiate report generator circuitry to estimate a deduplicated audience size based on the estimate of the quantity of the second subscribers that are duplicates of the first subscribers and the first and second cardinalities.
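The abstract describes three steps: derive an equation from the sketch variances and the two cardinalities, solve it for the overlap, and subtract the overlap from the combined cardinalities. The sketch below illustrates that flow under loudly stated assumptions. It is not the patent's polynomial, whose coefficients are not given in the text shown here. It assumes that both proprietors apply the same Bernoulli(1/2) hash functions, so shared subscribers cancel in the element-wise difference of the two sketches and the second moment of that difference is (|A| + |B| - 2·overlap)/4. That yields a linear equation in the overlap. The function names `estimate_overlap` and `deduplicated_audience` are hypothetical.

```python
from typing import Sequence


def estimate_overlap(sketch_a: Sequence[float], sketch_b: Sequence[float],
                     card_a: float, card_b: float) -> float:
    """Simplified moment-based overlap estimate (illustrative stand-in only).

    Assumes each sketch element is a sum of Bernoulli(1/2) hash outputs and
    that both sketches were built with the same hash functions.
    """
    if len(sketch_a) != len(sketch_b) or len(sketch_a) == 0:
        raise ValueError("sketches must be non-empty and of equal length")

    # Under the assumptions, E[a_j - b_j] = (card_a - card_b) / 2.
    expected_diff = (card_a - card_b) / 2.0

    # Second moment of the difference about its expected value.
    second_moment = sum(
        (a - b - expected_diff) ** 2 for a, b in zip(sketch_a, sketch_b)
    ) / len(sketch_a)

    # Solve second_moment = (card_a + card_b - 2 * overlap) / 4 for overlap.
    overlap = (card_a + card_b - 4.0 * second_moment) / 2.0

    # The overlap cannot be negative or exceed the smaller audience.
    return min(max(overlap, 0.0), float(min(card_a, card_b)))


def deduplicated_audience(card_a: float, card_b: float, overlap: float) -> float:
    """Inclusion-exclusion: count shared subscribers only once."""
    return card_a + card_b - overlap
```

For example, two identical sketches with equal cardinalities give an overlap equal to that cardinality, so the deduplicated audience is the size of one audience rather than the sum of both.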

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: communication circuitry to receive first network communication from a first server of a first database proprietor, the first network communication including first sketch data representative of first subscribers of the first database proprietor; and processor circuitry including one or more of: at least one of a central processor unit, a graphics processor unit, or a digital signal processor, the at least one of the central processor unit, the graphics processor unit, or the digital signal processor having control circuitry to control data movement within the processor circuitry, arithmetic and logic circuitry to perform one or more first operations corresponding to instructions, and one or more registers to store a result of the one or more first operations, the instructions in the apparatus; a Field Programmable Gate Array (FPGA), the FPGA including first logic gate circuitry, a plurality of configurable interconnections, and storage circuitry, the first logic gate circuitry and the plurality of the configurable interconnections to perform one or more second operations, the storage circuitry to store a result of the one or more second operations; or Application Specific Integrated Circuitry (ASIC) including second logic gate circuitry to perform one or more third operations; the processor circuitry to perform at least one of the first operations, the second operations, or the third operations to instantiate: coefficient analyzer circuitry to determine coefficient values of a polynomial based on (i) variances in values in the first sketch data and second sketch data, (ii) a first cardinality of the first sketch data, and (iii) a second cardinality of the second sketch data, the second sketch data representative of second subscribers of a second database proprietor, the first and second sketch data generated to maintain a privacy of the first and second subscribers, the privacy based on the first and second sketch data being unusable by an intercepting party to determine a quantity of the second subscribers that are duplicates of the first subscribers; overlap analyzer circuitry to determine a real root of the polynomial, the real root corresponding to the quantity of the second subscribers that are duplicates of the first subscribers; and report generator circuitry to estimate a deduplicated audience size based on the estimate of the quantity of the second subscribers that are duplicates of the first subscribers and the first and second cardinalities, use of the variances to determine the coefficient values to improve functionality of the processor circuitry by increasing efficiencies in both processing and memory usage by the processor circuitry when estimating the deduplicated audience size relative to not using the variances, the communication circuitry to transmit a second network communication to a third-party entity, the second network communication including a report based on the deduplicated audience size.

2. The apparatus of claim 1, wherein the first sketch data is represented by a first vector, the first vector to have a first number of elements, different ones of the elements in the first vector corresponding to a sum of outputs of respective ones of a second number of hash functions applied to information associated with the first subscribers of the first database proprietor that accessed media, the first number equal to the second number.

3. The apparatus of claim 2, wherein the second sketch data is to be represented by a second vector, the second vector to have a third number of elements, the third number of elements equal to the first number, different ones of the elements in the second vector corresponding to a sum of outputs of the respective ones of the second number of hash functions applied to information associated with the second subscribers of the second database proprietor that accessed the media.

4. The apparatus of claim 2, wherein ones of the hash functions are different Bernoulli hash functions.

5. The apparatus of claim 2, wherein the information is personally identifiable information.

6. The apparatus of claim 2, wherein the second number of hash functions is selected to provide a relative error in the audience size estimate no greater than a particular relative error at a particular confidence level.

7. The apparatus of claim 1, wherein the values in the first and second sketch data follow a binomial distribution.

8. The apparatus of claim 1, wherein coefficient analyzer circuitry is to normalize the coefficient values based on a first cardinality of the first sketch data, the first cardinality being less than or equal to a second cardinality of the second sketch data.

9. An apparatus comprising: means for communicating to receive a first network communication from a first server of a first database proprietor, the first network communication including first sketch data representative of first subscribers of the first database proprietor; and means for processing to: determine coefficient values of a polynomial based on (i) variances in values in the first sketch data and second sketch data, (ii) a first cardinality of the first sketch data, and (iii) a second cardinality of the second sketch data, the second sketch data representative of second subscribers of a second database proprietor, the first and second sketch data generated to maintain a privacy of the first and second subscribers, the privacy based on the first and second sketch data being unusable by an intercepting party to determine a quantity of the second subscribers that are duplicates of the first subscribers; determine a real root of the polynomial, the real root corresponding to ones of the second subscribers that are duplicates of the first subscribers; and estimate a deduplicated audience size based on the ones of the second subscribers that are duplicates of the first subscribers and the first and second cardinalities, use of the variances to determine the coefficient values to improve functionality of the means for processing by increasing efficiencies in both processing and memory usage when the means for processing estimates the deduplicated audience size relative to not using the variances, the means for communicating to transmit a second network communication to a third-party entity, the second network communication including a report based on the deduplicated audience size.

10. The apparatus of claim 9, wherein the first sketch data is represented by a first vector, the first vector to have a first number of elements, different ones of the elements in the first vector corresponding to a sum of outputs of respective ones of a second number of hash functions applied to information associated with the first subscribers of the first database proprietor that accessed media, the first number equal to the second number.

11. The apparatus of claim 10, wherein the second sketch data is to be represented by a second vector, the second vector to have a third number of elements, the third number equal to the first number, different ones of the elements in the second vector corresponding to a sum of outputs of the respective ones of the second number of hash functions applied to information associated with the second subscribers of the second database proprietor that accessed the media.

12. The apparatus of claim 10, wherein ones of the hash functions are different Bernoulli hash functions.

13. The apparatus of claim 10, wherein the information is personally identifiable information.

14. The apparatus of claim 10, wherein the second number of hash functions is selected to provide
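Claims 2 to 4 and 7 describe the sketch as a vector with one element per hash function, each element being the sum of that Bernoulli hash function's outputs over the subscribers who accessed the media. A minimal sketch of that construction follows. The claims do not specify the hash functions, so the salted SHA-256 bit used here, the success probability of 1/2, and the names `bernoulli_hash`, `binomial_sketch`, and `estimate_cardinality` are all assumptions made for illustration.

```python
import hashlib
from typing import Iterable, List


def bernoulli_hash(item: str, index: int) -> int:
    """Hypothetical Bernoulli(1/2) hash: one pseudo-random bit per (index, item)."""
    digest = hashlib.sha256(f"{index}:{item}".encode("utf-8")).digest()
    return digest[0] & 1  # lowest bit of the first byte: 0 or 1


def binomial_sketch(items: Iterable[str], num_hashes: int) -> List[int]:
    """Build a vector with one element per hash function.

    Element j is the sum of hash j's outputs over the distinct items, so
    with n distinct items each element behaves like a Binomial(n, 1/2) draw.
    """
    unique_items = set(items)  # count each subscriber once
    return [
        sum(bernoulli_hash(item, j) for item in unique_items)
        for j in range(num_hashes)
    ]


def estimate_cardinality(sketch: List[int]) -> float:
    """Under the Bernoulli(1/2) assumption the element mean is n/2."""
    return 2.0 * sum(sketch) / len(sketch)
```

Because each element aggregates one bit per subscriber, the vector carries counts rather than identifiers, which matches the privacy motivation stated in claim 1. Using the same `index` values at both proprietors corresponds to claim 3's reuse of the same hash functions for the second sketch.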

Assignees

Nielsen Co Us Llc

Inventors

Classifications

  • Hash tables · CPC title

  • G06F16/215 · Primary

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • G06F16/40 · Primary

    of multimedia data, e.g. slideshows comprising image and additional audio data (retrieval of still image data G06F16/50; retrieval of audio data G06F16/60; retrieval of video data G06F16/70) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12153553B2 cover?
Methods and apparatus to estimate audience sizes using deduplication based on binomial sketch data are disclosed. An apparatus includes processor circuitry to instantiate coefficient analyzer circuitry to determine coefficient values of a polynomial based on (i) variances in values in the first sketch data and second sketch data, (ii) a first cardinality of the first sketch data, and (iii) a se…
Who is the assignee on this patent?
Nielsen Co Us Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 26, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).