User experience using privatized crowdsourced data

US11227063B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11227063-B2
Application numberUS-202017020637-A
CountryUS
Kind codeB2
Filing dateSep 14, 2020
Priority dateJun 4, 2017
Publication dateJan 18, 2022
Grant dateJan 18, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments described herein provide a privacy mechanism to protect user data when transmitting the data to a server that estimates a frequency of such data amongst a set of client devices. In one embodiment, a differential privacy mechanism is implemented using a count-mean-sketch technique that can reduce resource requirements required to enable privacy while providing provable guarantees regarding privacy and utility. For instance, the mechanism can provide the ability to tailor utility (e.g. accuracy of estimations) against the resource requirements (e.g. transmission bandwidth and computation complexity).

First claim

Opening claim text (preview).

What is claimed is: 1. A machine-readable medium storing instructions which, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising: selecting a value of user data to transmit to a server from a set of possible user data values collected on a client device; creating a hash of a term for the value of user data using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed; encoding at least a portion of the hash of the term for the value as a vector, wherein the encoding includes updating a value of the vector at a position corresponding to the hash; privatizing the vector by changing at least some of the values of the vector with a predefined probability; and transmitting, to the server, the privatized vector and an index value of the randomly selected variant to enable the server to estimate of a frequency of the value of user data on a set of client devices. 2. The machine-readable medium of claim 1 , wherein: the term is a string of characters and the randomly selected variant of the term includes one or more characters representing the index value of the variant appended to the string; the server estimates the frequency of the value of user data by updating a frequency table indexed by the set of possible variants and a row or column of the frequency table corresponding to the index value of the random hash function is updated with the privatized vector; and the server estimates a frequency of each of the set of possible user data values amongst from data accumulated from the set of client devices. 3. The machine-readable medium of claim 1 , wherein the encoding includes initializing the vector with a uniform value and sign and updating the value of the vector includes flipping the sign of the value at the position corresponding to a created hash value. 4. The machine-readable medium of claim 3 , wherein initializing the vector includes setting each value of the vector to a value representing a constant and updating the value of the vector includes setting the value at the position corresponding to a created hash value to a value representing a sign flip of the constant. 5. The machine-readable medium of claim 1 , wherein the random hash function is to address hash collisions when using only the portion of a created hash value and to reduce a number of computations required to create a frequency table while maintaining privacy of the user data values. 6. The machine-readable medium of claim 1 , wherein the encoded vector is a Hadamard matrix and the value of user data represents a website visited by a user of the client device. 7. The machine-readable medium of claim 1 , wherein only the privatized vector and the index value of the randomly selected variant is transmitted to the server as information representing the value of user data. 8. The machine-readable medium of claim 1 , wherein privatizing the vector includes changing at least some of the values of the vector with a predefined probability, the predefined probability based on a privacy parameter. 9. The machine-readable medium of claim 8 , wherein the privacy parameter represents a configured tradeoff between privacy and accuracy. 10. The machine-readable medium of claim 8 , wherein the predefined probability is defined as 1/(1+e ε ), and ε is the privacy parameter. 11. An electronic device comprising: one or more processors; and a memory coupled to the one or more processors, the memory storing instructions, which when executed by the one or more processor, cause the one or more processors to perform operations comprising: selecting a value of user data to transmit to a server from a set of possible user data values collected on a client device; creating a hash of a term for the value of user data using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed; encoding at least a portion of the hash of the term for the value as a vector, wherein the encoding includes updating a value of the vector at a position corresponding to the hash; privatizing the vector by changing at least some of the values of the vector with a predefined probability; and transmitting, to the server, the privatized vector and an index value of the randomly selected variant to enable the server to estimate of a frequency of the value of user data on a set of client devices. 12. The electronic device of claim 11 , wherein: the term is a string of characters and the randomly selected variant of the term includes one or more characters representing the index value of the variant appended to the string; the server estimates the frequency of the value of user data by updating a frequency table indexed by the set of possible variants and a row or column of the frequency table corresponding to the index value of the random hash function is updated with the privatized vector; and the server estimates a frequency of each of the set of possible user data values amongst from data accumulated from the set of client devices. 13. The electronic device of claim 11 , wherein the encoding includes initializing the vector with a uniform value and sign and updating the value of the vector includes flipping the sign of the value at the position corresponding to a created hash value. 14. The electronic device of claim 13 , wherein initializing the vector includes setting each value of the vector to a value representing a constant and updating the value of the vector includes setting the value at the position corresponding to a created hash value to a value representing a sign flip of the constant. 15. The electronic device of claim 11 , wherein the random hash function is to address hash collisions when using only the portion of a created hash value and to reduce a number of computations required to create a frequency table while maintaining privacy of the user data values. 16. The electronic device of claim 11 , wherein the encoded vector is a Hadamard matrix and the value of user data represents a website visited by a user of the client device. 17. The electronic device of claim 11 , wherein only the privatized vector and the index value of the randomly selected variant is transmitted to the server as information representing the value of user data. 18. The electronic device of claim 11 , wherein privatizing the vector includes changing at least some of the values of the vector with a predefined probability, the predefined probability based on a privacy parameter. 19. The electronic device of claim 18 , wherein the privacy parameter represents a configured tradeoff between privacy and accuracy. 20. The electronic device of claim 18 , wherein the predefined probability is defined as 1/(1+e ε ), and ε is the privacy parameter. 21. A method comprising: selecting a value of user data to transmit to a server from a set of possible user data values collected on a client device; creating a hash of a term for the value of user data using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed; encoding at least a portion of the hash of the term for the value as a vector, wherein the encoding includes updating a value of the vector at a position

Assignees

Inventors

Classifications

  • Tracking the activity of the user (network monitoring arrangements H04L43/00; recording of computer activity G06F11/34) · CPC title

  • Encoding or coding, e.g. Huffman coding or error correction · CPC title

  • involving non-keyed hash functions, e.g. modification detection codes [MDCs], MD5, SHA or RIPEMD · CPC title

  • Protecting personal data, e.g. for financial or medical purposes · CPC title

  • wherein the data content is protected, e.g. by encrypting or encapsulating the payload · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11227063B2 cover?
Embodiments described herein provide a privacy mechanism to protect user data when transmitting the data to a server that estimates a frequency of such data amongst a set of client devices. In one embodiment, a differential privacy mechanism is implemented using a count-mean-sketch technique that can reduce resource requirements required to enable privacy while providing provable guarantees reg…
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/6245. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jan 18 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).