Privacy preserving statistical analysis on distributed databases

US10146958B2 · US · B2

Patent metadata
Publication number: US-10146958-B2
Application number: US-201313804701-A
Country: US
Kind code: B2
Filing date: Mar 14, 2013
Priority date: Mar 14, 2013
Publication date: Dec 4, 2018
Grant date: Dec 4, 2018

Abstract

Aggregate statistics are securely determined on private data by first sampling independent first and second data at one or more clients to obtain sampled data, wherein a sampling parameter is substantially smaller than a length of the data. The sampled data are encrypted to obtain encrypted data, which are then combined. The combined encrypted data are randomized to obtain randomized data. At an authorized third-party processor, a joint distribution of the first and second data is estimated from the randomized encrypted data, such that a differential privacy requirement of the first and second data is satisfied.
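The sampling, randomizing and estimating steps of the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: both columns are sampled at the same m row positions so that (x, y) pairs stay aligned, the post randomization (PRAM) is a γ-diagonal matrix (diagonal entries γ times the off-diagonal ones, rows summing to 1) applied independently to each coordinate, and encryption is left out to keep the sketch short. All helper names are hypothetical. The estimator undoes the PRAM's expected effect, E[T̄] = A_X^T T A_Y, where T is the joint distribution of the sampled rows and T̄ is the empirical joint distribution of the randomized sample.

```python
import random
from collections import Counter

# Sketch only: helper names and the symmetric gamma-diagonal PRAM are
# illustrative assumptions, not code from the patent.

def gamma_diagonal_matrix(k, gamma):
    """k x k row-stochastic matrix whose diagonal entries are gamma times
    its off-diagonal entries."""
    off = 1.0 / (gamma + k - 1)
    return [[gamma * off if i == j else off for j in range(k)] for i in range(k)]

def inverse_gamma_diagonal(k, gamma):
    """Closed-form inverse: with A = (a - b) I + b J and a + (k - 1) b = 1,
    A^{-1} = (I - b J) / (a - b)."""
    b = 1.0 / (gamma + k - 1)
    a = gamma * b
    return [[((1.0 if i == j else 0.0) - b) / (a - b) for j in range(k)]
            for i in range(k)]

def pram(value, k, gamma, rng):
    """Keep `value` with probability gamma / (gamma + k - 1); otherwise
    replace it with one of the other k - 1 symbols, uniformly at random."""
    if rng.random() < gamma / (gamma + k - 1):
        return value
    other = rng.randrange(k - 1)
    return other if other < value else other + 1

def matmul(p, q):
    """Plain nested-list matrix product."""
    return [[sum(p[i][t] * q[t][j] for t in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def estimate_joint(x, y, kx, ky, m, gamma, seed=0):
    """Sample m aligned rows, randomize each coordinate with PRAM, then
    undo the PRAM's expected effect on the empirical joint distribution."""
    rng = random.Random(seed)
    rows = rng.sample(range(len(x)), m)  # same rows for X and Y keep pairs aligned
    xs = [pram(x[r], kx, gamma, rng) for r in rows]
    ys = [pram(y[r], ky, gamma, rng) for r in rows]
    counts = Counter(zip(xs, ys))
    t_bar = [[counts[(i, j)] / m for j in range(ky)] for i in range(kx)]
    # E[t_bar] = A_X^T T A_Y; gamma-diagonal matrices are symmetric,
    # so the unbiased estimate is A_X^{-1} t_bar A_Y^{-1}.
    left = inverse_gamma_diagonal(kx, gamma)
    right = inverse_gamma_diagonal(ky, gamma)
    return matmul(matmul(left, t_bar), right)
```

For example, `estimate_joint(x, y, kx=2, ky=2, m=5000, gamma=9.0)` returns a 2×2 table. Its entries always sum to 1, because the inverse of a symmetric row-stochastic matrix also has rows summing to 1, but a rare cell can come out slightly negative: the estimator is unbiased rather than constrained to be a probability table.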

Claims

We claim:

1. A system for securely communicating aggregate statistics on private data to mutually untrusting Internet parties, comprising: one or more client devices, wherein the one or more client devices sample data $X^n$ and $Y^n$ to obtain sampled data $\tilde{X}^m$ and $\tilde{Y}^m$, wherein $m$ is a sampling parameter smaller than a length $n$ of the data, and encrypt the sampled data $\tilde{X}^m$ and $\tilde{Y}^m$ with encryption keys to obtain encrypted data $\check{X}^m$ and $\check{Y}^m$; a server in Internet communication with the one or more client devices, wherein the server receives the encrypted data $\check{X}^m$ and $\check{Y}^m$ from the one or more client devices, combines the encrypted data $\check{X}^m$ and $\check{Y}^m$ to obtain combined encrypted data, randomizes the combined encrypted data using a post randomization (PRAM) to obtain randomized encrypted data $X^m, Y^m$, and transmits the randomized encrypted data $X^m, Y^m$ in response to receiving an Internet request for the data $X^n$ and $Y^n$; an authorized third-party processor in Internet communication with the server and with the one or more client devices, wherein the authorized third-party processor, in response to transmitting the Internet request for the data $X^n$ and $Y^n$ to the server, receives from the server the randomized encrypted data $X^m, Y^m$, receives, in response to transmitting an Internet request to the one or more client devices, from the one or more client devices decryption keys corresponding to the encryption keys, decrypts the randomized encrypted data $X^m, Y^m$ with the decryption keys to produce randomized data, and estimates a joint distribution $\hat{T}_{X^n,Y^n}$ of the data $X^n$ and $Y^n$ from the randomized data to produce the aggregate statistics on the data $X^n$ and $Y^n$, such that a differential privacy on the data $X^n$ and $Y^n$ is a function of the sampling parameter $m$ and a differential privacy of the PRAM $\gamma$ satisfying a differential privacy requirement according to $\varepsilon = \ln\left(\frac{n + m(\gamma - 1)}{n}\right)$, wherein $\varepsilon$ is the differential privacy, $m$ is the sampling parameter, the PRAM includes a $\gamma$-diagonal matrix $A$, and $\ln(\cdot)$ is a logarithmic function.

2. The system of claim 1, wherein the encryption is performed before the sampling at the one or more client devices.

3. The system of claim 1, wherein the randomizing is performed by the one or more client devices.

4. The system of claim 1, wherein the encrypted data $\check{X}^m$ and $\check{Y}^m$ are obtained from the sampled data $\tilde{X}^m$ and $\tilde{Y}^m$ using a stream cipher, and decryption parameters for the stream cipher are provided to the authorized third-party processor.

5. The system of claim 4, wherein the estimating further comprises: reversing, at the authorized third-party processor, the encryption applied to the randomized data $X^m, Y^m$, using decryption parameters provided to the authorized third-party processor by the one or more client devices.

6. The system of claim 1, wherein the randomized data is obtained from the encrypted data $\check{X}^m$ and $\check{Y}^m$ using a randomized response mechanism.

7. The system of claim 1, wherein the one or more client devices perform the sampling, encrypting and randomizing, and the authorized third-party processor performs the combining and estimating.

8. The system of claim 1, wherein the sampling, randomizing and encrypting can be in any order as long as encrypting parameters are determined by the one or more client devices, and decrypting parameters are provided to the authorized third-party processor.

9. A system for securely communicating aggregate statistics on private data to mutually untrusting parties, comprising: one or more client devices including processors configured to sample first data and second data to obtain sampled data, wherein a sampling parameter is smaller than a length of the data, and encrypt the sampled data to obtain encrypted data; a server in Internet communication with the one or more client devices, wherein the server receives the encrypted data from one or more client devices, combines the encrypted data to obtain combined encrypted data, randomizes the combined encrypted data using a post randomization (PRAM) to obtain randomized encrypted data and transmits the randomized encrypted data upon receiving a request for the first data and the second data; an authorized third-party processor in Internet communication with the server and with the one or more client devices, wherein the authorized third-party processor estimates a joint distribution of the first data and the second data from the randomized encrypted data received from the server decrypted with decryption keys received from the one or more client devices to produce the aggregate statistics on the first data and the second data such that a differential privacy of the first data and the second data is a function of the sampling parameter $m$ and a differential privacy of the PRAM $\gamma$ satisfying a differential privacy requirement according to $\varepsilon = \ln\left(\frac{n + m(\gamma - 1)}{n}\right)$, wherein $\varepsilon$ is the differential privacy, $m$ is the sampling parameter, the PRAM includes a $\gamma$-diagonal matrix $A$, and $\ln(\cdot)$ is a logarithmic function.

10. The system of claim 1, wherein the randomized data is obtained from the encrypted data using a randomized response mechanism, wherein the randomized response mechanism further comprises: independently altering each data element to any other value in an alphabet set with the same probability, or alternatively retaining the value of said each data element.

11. The system of claim 1, wherein the one or more client devices perform the encrypting, and the server performs the combining, sampling, and randomizing.
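Claims 1 and 9 tie the privacy level to the data length n, the sampling parameter m and the PRAM parameter γ. Rewriting the claimed expression as ε = ln(1 + (m/n)(γ − 1)) shows that it has the form of the standard privacy-amplification-by-subsampling bound ln(1 + q(e^ε₀ − 1)), with sampling rate q = m/n and ε₀ = ln γ. Here ln γ is the differential privacy of a γ-diagonal PRAM on its own, since the largest-to-smallest ratio within any column of that matrix is γ. This reading is an interpretation, not language from the claims. The Python helper below, whose name is hypothetical, simply evaluates the claimed formula.

```python
import math

def sampled_pram_epsilon(n, m, gamma):
    """Differential privacy level as stated in claims 1 and 9:
    epsilon = ln((n + m * (gamma - 1)) / n),
    for n records, m sampled records and a gamma-diagonal PRAM."""
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    if gamma < 1:
        raise ValueError("need gamma >= 1")
    return math.log((n + m * (gamma - 1)) / n)

# Sampling every record (m = n) leaves only the PRAM's own guarantee, ln(gamma).
print(sampled_pram_epsilon(10_000, 10_000, 10.0))  # about 2.3026
# Sampling 1% of the records shrinks it to ln(1.09).
print(sampled_pram_epsilon(10_000, 100, 10.0))     # about 0.0862
```

The formula makes the trade-off explicit: for a fixed γ, a smaller sampling rate m/n gives a smaller ε (stronger privacy), at the cost of estimating the joint distribution from fewer records.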

Classifications

  • by anonymising data, e.g. decorrelating personal data from the owner's identification

  • Pseudorandom key sequence combined element-for-element with data sequence, e.g. one-time-pad [OTP] or Vernam's cipher

  • {Cryptographic mechanisms or cryptographic} arrangements for secret or secure communications; Network security protocols

  • Anonymization, e.g. involving pseudonyms
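The one-time-pad classification and the stream cipher of claim 4 suggest how a server could randomize data it cannot read. The Python sketch below assumes, for illustration only, an additive pad over the alphabet {0, 1, ..., k - 1} (c = (x + key) mod k, with one fresh uniform key per element) and a γ-diagonal PRAM applied to the ciphertext. Because that PRAM treats every symbol alike, randomizing the ciphertext and then decrypting gives exactly the same output distribution as randomizing the plaintext, whatever the key. The function names are hypothetical, and the patent text shown here does not specify this construction.

```python
# Illustrative assumption: additive one-time pad over Z_k plus a symmetric
# gamma-diagonal PRAM. Not the patent's stated construction.

def otp_encrypt(symbols, keys, k):
    """Vernam-style cipher over Z_k: combine key and data element by element."""
    return [(s + key) % k for s, key in zip(symbols, keys)]

def otp_decrypt(cipher, keys, k):
    """Subtract the same key sequence to recover the data."""
    return [(c - key) % k for c, key in zip(cipher, keys)]

def pram_matrix(k, gamma):
    """gamma-diagonal PRAM: keep a symbol with weight gamma, any other with weight 1."""
    off = 1.0 / (gamma + k - 1)
    return [[gamma * off if i == j else off for j in range(k)] for i in range(k)]

def decrypted_output_distribution(x, key, k, gamma):
    """Exact distribution of decrypt(PRAM(encrypt(x))) for a single symbol x."""
    A = pram_matrix(k, gamma)
    c = (x + key) % k
    dist = [0.0] * k
    for c2 in range(k):
        # The server's PRAM turns ciphertext c into c2 with probability A[c][c2];
        # the authorized processor then decrypts c2 to (c2 - key) % k.
        dist[(c2 - key) % k] += A[c][c2]
    return dist

k, gamma = 5, 3.0
A = pram_matrix(k, gamma)
# For every plaintext and every key, the result matches PRAM applied to x directly.
print(all(decrypted_output_distribution(x, key, k, gamma) == A[x]
          for x in range(k) for key in range(k)))  # True
```

The check holds because decryption maps the ciphertext c back to x and every other ciphertext symbol to a distinct non-x plaintext symbol, and the γ-diagonal matrix gives the same weight to all of those. A PRAM whose off-diagonal entries differed would not commute with the pad in this way.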


Frequently asked questions

Who is the assignee on this patent?
Mitsubishi Electric Res Laboratories Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/6254. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 4, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.