Gradients over distributed datasets

US10320752B2 · US · B2

Patent metadata
Publication number: US-10320752-B2
Application number: US-201515521409-A
Country: US
Kind code: B2
Filing date: Oct 26, 2015
Priority date: Oct 24, 2014
Publication date: Jun 11, 2019
Grant date: Jun 11, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This disclosure relates to characterising data sets that are distributed as multiple data subsets over multiple computers such as by determining a gradient of an objective function. A computer determines a partial gradient of the objective function over a data subset stored on the computer and determines random data. The computer then determines an altered gradient by modifying the partial gradient based on the random data and encrypts the altered gradient such that one or more operations on the altered gradient can be performed based on the encrypted gradient and sends the encrypted gradient. Since the partial gradient is altered based on random data and encrypted it is difficult for another computer to calculate the data that is stored on the first computer. This is an advantage as it allows to preserve the privacy of the data stored on the first computer while still allowing to characterise the data set.
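As a rough illustration of the first steps the abstract describes (a partial gradient over a local data subset, then alteration with random data), here is a minimal Python sketch. It is not taken from the patent: the least-squares linear model, the additive masking, the fixed-point `SCALE`, the `MODULUS`, and all function names are assumptions chosen for illustration, and the encryption step is left out.

```python
import secrets

SCALE = 10**6      # fixed-point precision (assumption)
MODULUS = 2**64    # masking arithmetic is done modulo this value (assumption)

def partial_gradient(weights, samples, labels):
    """Gradient of 0.5 * sum((w.x - y)^2) over the local data subset.

    A descent step moves against this vector to reduce the error.
    """
    grad = [0.0] * len(weights)
    for x, y in zip(samples, labels):
        residual = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xj in enumerate(x):
            grad[j] += residual * xj
    return grad

def to_fixed(values):
    """Encode floats as integers so they can be masked (and later encrypted)."""
    return [round(v * SCALE) for v in values]

def alter(fixed_gradient):
    """Add a fresh random number to each gradient component."""
    mask = [secrets.randbelow(MODULUS) for _ in fixed_gradient]
    altered = [(g + r) % MODULUS for g, r in zip(fixed_gradient, mask)]
    return altered, mask

def remove_mask(altered, mask):
    """Undo the masking and map the result back to signed integers."""
    out = []
    for a, r in zip(altered, mask):
        v = (a - r) % MODULUS
        out.append(v - MODULUS if v >= MODULUS // 2 else v)
    return out

# Made-up local data subset: two samples with two features each.
samples = [[1.0, 2.0], [3.0, 4.0]]
labels = [1.0, 2.0]
gradient = partial_gradient([0.0, 0.0], samples, labels)
altered, mask = alter(to_fixed(gradient))
```

Because each mask value is drawn uniformly from the full modulus, the altered values on their own reveal nothing about the gradient in this sketch; only someone who also holds the mask can recover it with `remove_mask`.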

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for determining a gradient of an objective function in a machine learning process to iteratively characterise a data set by a data model including model parameters, the dataset is distributed as multiple data subsets over multiple computer systems, the method comprising: determining by a first computer system a partial gradient of the objective function over a first data subset stored on the first computer system, the objective function characterises an error between the data model and the first data subset, the partial gradient points towards a reduction in the error between the data model and the first data subset; determining by the first computer system random data; determining by the first computer system an altered gradient by modifying the partial gradient based on the random data; encrypting by the first computer system the altered gradient to determine a first encrypted gradient such that one or more operations on the altered gradient can be performed based on the first encrypted gradient; determining by the first computer system an output gradient based on the first encrypted gradient; sending by the first computer system the output gradient to a receiving computer system.

2. The method of claim 1, further comprising: receiving by the first computer system a second encrypted gradient of the objective function over a second data subset, the second data subset being stored on one or more second computer systems different to the first computer system; wherein determining the output gradient comprises performing the one or more operations to combine the first encrypted gradient with the second encrypted gradient.

3. The method of claim 2, wherein performing the one or more operations to combine the first encrypted gradient with the second encrypted gradient comprises adding the first encrypted gradient to the second encrypted gradient.

4. The method of claim 1, wherein determining random data comprises determining a random number; and determining the altered gradient comprises multiplying the random number with the partial gradient or adding the random number to the partial gradient.

5. The method of claim 1, further comprising: encrypting the random data to determine first encrypted random data; determining output random data based on the first encrypted random data; and sending the output random data to the receiving computer system.

6. The method of claim 5, further comprising: receiving by the first computer system second encrypted random data; wherein determining the output random data comprises performing the one or more operations to combine the first encrypted random data with the second random data.

7. The method of claim 1, wherein determining the partial gradient is based on a regression model.

8. The method of claim 1, wherein the first data subset comprises training data for training a classifier and the training data comprises one or more samples and a label for each of the one or more samples.

9. The method of claim 8, wherein the one or more samples comprise DNA related data.

10. The method of claim 1, wherein determining the partial gradient comprises determining the partial gradient to extract principle components of the data set.

11. The method of claim 10, wherein the first data subset comprises multiple images.

12. The method of claim 1, wherein the first data subset comprises training data of a recommender system and determining the partial gradient comprises determining the partial gradient of the recommender system.

13. The method of claim 1, wherein the data set comprises data from which an anomaly or outlier is to be detected and determining the partial gradient comprises determining the partial gradient of an anomaly or outlier detection system.

14. The method of claim 1, wherein the first data subset or the second data subset or both consist of a single data record.

15. The method of claim 1, wherein encrypting the altered gradient comprises using Paillier encryption.

16. A non-transitory computer readable medium comprising computer-executable instructions stored thereon, that when executed by a processor, causes the processor to perform a method of determining a gradient of an objective function in a machine learning process to iteratively characterise a data set by a data model including model parameters, the dataset is distributed as multiple data subsets over multiple computer systems, the method comprising: determining by a first computer system a partial gradient of the objective function over a first data subset stored on the first computer system, the objective function characterises an error between the data model and the first data subset, the partial gradient points towards a reduction in the error between the data model and the first data subset; determining by the first computer system random data; determining by the first computer system an altered gradient by modifying the partial gradient based on the random data; encrypting by the first computer system the altered gradient to determine a first encrypted gradient such that one or more operations on the altered gradient can be performed based on the first encrypted gradient; determining by the first computer system an output gradient based on the first encrypted gradient; sending by the first computer system the output gradient to a receiving computer system.

17. A computer system for determining a gradient of an objective function in a machine learning process to iteratively characterise a data set by a data model including model parameters, the dataset is distributed as multiple data subsets over multiple computer systems, the computer system comprising: a datastore to store a first data subset; a processor to determine a partial gradient of the objective function over the first data subset, the objective function characterises an error between the data model and the first data subset, the partial gradient points towards a reduction in the error between the data model and the first data subset, determine random data, determine an altered gradient by modifying the partial gradient based on the random data, encrypt the altered gradient to determine a first encrypted gradient such that one or more operations on the altered gradient can be performed based on the first encrypted gradient, and determine an output gradient based on the first encrypted gradient; and an output port to send the output gradient to a receiving computer system.
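Claims 2 to 6 and 15 describe combining encrypted gradients and encrypted random data by addition, with Paillier encryption as one option. The sketch below shows one plausible reading of that flow, not the patent's own implementation. The toy key size, the additive masking, the assumption that only the receiving computer system holds the private key, and every function name are assumptions made for illustration. Gradients are single made-up fixed-point integers whose magnitude must stay well below `n / 2`.

```python
import math
import secrets

# Toy Paillier with generator n + 1. The primes are two Mersenne primes picked
# only so the demo runs quickly; a key this small is NOT secure.
P, Q = 2**31 - 1, 2**61 - 1

def keygen(p=P, q=Q):
    n = p * q
    phi = (p - 1) * (q - 1)
    return n, (phi, pow(phi, -1, n))   # public n, private (phi, phi^-1 mod n)

def encrypt(n, m):
    while True:                        # blinding factor must be coprime to n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, private_key, c):
    phi, phi_inv = private_key
    u = pow(c, phi, n * n)             # equals 1 + (m * phi mod n) * n
    return (u - 1) // n * phi_inv % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return c1 * c2 % (n * n)

def party_output(n, gradient, incoming=None):
    """One party: mask, encrypt, then fold in the ciphertexts it received."""
    mask = secrets.randbelow(n)
    enc_altered = encrypt(n, gradient + mask)
    enc_mask = encrypt(n, mask)
    if incoming is not None:
        enc_altered = add_encrypted(n, incoming[0], enc_altered)
        enc_mask = add_encrypted(n, incoming[1], enc_mask)
    return enc_altered, enc_mask

def receive(n, private_key, output):
    """Receiver: decrypt both sums and subtract the masks."""
    total_altered = decrypt(n, private_key, output[0])
    total_mask = decrypt(n, private_key, output[1])
    total = (total_altered - total_mask) % n
    return total - n if total > n // 2 else total   # back to a signed value

n, private_key = keygen()
local_gradients = [-7_000_000, 1_250_000, 3_000_000]  # made-up values
message = None
for g in local_gradients:              # each party passes its output onward
    message = party_output(n, g, message)
total = receive(n, private_key, message)
```

In this sketch no party ever sees another party's plaintext gradient: intermediate parties handle only ciphertexts, and the receiver learns only the sum.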

Assignees

Nat Ict Australia Ltd

Inventors

Classifications

  • ICT programming tools or database systems specially adapted for bioinformatics

  • by anonymising data, e.g. decorrelating personal data from the owner's identification

  • the encryption apparatus using shift registers or memories for block-wise {or stream} coding, e.g. DES systems {or RC4; Hash functions; Pseudorandom sequence generators}

  • during transmission, i.e. party's identity is protected against eavesdropping, e.g. by using temporary identifiers, but is known to the other party or parties involved in the communication

  • involving homomorphic encryption

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10320752B2 cover?
It covers characterising a data set that is distributed as multiple data subsets over multiple computers, such as by determining a gradient of an objective function. Each computer determines a partial gradient over its own data subset, alters it based on random data, encrypts the altered gradient so that operations can still be performed on it, and sends the encrypted gradient. This makes it difficult for another computer to calculate the locally stored data.
Who is the assignee on this patent?
Nat Ict Australia Ltd
What technology area does this patent fall under?
Primary CPC classification H04L63/0414. Mapped technology areas include Electricity.
When was this patent published?
Publication date Jun 11, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).