What technology area does this patent fall under?

Primary CPC classification G06F21/6245. Mapped technology areas include Physics.

When was this patent published?

Publication date Sep 29, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Privacy and modeling preserved data sharing

Patent metadata
Field	Value
Publication number	US-2016283735-A1
Application number	US-201514667163-A
Country	US
Kind code	A1
Filing date	Mar 24, 2015
Priority date	Mar 24, 2015
Publication date	Sep 29, 2016
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system, method and computer program product for generating a classification model using original data that is sensitive or private to a data owner. The method includes: receiving, from one or more entities, a masked data set having masked data corresponding to the original sensitive data, and further including a masked feature label set for use in classifying the masked data contents; forming a shared data collection of the masked data and the masked feature label sets received; and training, by a second entity, a classification model from the shared masked data and feature label sets, wherein the classification model learned from the shared masked data and feature label sets is the same as a classification model learned from the original sensitive data. The sensitive features and labels cannot be reliably recovered even when both the masked data and the learning algorithm are known.

First claim

Opening claim text (preview).

1-7. (canceled)

8. A system for generating a classification model using original data that is sensitive or private to a data owner comprising: a memory storage device; a hardware processor in communication with said memory storage device, the hardware processor configured to perform a method to: receive, from one or more first entities, a masked data set, each data set from an entity having masked data corresponding to the original sensitive data, the masked data set further including a masked feature label set for use in classifying the masked data contents; form a shared data collection of the masked data and the masked feature label sets received from the first entities; and train, by a second entity, a classification model from the shared masked data and feature label sets, the model being a classification model configured to classify original sensitive data contained in masked data sets received from the entities, wherein the classification model learned from the shared masked data and feature label sets is the same as the model learned from the original sensitive data.

9. The system of claim 8, wherein said processor device is further configured to generate said masked data by: access, from a computing device associated with a first entity, one or more records having original data sensitive to a data owner; generate an original data matrix of original data content including sensitive features and a corresponding feature label set for use in classifying said feature data; generate a random feature matrix sharing the same subspace as said sensitive features of original data matrix; compute an intermediate data structure as a product of said original data feature set matrix and said generated random feature matrix; compute one or more further intermediate data structures; form a convex optimization problem having an objective function based on said intermediate data structure, said original data matrix of original data content, said corresponding feature label set, and said one or more further intermediate data structures; and solve said convex optimization problem, said solving generating said masked matrix data feature set and masked feature label set.

10. The system of claim 9, wherein to compute said one or more further intermediate data structures, said processor device is further configured to: compute a low-rank soft feature data matrix and a corresponding soft class labels vector, the formed low-rank soft feature data matrix having denoised features and class labels of the sensitive data, said low-rank soft feature data matrix having entries that include the original data matrix of original data content having an added noise component.

11. The system of claim 9, wherein to compute said one or more further intermediate data structures, said processor device is further configured to: compute a first loss function for a feature according to:

$$\mathcal{L}_A(A, \tilde{A}) = \sum_i \sum_j \left( A_{ij} - \tilde{A}_{ij} \right)^2$$

where $A_{ij}$ and $\tilde{A}_{ij}$ represent the feature matrix A and low-ranked feature matrix, respectively, and i and j are indices into the respective feature set matrices; and compute a second loss function according to:

$$\mathcal{L}_b(b, \tilde{b}) = \sum_i \sum_j \frac{1}{\gamma} \log \left\{ 1 + \exp \left[ -\gamma \left( b_{ij} \tilde{b}_{ij} \right) \right] \right\}$$

where $\tilde{b}$ and $\tilde{b}_{ij}$ represent a class and low-rank class label vector set, and $\gamma$ is a variable.

12. The system of claim 11, wherein said processor device is configured to form said convex optimization problem as an objective function according to: $\min_{C, d, \tilde{A}, \tilde{b}} \mu [$
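The two loss functions recited in claim 11 can be evaluated directly. The following is a minimal sketch only, not the patent's implementation: the function names `feature_loss`, `label_loss`, and `_softplus` are hypothetical, matrices are assumed to be plain nested lists of equal shape, and the default `gamma=1.0` is an assumption (the claim says only that γ is a variable; the sketch further assumes γ > 0).

```python
import math


def feature_loss(A, A_tilde):
    """Squared loss from claim 11: sum over i, j of (A_ij - A~_ij)^2."""
    return sum(
        (a - a_t) ** 2
        for row, row_t in zip(A, A_tilde)
        for a, a_t in zip(row, row_t)
    )


def _softplus(x):
    """Numerically stable log(1 + exp(x)); avoids overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def label_loss(b, b_tilde, gamma=1.0):
    """Logistic loss from claim 11:
    sum over i, j of (1/gamma) * log(1 + exp(-gamma * b_ij * b~_ij)).

    gamma is assumed positive; its default value here is illustrative.
    """
    total = sum(
        _softplus(-gamma * y * y_t)
        for row, row_t in zip(b, b_tilde)
        for y, y_t in zip(row, row_t)
    )
    return total / gamma


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    A_tilde = [[1.0, 2.5], [2.5, 4.0]]
    b = [[1.0], [-1.0]]
    b_tilde = [[0.8], [-0.6]]
    print(feature_loss(A, A_tilde))
    print(label_loss(b, b_tilde, gamma=2.0))
```

The squared loss penalizes entry-wise deviation between the original and low-rank feature matrices, while the logistic loss is small whenever each label and its soft counterpart agree in sign with a large product. The stable `_softplus` helper is a design choice of this sketch, not something the claim specifies.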

Assignees

IBM

Inventors

Classifications

G06F21/6254
by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title
G06N99/005
Physics · mapped topic
G06F21/6245 (Primary)
Protecting personal data, e.g. for financial or medical purposes · CPC title
G06N20/00 (Primary)
Machine learning · CPC title

Patent family

Related publications grouped by family.

View patent family 56975489

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016283735A1 cover?: A system, method and computer program product for generating a classification model using original data that is sensitive or private to a data owner. The method includes: receiving, from one or more entities, a masked data set having masked data corresponding to the original sensitive data, and further including a masked feature label set for use in classifying the masked data contents; forming…
Who is the assignee on this patent?: IBM