Flexible data security and machine learning system for merging third-party data

US11792167B2 · US · B2

Patent metadata
Publication number: US-11792167-B2
Application number: US-202117219482-A
Country: US
Kind code: B2
Filing date: Mar 31, 2021
Priority date: Mar 31, 2021
Publication date: Oct 17, 2023
Grant date: Oct 17, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for a flexible data security and machine learning system for merging third-party data are provided. In one technique, the system receives a data set from a third-party entity and receives selection data that indicates that the third-party entity selected a set of data security policies that includes an encryption option and a data mixing option from among multiple data mixing options. In response to receiving the selection data, the system stores data that associates the set of data security policies with the data set, encrypts the data set according to the encryption option, and persistently stores the encrypted data set. Later, the system decrypts the encrypted data set in volatile memory, generates, based on the data mixing option, training data based on the decrypted version of the data set, trains a machine-learned model based on the training data, and stores the machine-learned model in association with the data set.
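The flow in the abstract (associate the selected policies with the data set, encrypt and persist it, later decrypt it in memory, train a model, store the model with the data set and entity) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The option strings, the `SecurityPolicies` and `DataPlatform` names, and the in-memory dictionaries standing in for persistent storage and key management are all hypothetical. The HMAC-SHA256 counter-mode cipher is a toy stand-in with no integrity protection; a real system would use a vetted authenticated cipher such as AES-GCM. A least-squares line fit stands in for "a machine-learned model", and only the complete-separation mixing option is implemented.

```python
import hashlib
import hmac
import json
import os
from dataclasses import dataclass

# Hypothetical option identifiers; the patent names option categories, not these strings.
ENCRYPT_ON_STORAGE = "encrypt_on_storage"
COMPLETE_SEPARATION = "complete_separation"


@dataclass(frozen=True)
class SecurityPolicies:
    encryption: str   # an encryption / data accessibility option
    data_mixing: str  # one of several data mixing options


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher: HMAC-SHA256(key, nonce || counter) blocks. Illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def fit_line(rows):
    # Ordinary least squares for y = slope * x + intercept (stand-in learner).
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


class DataPlatform:
    def __init__(self):
        self.policies = {}    # data set id -> SecurityPolicies
        self.persistent = {}  # data set id -> (entity, stored bytes); stands in for disk
        self.models = {}      # (data set id, entity) -> trained model
        self._keys = {}       # data set id -> key; a real system would use a key service

    def ingest(self, entity, data_set_id, rows, policies):
        # Associate the selected policies with the data set, then encrypt and persist it.
        self.policies[data_set_id] = policies
        payload = json.dumps(rows).encode("utf-8")
        if policies.encryption == ENCRYPT_ON_STORAGE:
            key = os.urandom(32)
            self._keys[data_set_id] = key
            payload = encrypt(key, payload)
        self.persistent[data_set_id] = (entity, payload)

    def train(self, data_set_id):
        entity, blob = self.persistent[data_set_id]
        if data_set_id in self._keys:
            # Decrypted bytes exist only in process memory, never in self.persistent.
            blob = decrypt(self._keys[data_set_id], blob)
        rows = json.loads(blob)
        mixing = self.policies[data_set_id].data_mixing
        if mixing != COMPLETE_SEPARATION:
            raise NotImplementedError(f"mixing option {mixing!r} is not sketched here")
        model = fit_line(rows)  # training data comes from this data set only
        self.models[(data_set_id, entity)] = model
        return model


platform = DataPlatform()
platform.ingest("entity-a", "ds-1", [(1, 3), (2, 5), (3, 7)],
                SecurityPolicies(ENCRYPT_ON_STORAGE, COMPLETE_SEPARATION))
print(platform.train("ds-1"))  # {'slope': 2.0, 'intercept': 1.0}
```

Keeping the policy record separate from the stored bytes mirrors the abstract's ordering: the policies are stored first, and every later step (encryption at ingest, mixing at training time) consults them.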

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, from a third-party entity, a data set; receiving selection data that indicates that the third-party entity selected a set of data security policies that includes an encryption option and a data mixing option from among a plurality of data mixing options; in response to receiving the selection data: storing data that associates the set of data security policies with the data set; encrypting the data set according to the encryption option to generate an encrypted data set; storing the encrypted data set in persistent storage; after storing the encrypted data set: reading the encrypted data set from the persistent storage into volatile memory; based on the data mixing option that is associated with the data set, generating training data based on the encrypted data set and training a machine-learned model based on the training data; storing the machine-learned model in association with the data set and the third-party entity; wherein the method is performed by one or more computing devices.

2. The method of claim 1, wherein the encryption option is an encryption-on-storage option that is one of a plurality of data accessibility options that includes one or more of: end-to-end encryption or no encryption.

3. The method of claim 1, wherein the set of data security policies includes a data storage option from among a plurality of data storage options; the plurality of data storage options including at least one storage option selected from a group comprising: a no storage separation policy; a logically-separated storage policy; or a physically-separate storage policy; and combinations thereof.

4. The method of claim 1, wherein the plurality of data mixing options include two or more of: complete separation, sharing of coefficients, combinable with public data, mergeable with data sets from similar entities, sample-based mergeable, or mergeable-with-any-data set.

5. The method of claim 1, wherein: the data set is a first data set and the third-party entity is a first third-party entity; the data mixing option is a sharing of coefficients option; the machine-learned model is a third machine-learned model; the method further comprising: generating first training data based on the data set and no other data set; using one or more machine learning techniques to generate a first machine-learned model based on the first training data; generating second training data based on a second data set from a second third-party entity that is different than the first third-party entity; using the one or more machine learning techniques to generate a second machine-learned model based on the second training data; the first machine-learned model comprises a first set of coefficients; the second machine-learned model comprises a second set of coefficients that is different than the first set of coefficients; generating the third machine-learned model comprises aggregating the first set of coefficients and the second set of coefficients to generate a third set of coefficients; the third machine-learned model comprises the third set of coefficients.

6. The method of claim 1, wherein: the data mixing option is a combinable-with-public data option; the machine-learned model is a third machine-learned model; the method further comprising: generating first training data based on profile data that was uploaded to a content sharing platform by a plurality of users of the content sharing platform; generating a first machine-learned model based on the first training data; generating second training data based on the data set; generating a second machine-learned model based on the second training data; the third machine-learned model is based on the first machine-learned model and the second machine-learned model.

7. The method of claim 1, wherein: the data mixing option is a mergeable-with-data sets-from-similar-entities option; the method further comprising: identifying a plurality of data sets that includes the data set and that are similar in size with each other; generating training data based on the plurality of data sets; generating the machine-learned model is based on the training data using one or more machine learning techniques; storing the machine-learned model in association with the data set and the third-party entity comprises storing the machine-learned model in associated with each data set in the plurality of data sets and with each third-party entity that provided a data set in the plurality of data sets.

8. The method of claim 1, wherein: the data mixing option is a sample-based mergeable option; the method further comprising: identifying a plurality of data sets that includes the data set and one or more other data sets; for each data set in the plurality of data sets: retrieving a sample from the data set; adding the sample to a sample set, wherein a size of each sample in the sample set is approximately the same; generating training data based on the sample set; generating the machine-learned model is based on the training data using one or more machine learning techniques; storing the machine-learned model in association with the data set and the third-party entity comprises storing the machine-learned model in associated with each data set in the plurality of data sets and with each third-party entity that provided a data set in the plurality of data sets.

9. The method of claim 1, wherein: the data mixing option is a mergeable-with-any-data set option; the method further comprising: identifying a plurality of data sets that includes the data set and one or more other data sets that are also associated with the mergeable-with-any-data set option; generating training data based on the plurality of data sets; generating the machine-learned model is based on the training data using one or more machine learning techniques; storing the machine-learned model in association with the data set and the third-party entity comprises storing the machine-learned model in associated with each data set in the plurality of data sets and with each third-party entity that provided a data set in the plurality of data sets.

10. The method of claim 1, further comprising: causing a user interface to be presented on a screen of a computing device; wherein the user interface indicates (1) a plurality of data accessibility options that includes the encryption option and (2) the plurality of data mixing options; wherein a user of the computing device selects, through the user interface, the encryption option and the data mixing option.

11. The method of claim 1, further comprising: receiving second selection data that indicates that a second third-party entity selected a second set of data security policies that includes a data accessibility option from among a plurality of data accessibility options and a second data mixing option from among the plurality of data mixing options; determining whether the data accessibility option conflicts with the second data mixing option; in response to determining that the data accessibility option conflicts with the second data mixing option, generating a notification that indicates that a conflict exists and causing the notification to be presented on a computing device.

12. The method of claim 1, wherein the data mixing option is a first data mixing option, the method further comprising: receiving input that indicates the third-party entity selects a second data mixing option that is different than the first data mixing option; in response to receiving the input: updating the set of data security policies to indicate th…
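The sharing-of-coefficients option in claim 5 trains one model per entity's data set and then aggregates the two coefficient sets into a third. The claim says only "aggregating", so the (optionally weighted) mean used below is an assumption; weighting by each data set's row count would resemble federated averaging. The function names and the least-squares line fit standing in for "one or more machine learning techniques" are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Model = Dict[str, float]


def fit_line(rows: Sequence[Tuple[float, float]]) -> Model:
    # Ordinary least squares for y = slope * x + intercept (stand-in learner).
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def aggregate_coefficients(models: List[Model],
                           weights: Optional[List[float]] = None) -> Model:
    # Combine per-entity coefficient sets into a new set by (weighted) averaging.
    # Assumption: the claim does not specify the aggregation function.
    if weights is None:
        weights = [1.0] * len(models)
    total = sum(weights)
    return {
        name: sum(w * m[name] for w, m in zip(weights, models)) / total
        for name in models[0]
    }


# Each entity's rows train only that entity's model; only coefficients are combined.
first = fit_line([(1, 3), (2, 5), (3, 7)])    # entity A: slope 2, intercept 1
second = fit_line([(1, 4), (2, 8), (3, 12)])  # entity B: slope 4, intercept 0
third = aggregate_coefficients([first, second])
print(third)  # {'slope': 3.0, 'intercept': 0.5}
```

Because only coefficients cross entity boundaries, neither entity's raw rows are pooled with the other's, which is what distinguishes this option from the merge-based options in claims 7 to 9.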
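Claim 8's sample-based mergeable option retrieves a sample from each participating data set so that every sample in the merged set is approximately the same size, which keeps a large contributor from dominating the training data. In this sketch the sample size equals the size of the smallest data set; that choice, the `sample_based_merge` name, and the provenance tags are assumptions for illustration. The provenance tags make it possible to store the resulting model in association with every contributing data set, as the claim requires.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Tuple[float, float]


def sample_based_merge(data_sets: Dict[str, Sequence[Row]],
                       seed: int = 0) -> List[Tuple[str, Row]]:
    # Draw one equal-size random sample (without replacement) from each data set.
    # Assumption: sample size = size of the smallest data set.
    rng = random.Random(seed)
    size = min(len(rows) for rows in data_sets.values())
    merged = []
    for data_set_id, rows in data_sets.items():
        for row in rng.sample(list(rows), size):
            # Keep provenance so the trained model can be linked to every contributor.
            merged.append((data_set_id, row))
    return merged


data_sets = {
    "ds-small": [(1, 1), (2, 2)],
    "ds-large": [(x, 2 * x) for x in range(100)],
}
merged = sample_based_merge(data_sets, seed=42)
print(len(merged))  # 4
```

The merged rows (with their tags stripped) would then serve as training data, and the set of distinct tags lists the data sets and entities the model should be stored against.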
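Claim 11 requires checking whether a selected data accessibility option conflicts with a selected data mixing option and, if so, generating a notification. The claim does not say which pairs conflict, so the rule table below is purely hypothetical: it treats end-to-end encryption as incompatible with options that pool an entity's rows with other data. The option strings echo the option names in claims 2 and 4, but the identifiers and the `check_policy_conflict` function are illustrative assumptions.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical conflict rules: accessibility option -> mixing options it rules out.
CONFLICTS: Dict[str, FrozenSet[str]] = {
    "end_to_end_encryption": frozenset({
        "combinable_with_public_data",
        "mergeable_with_similar_entities",
        "sample_based_mergeable",
        "mergeable_with_any_data_set",
    }),
}


def check_policy_conflict(accessibility: str, mixing: str) -> Optional[str]:
    # Return notification text when the pair conflicts, otherwise None.
    # Presenting the notification on a device is left to the caller's UI layer.
    if mixing in CONFLICTS.get(accessibility, frozenset()):
        return (f"Conflict: data accessibility option {accessibility!r} "
                f"cannot be combined with data mixing option {mixing!r}.")
    return None


print(check_policy_conflict("end_to_end_encryption", "mergeable_with_any_data_set"))
print(check_policy_conflict("encryption_on_storage", "mergeable_with_any_data_set"))  # None
```

Keeping the rules in a table rather than in branching code makes it straightforward to add storage options (claim 3) or new mixing options without changing the check itself.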

Assignees

Microsoft Technology Licensing, LLC

Classifications

  • H04L63/0428 (primary): wherein the data content is protected, e.g. by encrypting or encapsulating the payload · CPC title

  • Machine learning · CPC title

  • for managing network security; network security policies in general (filtering policies H04L63/0227) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11792167B2 cover?
It covers a system in which a third-party entity uploads a data set and selects data security policies, including an encryption option and one of several data mixing options. The system stores the policy selection with the data set, encrypts the data set and stores it persistently, later decrypts it in volatile memory, generates training data according to the selected mixing option, trains a machine-learned model, and stores that model in association with the data set.
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
The primary CPC classification is H04L63/0428, which falls in CPC section H (Electricity).
When was this patent published?
It was published on Oct 17, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).