Systems, methods, and computer-readable media for secure and private data valuation and transfer

US12554871B2 · US · B2

Patent metadata
Publication number: US-12554871-B2
Application number: US-202217712952-A
Country: US
Kind code: B2
Filing date: Apr 4, 2022
Priority date: Apr 4, 2022
Publication date: Feb 17, 2026
Grant date: Feb 17, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods and computer-readable media for generating, by a first entity, a set of synthetic data samples that represent a corresponding set of original data samples; sending, by the first entity, the set of synthetic data samples for use by a second entity to generate a set of second entity predictions for the set of synthetic data samples using a machine learning (ML) model that has been trained using a second entity dataset; sending, by the first entity, for a third entity, a set of trusted labels corresponding to the set of original data samples; and receiving, by the first entity, from the third entity, valuation information for the second entity dataset that is based on a comparison by the third entity of the set of trusted labels and the set of second entity predictions.
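The abstract does not say how the third entity compares the trusted labels with the second entity predictions. The following is a minimal sketch under stated assumptions, not the patented computation. It assumes classification labels and uses accuracy as the utility measure. Claim 9 also mentions a "marginal utility value", which is modeled here, as a further assumption, as the leave-one-out drop in accuracy of a majority vote over all second entities' predictions. The names `accuracy`, `majority_vote`, and `value_datasets` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical third-entity valuation (assumption): utility = accuracy of
# predictions made on synthetic samples, measured against trusted labels.


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of samples whose prediction matches the trusted label."""
    if len(labels) != len(preds) or not labels:
        raise ValueError("need equal-length, non-empty label and prediction lists")
    return sum(t == p for t, p in zip(labels, preds)) / len(labels)


def majority_vote(pred_sets: List[Sequence[int]]) -> List[int]:
    """Per-sample majority vote; ties go to the smallest label (arbitrary choice)."""
    voted = []
    for column in zip(*pred_sets):
        counts = Counter(column)
        top = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


def value_datasets(
    labels: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> Dict[str, Dict[str, float]]:
    """Individual utility and leave-one-out marginal utility per second entity."""
    names = list(predictions)
    full = accuracy(labels, majority_vote([predictions[n] for n in names]))
    report = {}
    for name in names:
        others = [predictions[n] for n in names if n != name]
        without = accuracy(labels, majority_vote(others)) if others else 0.0
        report[name] = {
            "individual": accuracy(labels, predictions[name]),
            "marginal": full - without,
        }
    return report


if __name__ == "__main__":
    trusted = [0, 1, 1, 0, 1]  # held by the first entity, sent only to the third
    preds = {  # one prediction list per second entity, made on synthetic samples
        "A": [0, 1, 1, 0, 1],
        "B": [0, 1, 0, 0, 0],
        "C": [1, 0, 1, 1, 1],
    }
    for name, vals in value_datasets(trusted, preds).items():
        print(name, vals)
```

On these toy inputs, C has the lowest individual accuracy (0.4) yet a larger marginal contribution than B (about 0.4 versus 0.2), because C's correct answers fall on samples where A and B disagree. This is one reason to report both values: they can rank datasets differently.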

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for preserving data security and privacy during data valuation, the method being performed by a first entity in communication with a second entity and a third entity, the method comprising: generating, by the first entity, a set of synthetic data samples that represent a corresponding set of original data samples, wherein the set of synthetic data samples is generated based on a utility objective and a data security objective by: using multiple training machine learning (ML) models to generate a first set of outputs from the set of original data samples, wherein the multiple training ML models are obtained by training each of the multiple training ML models using a respective randomized version of the set of original data samples, and each of the multiple training ML models is trained based on a common model architecture and training algorithm as used to train a ML model at the second entity; using the multiple training ML models to generate a second set of outputs from the set of synthetic data samples; updating the set of synthetic data samples based on minimizing a normalized sum of prediction differences representing a difference between the first set of outputs and the second set of outputs to satisfy the utility objective and also based on maximizing a sample distance representing a distance between the set of synthetic data samples and the set of original data samples in a sample space to satisfy the data security objective; sending, by the first entity, the set of synthetic data samples for use by the second entity to generate a set of second entity predictions for the set of synthetic data samples using the ML model at the second entity that has been trained using a second entity dataset; sending, by the first entity, for the third entity, a set of trusted labels corresponding to the set of original data samples; and receiving, by the first entity, from the third entity, valuation information for the second entity dataset that is based on a comparison by the third entity of the set of trusted labels and the set of second entity predictions.

2. The method of claim 1 further comprising: receiving, by the first entity, the second entity dataset from the second entity upon completion by the first entity of a predetermined transfer requirement.

3. The method of claim 1 wherein the first entity, second entity, and third entity each comprise a respective controlled access computer system and (i) neither the second entity or the third entity have access to the set of original data samples, (ii) the second entity does not have access to the set of trusted labels, and (iii) the first entity does not have access to the second entity dataset prior to the completion by the first entity of the predetermined transfer requirement.

4. The method of claim 1, wherein the second entity is one of a plurality of second entities, and sending, by the first entity, the set of synthetic data samples comprises sending the set of synthetic data samples for use by each of the plurality of second entities to generate a respective set of second entity predictions for the set of synthetic data samples using a respective trained ML model that has been trained using a respective second entity dataset that is unique to the second entity, and receiving, by the first entity, from the third entity, valuation information comprises receiving, by the first entity, valuation information from the third entity for each of the respective second entity datasets.

5. The method of claim 4 wherein generating, by the first entity, the set of synthetic data samples comprises synthesizing a respective data sample for each original data sample based on both the utility objective that enables consistent valuation information to be generated by the third entity for each of the respective second entity datasets and the security objective that differentiates the synthetic data sample from the original data sample.

6. The method of claim 1 wherein generating, by the first entity, the set of synthetic data samples further comprises: synthesizing, for each of the original data samples, the respective synthetic data sample by: randomly initializing the synthetic data sample; (a) using a plurality of the multiple training ML models to generate respective model outputs for both the synthetic data sample and the original data sample; (b) updating the synthetic data sample based on: (i) a first gradient computed by the first entity based on the normalized sum of prediction differences between the respective model outputs for the synthetic data sample and the respective model outputs for the original data sample across the multiple training ML models, and (ii) a second gradient computed by the first entity based on the sample distance between the synthetic data sample and the original data sample; and (c) repeating (a) and (b) with an objective of minimizing the prediction difference and maximizing the sample distance, until a defined completion criteria is achieved.

7. The method of claim 6 wherein the original data samples are image samples, the sample space is a pixel space, and the respective model outputs are final layer activations.

8. The method of claim 1, wherein the second entity is one of a plurality of second entities that each generate a respective set of second entity predictions for the set of synthetic data samples using a respective trained ML model that has been trained using a respective second entity dataset that is unique to the second entity, the method comprising: receiving, by the third entity, the set of trusted labels from the first entity; receiving, by the third entity, the respective sets of second entity predictions generated by each of the plurality of second entities; computing, by the third entity, the valuation information for each of the respective second entity datasets, sending, by the third entity, for the first entity, the valuation information for each of the respective second entity datasets, and sending by the third entity, for each second entity in the plurality of second entities, the valuation information for the respective second entity dataset of the second entity.

9. The method of claim 8 wherein the valuation information for each respective second entity dataset comprises: an individual utility value that is based on an individual comparison of the set of trusted labels and the set of second entity predictions generated for the second entity, and a marginal utility value that is based on a marginal increase in utility of the respective second entity predictions compared to predictions that includes a plurality of the second entity predictions.

10. The method of claim 8 comprising sending to each second entity in the plurality of second entities an indication of the common model architecture and training algorithm for application by the second entity for training its respective ML model.

11. The method of claim 1, comprising, receiving by the second entity, an indication of the common model architecture and training algorithm for application by the second entity for training the respective ML model.

12. A first entity in communication with a second entity and a third entity, the first entity comprising one or more processors and a memory storing executable instructions that, when executed by the one or more processors configure the first entity to: generate a set of synthetic data samples that represent a corresponding set of original data samples, wherein the set of synthetic data samples is generated based on a utility objective and a data security objective by: using multiple training …
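Claims 1 and 6 describe sample synthesis as an iterative optimization: start from a random sample, then follow one gradient that keeps the training models' outputs close to those on the original sample and another that pushes the sample away from the original. The NumPy sketch below illustrates that loop under several assumptions, none confirmed by the claims. Least-squares linear models stand in for the "common model architecture and training algorithm", with their outputs playing the role of the final-layer activations of claim 7. The "randomized versions" of the original data are bootstrap resamples, which the bagging/bootstrap CPC tag suggests but does not confirm. The "normalized sum" is taken as a mean over models of squared output differences. The sample distance is squared Euclidean distance inside a [0, 1] "pixel" box. The completion criterion is a fixed step count. `train_linear_models`, `synthesize`, `lam`, `lr`, and `steps` are hypothetical names.

```python
import numpy as np


def train_linear_models(X, Y, n_models, rng):
    """Fit one least-squares linear model per bootstrap resample of (X, Y).

    Assumption: a linear map stands in for the shared architecture; its
    outputs play the role of final-layer activations. Returns (W, b) pairs
    with output W @ x + b.
    """
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])  # append a bias column
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap: resample rows with replacement
        coef, *_ = np.linalg.lstsq(Xb[idx], Y[idx], rcond=None)
        models.append((coef[:-1].T, coef[-1]))  # W: (outputs, features), b: (outputs,)
    return models


def synthesize(x, models, lam=0.3, lr=0.05, steps=500, rng=None):
    """Gradient-based synthesis of one sample s for the original sample x.

    Minimizes (1/K) * sum_k ||f_k(s) - f_k(x)||^2 - lam * ||s - x||^2, keeping
    model outputs close (utility) while moving s away from x (security).
    Values are clipped to [0, 1] as a stand-in pixel range (assumption).
    """
    if rng is None:
        rng = np.random.default_rng()
    s = rng.uniform(0.0, 1.0, size=x.shape)  # random initialization
    k = len(models)
    for _ in range(steps):  # completion criterion: fixed step budget (assumption)
        grad = np.zeros_like(s)
        for W, b in models:
            diff = (W @ s + b) - (W @ x + b)  # output difference for this model
            grad += 2.0 * W.T @ diff / k  # gradient of the normalized difference term
        grad -= 2.0 * lam * (s - x)  # gradient of the -lam * squared-distance term
        s = np.clip(s - lr * grad, 0.0, 1.0)  # projected gradient step
    return s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 6
    A = np.zeros((d, 2))
    A[0, 0] = A[1, 1] = 1.0  # toy task: outputs depend only on features 0 and 1
    X = rng.uniform(0.0, 1.0, size=(200, d))
    Y = X @ A + 0.01 * rng.normal(size=(200, 2))
    models = train_linear_models(X, Y, n_models=5, rng=rng)
    x = np.full(d, 0.5)
    s = synthesize(x, models, rng=rng)
    print("synthetic:", np.round(s, 3))
    print("distance to original:", round(float(np.linalg.norm(s - x)), 3))
```

Because the toy task ignores features 2 to 5, the loop drives those coordinates to the edges of the box, far from the original, while features 0 and 1 stay near their original values, so the models' outputs barely change. This is the utility/security trade-off the claims describe. In this sketch, `lam` must stay below the output sensitivity of the features that matter (about 1 here); otherwise the distance term would dominate in those directions too.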


Classifications

  • Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • using classification, e.g. of video objects · CPC title

  • structured as a network, e.g. client-server architectures · CPC title

  • to a system of files or objects, e.g. local or distributed file system or database · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12554871B2 cover?
Systems, methods and computer-readable media for generating, by a first entity, a set of synthetic data samples that represent a corresponding set of original data samples; sending, by the first entity, the set of synthetic data samples for use by a second entity to generate a set of second entity predictions for the set of synthetic data samples using a machine learning (ML) model that has bee…
Who is the assignee on this patent?
Singh Gursimran, Ayub Ahnaf Tazwar, Wang Chendi, and 3 more
What technology area does this patent fall under?
Primary CPC classification G06F21/6218. Mapped technology areas include Physics.
When was this patent published?
Published on Feb 17, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or other publications sharing the same primary CPC).