Methods and systems for federated learning utilizing customer synthetic data models

US12164677B2 · US · B2

Patent metadata
Publication number: US-12164677-B2
Application number: US-202218063394-A
Country: US
Kind code: B2
Filing date: Dec 8, 2022
Priority date: Dec 8, 2022
Publication date: Dec 10, 2024
Grant date: Dec 10, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems are described for novel uses and/or improvements to federated learning. As one example, methods and systems are described for improving the applicability of federated learning across various applications and increasing the efficiency of training a global model through federated learning. As another example, methods and systems are described for ensuring comprehensive training data is available to models assigned by the federated learning server. Additionally, methods and systems are described for improving the rate of training a global model through federated learning.

First claim

Opening claim text (preview).

What is claimed is:

1. A system for preserving user privacy while generating high-quality training data for federated learning models, the system comprising: one or more processors; and a non-transitory computer readable medium having instructions recorded thereon that when executed by the one or more processors cause operations comprising: retrieving a dataset from a user profile, wherein the user profile is stored locally on a user device; processing the dataset, by removing anomalies, incomplete data, or outliers, to generate a feature set locally on the user device; selecting a first feature from the feature set, wherein the first feature is real user data on the local user device to be synthetically generated; inputting the first feature into a synthetic data generation model, wherein the synthetic data generation model generates a first synthetic output based on real user data on the local user device; obfuscating the first synthetic output to generate a first synthetic feature, wherein obfuscation techniques involve replacing personally identifiable information with synthetic data; generating a first synthetic dataset based on the first synthetic feature; directing a user device to train a machine learning model using the first synthetic dataset; and transmitting the machine learning model to a centralized remote server for training a federated learning model.

2. A method for preserving user privacy while generating high-quality training data for federated learning models, the method comprising: retrieving a dataset from a user profile, wherein the user profile is stored locally on a user device; processing the dataset, by removing anomalies, incomplete data, or outliers, to generate a feature set; selecting a first feature from the feature set; inputting the first feature into a synthetic data generation model, wherein the synthetic data generation model generates a first synthetic output; obfuscating the first synthetic output to generate a first synthetic feature; generating a first synthetic dataset based on the first synthetic feature; directing a user device to train a machine learning model using the first synthetic dataset; and transmitting the machine learning model to a centralized remote server from the user device for training a federated learning model.

3. The method of claim 2, further comprising: generating a second synthetic output based on the first feature; and obfuscating the second synthetic output to generate a second synthetic feature, wherein the first synthetic dataset is further based on the second synthetic feature.

4. The method of claim 2, further comprising: inputting a second feature into the synthetic data generation model, wherein the synthetic data generation model generates a second synthetic output; and obfuscating the second synthetic output to generate a third synthetic feature, wherein the first synthetic dataset is further based on the third synthetic feature.

5. The method of claim 2, wherein generating the first synthetic output comprises: determining a plurality of agent outputs based on the first feature; and aggregating the plurality of agent outputs into the first synthetic output.

6. The method of claim 2, wherein generating the first synthetic output comprises: determining a distribution of data based on the first feature; and determining the first synthetic output based on a likelihood that the distribution of data corresponds to the first synthetic output.

7. The method of claim 4, wherein generating the first synthetic output comprises: determining, using a first generative model, a first distribution of data based on the first feature; determining, using a second generative model, a second distribution of data based on the first feature; comparing the first distribution to the second distribution; selecting the first distribution based on comparing the first distribution to the second distribution; and determining the first synthetic output based on a likelihood that the first distribution of data corresponds to the first synthetic output.

8. The method of claim 4, wherein generating the first synthetic output comprises: determining a similar feature based on a manipulation of human language in the first feature; and determining the first synthetic output based on the similar feature.

9. The method of claim 2, wherein obfuscating the first synthetic output to generate a first synthetic feature comprises: determining a secret key based on a random string of bits; determining an encryption algorithm; and encrypting the first synthetic output by using the secret key and the encryption algorithm to obfuscate the first synthetic output.

10. The method of claim 2, wherein obfuscating the first synthetic output to generate a first synthetic feature comprises: detecting a first text string in the first synthetic output; determining that the first text string comprises personally identifiable information (PII); and in response to determining that the first text string comprises PII, replacing the first text string with a second text string.

11. The method of claim 10, wherein determining that the first text string comprises PII comprises: comparing the first text string to a list of known instances of PII corresponding to the user device; and based on comparing the first text string to the list of known instances of PII corresponding to the user device, determining that the first text string corresponds to a first known instance of PII in the list of known instances of PII.

12. The method of claim 10, wherein determining that the first text string comprises PII comprises: retrieving a PII text string corresponding to the user device; comparing the first text string to the PII text string; and determining that the first text string corresponds to the PII text string.

13. The method of claim 10, wherein determining that the first text string comprises PII comprises: determining a first characteristic in the first text string; and determining a probability that the first text string corresponds to PII based on the first characteristic.

14. The method of claim 10, wherein replacing the first text string with the second text string further comprising: determining a first characteristic in the first text string; determining a data format of the first characteristic; selecting the second text string based on the data format; and encrypting the first synthetic output by using a secret key and an encryption algorithm to obfuscate the first synthetic output.

15. The method of claim 2, wherein obfuscating the first synthetic output to generate a first synthetic feature comprises: detecting a first text string in the first synthetic output; detecting a first character in the first text string; and generating a substitute text string by removing the first character from the first text string.

16. A non-transitory, computer-readable medium comprising instructions recorded thereon that when executed by one or more processors cause operations comprising: retrieving a dataset from a user profile, wherein the user profile is stored locally on a user device; processing the dataset, by removing anomalies, incomplete data, or outliers, to generate a feature set; selecting a first feature from the feature set; inputting the first feature into a synthetic data generation model, wherein the synthetic data generation model generates a first synthetic output; obfuscating the first synthetic output to generate a first synthetic feature; generating a first synthet
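
Illustrative sketch (not part of the patent text). The on-device steps recited in claim 2 can be pictured roughly as follows. Everything here is an assumption made for illustration: the function names, the profile layout, the z-score outlier rule, the Gaussian sampler standing in for the "synthetic data generation model", the `[REDACTED]` placeholder, and the one-parameter model are all hypothetical, and the claims do not prescribe any of them. Transmission to the server is left out; the function simply returns the model parameters that would be sent.

```python
import random
import statistics


def clean(values, z_max=2.0):
    """Drop incomplete entries (None) and values more than z_max std devs from the mean."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    stdev = statistics.pstdev(present)
    if stdev == 0:
        return present
    return [v for v in present if abs(v - mean) / stdev <= z_max]


def synthesize(feature, n, rng):
    """Assumed generator: sample n values from a Gaussian fitted to the real feature."""
    mu = statistics.mean(feature)
    sigma = statistics.pstdev(feature)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def obfuscate(text, known_pii, placeholder="[REDACTED]"):
    """Replace every known PII string found in text with a placeholder string."""
    for item in known_pii:
        text = text.replace(item, placeholder)
    return text


def train_local_model(samples, epochs=200, lr=0.1):
    """Fit a constant predictor w by gradient descent on mean squared error."""
    target = statistics.mean(samples)
    w = 0.0
    for _ in range(epochs):
        w -= lr * 2 * (w - target)  # gradient of mean((w - x)^2) with respect to w
    return {"w": w}


def run_on_device(profile, known_pii, n_synthetic=100, seed=0):
    """Run the sketch end to end; only the returned model would leave the device."""
    rng = random.Random(seed)
    feature = clean(profile["amounts"])                # process the local dataset
    outputs = synthesize(feature, n_synthetic, rng)    # synthetic outputs
    dataset = [
        {"owner": obfuscate(profile["owner"], known_pii), "amount": amount}
        for amount in outputs                          # obfuscated synthetic records
    ]
    model = train_local_model([r["amount"] for r in dataset])
    return dataset, model


if __name__ == "__main__":
    profile = {"owner": "Jane Doe", "amounts": [10.0, 12.0, None, 11.0, 9.0, 500.0, 10.5]}
    dataset, model = run_on_device(profile, known_pii=["Jane Doe"], seed=1)
    print(dataset[0]["owner"], model)
```

In this sketch the real amounts and the owner's name never appear in the training records: the model is fitted only to sampled values, and the one text field is passed through the PII replacement before it is stored.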

Assignees

Inventors

Classifications

  • Distributed learning, e.g. federated learning · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Probabilistic or stochastic networks · CPC title

  • by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title

  • Protecting personal data, e.g. for financial or medical purposes · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12164677B2 cover?
Methods and systems are described for novel uses and/or improvements to federated learning. As one example, methods and systems are described for improving the applicability of federated learning across various applications and increasing the efficiency of training a global model through federated learning. As another example, methods and systems are described for ensuring comprehensive trainin…
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G06F21/64. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 10, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).