Method and apparatus for generating synthetic data
US-2022116199-A1 · Apr 14, 2022 · US
US12353580B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12353580-B2 |
| Application number | US-202217972548-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 24, 2022 |
| Priority date | Oct 24, 2022 |
| Publication date | Jul 8, 2025 |
| Grant date | Jul 8, 2025 |
Official abstract text for this publication.
Systems and methods are directed to building annotated models based on eyes-off data. Specifically, a synthetic data generation model is trained and used to further train a target model. The synthetic data generation model is trained within an eyes-off environment using an anonymity technique on confidential data. The synthetic data generation model is then used to create synthetic data that closely represents the confidential data but without any specific details that can be linked back to the confidential data. The synthetic data is then annotated and used to train the target model within an eyes-on environment. Subsequently, the target model is deployed back within the eyes-off environment to classify the confidential data.
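The flow in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the patent's implementation: the sample records, the function names, the word-frequency "generator", the rule that keeps only words used by at least `k` distinct users (a toy stand-in for an anonymity technique), the rule-based `annotate` function standing in for a human annotator, and the word-vote classifier standing in for the target model.

```python
import random
from collections import Counter, defaultdict

# Hypothetical confidential records (user_id, message). In the abstract's terms
# these exist only in the eyes-off environment.
CONFIDENTIAL = [
    ("u1", "please schedule meeting tomorrow"),
    ("u2", "schedule meeting with alice"),
    ("u3", "please cancel meeting tomorrow"),
    ("u1", "cancel lunch tomorrow"),
    ("u2", "please cancel lunch"),
    ("u3", "schedule lunch with bob"),
]

def train_generator(records, k=2):
    """Eyes-off: count words, keeping only those used by at least k users."""
    users_per_word = defaultdict(set)
    counts = Counter()
    for user, text in records:
        for word in text.split():
            users_per_word[word].add(user)
            counts[word] += 1
    # Words tied to a single user (here, the names) are dropped.
    return {w: c for w, c in counts.items() if len(users_per_word[w]) >= k}

def generate_synthetic(model, n, length, rng):
    """Eyes-on: sample n synthetic messages from the retained word counts."""
    words = sorted(model)
    weights = [model[w] for w in words]
    return [" ".join(rng.choices(words, weights=weights, k=length))
            for _ in range(n)]

def annotate(text):
    """Eyes-on: stand-in for a human annotator labelling synthetic text."""
    words = text.split()
    for label in ("cancel", "schedule"):
        if label in words:
            return label
    return "other"

def train_target(examples):
    """Eyes-on: learn how often each word co-occurs with each label."""
    word_label_counts = defaultdict(Counter)
    for text, label in examples:
        for word in text.split():
            word_label_counts[word][label] += 1
    return word_label_counts

def classify(model, text):
    """Eyes-off: label a confidential message by summing per-word votes."""
    votes = Counter()
    for word in text.split():
        votes.update(model.get(word, {}))
    return votes.most_common(1)[0][0] if votes else "other"

rng = random.Random(0)  # fixed seed so the sketch is repeatable
generator = train_generator(CONFIDENTIAL, k=2)               # eyes-off
synthetic = generate_synthetic(generator, 50, 4, rng)        # eyes-on
annotated = [(text, annotate(text)) for text in synthetic]   # eyes-on
target = train_target(annotated)                             # eyes-on
predictions = [classify(target, text) for _, text in CONFIDENTIAL]  # eyes-off
```

Only `generator` and `target` cross the boundary between the two environments in this sketch; the confidential records are read solely by `train_generator` and `classify`, which mirrors the separation the abstract describes.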
Opening claim text (preview).
What is claimed is:
1. A method comprising: importing, by a first environment in which users do not have access rights to confidential information in a second environment, a synthetic data generation model, the synthetic data generation model being machine-trained using an anonymity technique on confidential data received and stored within the second environment that comprises the confidential information; generating synthetic data using the synthetic data generation model, the synthetic data comprising data that is an equivalent of the confidential data without any specific details that can be linked back to the confidential data; machine-training, in the first environment, a target model using annotated versions of the synthetic data as training data, the target model trained to classify the confidential data in the second environment; exporting, by a component of the first environment, the target model back to the second environment; and causing deployment of the target model within the second environment to classify the confidential data.
2. The method of claim 1, wherein the generating the synthetic data using the synthetic data generation model occurs in the first eyes on environment.
3. The method of claim 1, further comprising: receiving annotations of the synthetic data prior to the training of the target model.
4. The method of claim 1, further comprising: training a final model within the second environment based on the classified confidential data generated by the target model; and deploying the final model within the second environment.
5. The method of claim 1, further comprising: machine-training the synthetic data generation model in the second environment using the anonymity technique.
6. The method of claim 5, wherein the anonymity technique used to machine-train the synthetic data generation model comprises differential privacy.
7. The method of claim 5, wherein the anonymity technique used to machine-train the synthetic data generation model comprises K user anonymity.
8. The method of claim 5, wherein the anonymity technique used to machine-train the synthetic data generation model comprises personally identifiable information scrubbing.
9. The method of claim 1, further comprising: based on updated confidential data in the second environment, retraining the synthetic data generation model.
10. The method of claim 9, further comprising: generating new synthetic data using the retrained synthetic data generation model; and retraining the target model.
11. A system comprising: one or more hardware processors; and a memory storing instructions that, when executed by the one or more hardware processors, cause the one or more hardware processors to perform operations comprising: importing, by a first environment in which users do not have access rights to confidential information in a second environment, a synthetic data generation model, the synthetic data generation model being machine-trained using an anonymity technique on confidential data received and stored within the second environment that comprises the confidential information; generating synthetic data using the synthetic data generation model, the synthetic data comprising data that is an equivalent of the confidential data without any specific details that can be linked back to the confidential data; machine-training, in the first environment, a target model using annotated versions of the synthetic data as training data, the target model trained to classify the confidential data in the second environment; and causing deployment of the target model within the second environment to classify the confidential data.
12. The system of claim 11, wherein the generating the synthetic data using the synthetic data generation model occurs in the first environment.
13. The system of claim 11, wherein the operations further comprise: receiving annotations of the synthetic data prior to the training of the target model.
14. The system of claim 11, wherein the operations further comprise: training a final model within the second environment based on the classified confidential data generated by the target model; and deploying the final model within the second environment.
15. The system of claim 11, wherein the operations further comprise: machine-training the synthetic data generation model in the second environment using the anonymity technique.
16. The system of claim 11, wherein the operations further comprise: based on updated confidential data in the second environment, retraining the synthetic data generation model.
17. The system of claim 16, wherein the operations further comprise: generating new synthetic data using the retrained synthetic data generation model; and retraining the target model.
18. A machine-storage medium comprising instructions which, when executed by one or more hardware processors of a machine, cause the machine to perform operations comprising: importing, by a first environment in which users do not have access rights to confidential information in a second environment, a synthetic data generation model, the synthetic data generation model being machine-trained using an anonymity technique on confidential data received and stored within the second environment that comprises the confidential information; generating synthetic data using the synthetic data generation model, the synthetic data comprising data that is an equivalent of the confidential data without any specific details that can be linked back to the confidential data; machine-training, in the first environment, a target model using annotated versions of the synthetic data as training data, the target model trained to classify the confidential data in the second environment; exporting, by a component of the first environment, the target model back to the second environment; and causing deployment of the target model within the second environment to classify the confidential data.
19. The storage medium of claim 18, wherein the generating the synthetic data using the synthetic data generation model occurs in the first environment.
20. The storage medium of claim 18, wherein the operations further comprise: training a final model within the second environment based on the classified confidential data generated by the target model; and deploying the final model within the second environment.
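Claims 6 and 8 name differential privacy and personally identifiable information scrubbing as anonymity techniques without specifying a mechanism in the claim text. The sketch below shows one simple form of each, purely as an illustration: regex-based scrubbing of e-mail addresses and phone-like numbers, and the Laplace mechanism applied to a count. The patterns, the placeholder tokens, and the function names are assumptions, and the Laplace mechanism on counts is a swapped-in textbook example rather than the way the patent applies differential privacy during model training.

```python
import random
import re

# Assumed, deliberately loose patterns; real PII detection needs far more care.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub_pii(text):
    """Replace e-mail addresses and phone-like numbers with placeholder tokens."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential draws with rate 1/scale
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: add noise with scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

scrubbed = scrub_pii("mail alice@example.com or call 555-123-4567")
# scrubbed == "mail <EMAIL> or call <PHONE>"

rng = random.Random(0)
released = noisy_count(42, epsilon=0.5, rng=rng)  # smaller epsilon, more noise
```

A smaller `epsilon` gives a stronger privacy guarantee at the cost of a noisier released value; a count has sensitivity 1 when adding or removing one record changes it by at most 1.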
Machine learning · CPC title
by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title
to a system of files or objects, e.g. local or distributed file system or database · CPC title
Combinations of networks · CPC title