Building annotated models based on eyes-off data

US12353580B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12353580-B2
Application number  US-202217972548-A
Country             US
Kind code           B2
Filing date         Oct 24, 2022
Priority date       Oct 24, 2022
Publication date    Jul 8, 2025
Grant date          Jul 8, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are directed to building annotated models based on eyes-off data. Specifically, a synthetic data generation model is trained and used to further train a target model. The synthetic data generation model is trained within an eyes-off environment using an anonymity technique on confidential data. The synthetic data generation model is then used to create synthetic data that closely represents the confidential data but without any specific details that can be linked back to the confidential data. The synthetic data is then annotated and used to train the target model within an eyes-on environment. Subsequently, the target model is deployed back within the eyes-off environment to classify the confidential data.
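
The flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and none of it comes from the patent: the function names, the toy data, the token-frequency threshold used as a crude stand-in for an anonymity technique, and the keyword-based target model. It only shows which step runs on which side of the eyes-off boundary.

```python
import random
from collections import Counter, defaultdict

# --- Eyes-off side (hypothetical stand-ins, not the patent's implementation) ---
def train_generator(confidential_texts, min_records=2):
    """Keep only tokens seen in at least `min_records` records, so rare and
    potentially identifying tokens never leave the eyes-off environment."""
    doc_freq = Counter()
    for text in confidential_texts:
        doc_freq.update(set(text.lower().split()))
    return sorted(tok for tok, n in doc_freq.items() if n >= min_records)

# --- Eyes-on side ---
def generate_synthetic(vocab, n_samples, length=4, seed=0):
    """Sample random token sequences from the exported vocabulary."""
    rng = random.Random(seed)
    return [" ".join(rng.choice(vocab) for _ in range(length))
            for _ in range(n_samples)]

def annotate(text):
    """Stand-in for a human annotator, who may read synthetic data."""
    return "billing" if "invoice" in text.split() else "other"

def train_target(labeled_samples):
    """Learn 'indicator' tokens that co-occur with exactly one label."""
    labels_seen = defaultdict(set)
    for text, label in labeled_samples:
        for tok in text.split():
            labels_seen[tok].add(label)
    return {tok: next(iter(labels))
            for tok, labels in labels_seen.items() if len(labels) == 1}

def classify(indicators, text, default="other"):
    """Vote over indicator tokens; fall back to `default` when none match."""
    votes = Counter(indicators[t] for t in text.lower().split() if t in indicators)
    return votes.most_common(1)[0][0] if votes else default

confidential = [
    "invoice overdue for alice@example.com",
    "invoice payment received from bob",
    "meeting notes for project launch",
    "project meeting moved to friday",
]

vocab = train_generator(confidential)                       # eyes-off
synthetic = generate_synthetic(vocab, n_samples=200)        # eyes-on
labeled = [(s, annotate(s)) for s in synthetic]             # eyes-on
target = train_target(labeled)                              # eyes-on
predictions = [classify(target, t) for t in confidential]   # back in eyes-off
```

In this toy setup the email address appears in only one record, so it is filtered out before anything crosses into the eyes-on side, and the annotator only ever sees sequences built from the shared vocabulary.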

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: importing, by a first environment in which users do not have access rights to confidential information in a second environment, a synthetic data generation model, the synthetic data generation model being machine-trained using an anonymity technique on confidential data received and stored within the second environment that comprises the confidential information; generating synthetic data using the synthetic data generation model, the synthetic data comprising data that is an equivalent of the confidential data without any specific details that can be linked back to the confidential data; machine-training, in the first environment, a target model using annotated versions of the synthetic data as training data, the target model trained to classify the confidential data in the second environment; exporting, by a component of the first environment, the target model back to the second environment; and causing deployment of the target model within the second environment to classify the confidential data.

2. The method of claim 1, wherein the generating the synthetic data using the synthetic data generation model occurs in the first eyes on environment.

3. The method of claim 1, further comprising: receiving annotations of the synthetic data prior to the training of the target model.

4. The method of claim 1, further comprising: training a final model within the second environment based on the classified confidential data generated by the target model; and deploying the final model within the second environment.

5. The method of claim 1, further comprising: machine-training the synthetic data generation model in the second environment using the anonymity technique.

6. The method of claim 5, wherein the anonymity technique used to machine-train the synthetic data generation model comprises differential privacy.

7. The method of claim 5, wherein the anonymity technique used to machine-train the synthetic data generation model comprises K user anonymity.

8. The method of claim 5, wherein the anonymity technique used to machine-train the synthetic data generation model comprises personally identifiable information scrubbing.

9. The method of claim 1, further comprising: based on updated confidential data in the second environment, retraining the synthetic data generation model.

10. The method of claim 9, further comprising: generating new synthetic data using the retrained synthetic data generation model; and retraining the target model.

11. A system comprising: one or more hardware processors; and a memory storing instructions that, when executed by the one or more hardware processors, cause the one or more hardware processors to perform operations comprising: importing, by a first environment in which users do not have access rights to confidential information in a second environment, a synthetic data generation model, the synthetic data generation model being machine-trained using an anonymity technique on confidential data received and stored within the second environment that comprises the confidential information; generating synthetic data using the synthetic data generation model, the synthetic data comprising data that is an equivalent of the confidential data without any specific details that can be linked back to the confidential data; machine-training, in the first environment, a target model using annotated versions of the synthetic data as training data, the target model trained to classify the confidential data in the second environment; and causing deployment of the target model within the second environment to classify the confidential data.

12. The system of claim 11, wherein the generating the synthetic data using the synthetic data generation model occurs in the first environment.

13. The system of claim 11, wherein the operations further comprise: receiving annotations of the synthetic data prior to the training of the target model.

14. The system of claim 11, wherein the operations further comprise: training a final model within the second environment based on the classified confidential data generated by the target model; and deploying the final model within the second environment.

15. The system of claim 11, wherein the operations further comprise: machine-training the synthetic data generation model in the second environment using the anonymity technique.

16. The system of claim 11, wherein the operations further comprise: based on updated confidential data in the second environment, retraining the synthetic data generation model.

17. The system of claim 16, wherein the operations further comprise: generating new synthetic data using the retrained synthetic data generation model; and retraining the target model.

18. A machine-storage medium comprising instructions which, when executed by one or more hardware processors of a machine, cause the machine to perform operations comprising: importing, by a first environment in which users do not have access rights to confidential information in a second environment, a synthetic data generation model, the synthetic data generation model being machine-trained using an anonymity technique on confidential data received and stored within the second environment that comprises the confidential information; generating synthetic data using the synthetic data generation model, the synthetic data comprising data that is an equivalent of the confidential data without any specific details that can be linked back to the confidential data; machine-training, in the first environment, a target model using annotated versions of the synthetic data as training data, the target model trained to classify the confidential data in the second environment; exporting, by a component of the first environment, the target model back to the second environment; and causing deployment of the target model within the second environment to classify the confidential data.

19. The storage medium of claim 18, wherein the generating the synthetic data using the synthetic data generation model occurs in the first environment.

20. The storage medium of claim 18, wherein the operations further comprise: training a final model within the second environment based on the classified confidential data generated by the target model; and deploying the final model within the second environment.
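
Claims 6 to 8 name three anonymity techniques: differential privacy, K user anonymity, and personally identifiable information scrubbing. The Python sketch below illustrates the general idea behind each one under stated assumptions; it is not the patented implementation. The regular expressions, the placeholder strings, the function names, and the use of a Laplace-noised count as the differential privacy example are all hypothetical choices. In particular, a noised count is only a basic building block and is not the same as training a generation model with differential privacy.

```python
import random
import re

# Hypothetical, deliberately simple patterns; real PII detection needs far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub_pii(text):
    """PII scrubbing: replace matched emails and phone numbers with placeholders."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)

def k_user_filter(records, k):
    """K user anonymity (as assumed here): keep phrases written by at least
    k distinct users. `records` is an iterable of (user_id, phrase) pairs."""
    users_by_phrase = {}
    for user_id, phrase in records:
        users_by_phrase.setdefault(phrase, set()).add(user_id)
    return {p for p, users in users_by_phrase.items() if len(users) >= k}

def laplace_noise(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_count(true_count, epsilon, rng):
    """Laplace mechanism for a count query with sensitivity 1."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

scrubbed = scrub_pii("mail alice@example.com or call +1 (555) 123-4567")

shared = k_user_filter(
    [("u1", "reset password"), ("u2", "reset password"),
     ("u1", "my cat is named zed"), ("u1", "reset password")],
    k=2,
)

rng = random.Random(0)
noisy_counts = [dp_count(10, epsilon=0.5, rng=rng) for _ in range(20000)]
mean_noisy = sum(noisy_counts) / len(noisy_counts)
```

Here `scrubbed` contains placeholders instead of the email address and phone number, `shared` keeps only the phrase written by two distinct users, and the noisy counts scatter around the true value of 10, with a smaller `epsilon` giving more noise.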

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Machine learning · CPC title

  • by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title

  • to a system of files or objects, e.g. local or distributed file system or database · CPC title

  • G06N3/045 · Primary

    Combinations of networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12353580B2 cover?
Systems and methods are directed to building annotated models based on eyes-off data. Specifically, a synthetic data generation model is trained and used to further train a target model. The synthetic data generation model is trained within an eyes-off environment using an anonymity technique on confidential data. The synthetic data generation model is then used to create synthetic data that cl…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F21/6218. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 8, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).