User data deidentification system for IP addresses
US-2024411929-A1 · Dec 12, 2024 · US
US2025390605A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025390605-A1 |
| Application number | US-202519316864-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 2, 2025 |
| Priority date | Aug 21, 2023 |
| Publication date | Dec 25, 2025 |
| Grant date | — |
Official abstract text for this publication.
Systems and methods for novel uses and/or improvements to data labeling applications, particularly data labeling applications involving sensitive data. As one example, systems and methods are described herein for preventing sensitive data leakage, using weak learner libraries, during label propagation.
Opening claim text (preview).
What is claimed is:

1. A system for preventing sensitive data leakage, using weak learner libraries and a plurality of environments, during label propagation, the system comprising: one or more processors; and one or more non-transitory, computer-readable mediums comprising instructions that when executed by the one or more processors causes operations comprising: receiving a first data set at a first environment, wherein the first data set comprises a plurality of sensitive characteristics, wherein the first data set comprises actual data; generating a second data set at a second environment, wherein the second data set is a synthetic data set corresponding to the first data set; determining, based on the second data set at the second environment, a first learner for a first labeling task of a plurality of labeling tasks specific to the first data set; validating, based on the first data set at the first environment, the first learner; in response to validating the first learner at the first environment, adding the first learner to a first learner library for the first data set; determining, based on the second data set at the second environment, a second learner for a second labeling task of the plurality of labeling tasks, wherein the second learner has a second learning capability; validating, based on the first data set at the first environment, the second learner; adding the second learner to the first learner library in response to validating the second learner; determining, for the first learner library, an aggregate labeling performance for the plurality of labeling tasks specific to the first data set; comparing the aggregate labeling performance to a threshold aggregate performance; and determining whether to approve the first learner library for use based on comparing the aggregate labeling performance to the threshold aggregate performance.

2. A method for preventing sensitive data leakage, using weak learner libraries, during label propagation, the method comprising: receiving a first data set, wherein the first data set comprises a plurality of sensitive characteristics; generating a second data set, wherein the second data set is a synthetic data set corresponding to the first data set; determining, based on the second data set, a first learner for a first labeling task, wherein the first labeling task is specific to the first data set, and wherein the first learner has a first learning capability; validating, based on the first data set, the first learner; and in response to validating the first learner, adding the first learner to a first learner library for the first data set.

3. The method of claim 2, wherein generating the second data set further comprises: retrieving a first latent representation of a first characteristic from the first data set; comparing the first latent representation to characteristics of the second data set to determine whether first sensitive data of the first data set has been leaked; and determining whether to approve the second data set for use based on whether first sensitive data of the first data set has been leaked.

4. The method of claim 2, wherein determining the first learner for the first labeling task further comprises: retrieving a second characteristic from the first data set; comparing the second characteristic to characteristics of the first learner to determine whether second sensitive data of the first data set has been leaked; and determining whether to approve the first learner for use based on whether the second sensitive data of the first data set has been leaked.

5. The method of claim 2, wherein validating, based on the first data set, the first learner further comprises: determining, for the first learner, a labeling performance of the first labeling task; comparing the labeling performance to a threshold performance; and determining whether to approve the first learner for use based on comparing the labeling performance to the threshold performance.

6. The method of claim 2, further comprising: generating for display a recommendation related to the additional weak learner to the first learner library by: determining, based on the second data set, a second learner for a second labeling task, wherein the second learner has a second learning capability; validating, based on the first data set, the second learner; and recommending adding the second learner to the first learner library in response to validating the second learner.

7. The method of claim 2, wherein validating the first learner library further comprises: determining, for the first learner library, an aggregate labeling performance for a plurality of labeling tasks specific to the first data set; comparing the aggregate labeling performance to a threshold aggregate performance; and determining whether to approve the first learner library for use based on comparing the aggregate labeling performance to the threshold aggregate performance.

8. The method of claim 2, wherein validating the first learner library further comprises: determining a first weight for the first learner; determining a second weight for a second learner in the first learner library; and determining, based on the first weight and the second weight, an aggregate labeling performance for a plurality of labeling tasks specific to the first data set.

9. The method of claim 8, wherein determining the first weight for the first learner further comprises: determining, for the first learner, a labeling performance of the first labeling task; and determining the first weight based on the labeling performance.

10. The method of claim 2, wherein generating the second data set further comprises: determining a statistical property of the first data set; and generating the synthetic data set for the second data set using a random number generator and the statistical property.

11. The method of claim 2, wherein generating the second data set further comprises: determining the first data set is tabular data; and in response to determining that the first data set is tabular data, selecting a first interpolation algorithm for generating the second data set.

12. The method of claim 2, wherein generating the second data set further comprises: determining a correlation structure of the first data set; and determining the synthetic data set for the second data set using a copula model and the correlation structure.

13. The method of claim 2, wherein determining, based on the second data set, the first learner for the first labeling task further comprises: determining a third characteristic from the second data set; determining an importance of the third characteristic; and selecting the third characteristic as a feature for the first learner based on the importance.

14. The method of claim 13, wherein selecting the third characteristic as the feature for the first learner based on the importance further comprises: determining a first value for a first classification in the first labeling task; determining a second value for a second classification in the first labeling task; and determining a threshold value for the feature based on maximizing a difference between the first value and the second value.

15. The method of claim 14, wherein determining the threshold value for the feature based on maximizing the difference between the first value and the second value further comprises: determining a first classification error for the first classification; and further determining the first value based o…
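Claims 1, 2, 5, and 7 to 9 describe the core loop: build each learner from the synthetic data only, validate it against the actual data, add it to the library if it clears a per-learner threshold, then weight the accepted learners and gate the whole library on an aggregate threshold. The following is a minimal Python sketch under stated assumptions: learners are hypothetical one-feature threshold rules (`fit_midpoint_learner`), "labeling performance" is taken to be accuracy, each weight is assumed to equal the learner's validated performance, and the aggregate is the weighted mean. None of these names or rules come from the publication.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Learner = Callable[[Row], int]
Split = Tuple[List[Row], List[int]]  # (rows, binary labels) for one labeling task


def fit_midpoint_learner(rows: Sequence[Row], labels: Sequence[int], feature: str) -> Learner:
    """Hypothetical weak learner: threshold one feature halfway between the class means."""
    pos = [r[feature] for r, y in zip(rows, labels) if y == 1]
    neg = [r[feature] for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("synthetic data must contain both classes")
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    cut = (pos_mean + neg_mean) / 2
    sign = 1 if pos_mean >= neg_mean else -1  # which side of the cut means "class 1"
    return lambda row: int(sign * (row[feature] - cut) > 0)


def labeling_performance(learner: Learner, rows: Sequence[Row], labels: Sequence[int]) -> float:
    """Assumed performance metric: accuracy of the learner on one labeling task."""
    return sum(learner(r) == y for r, y in zip(rows, labels)) / len(labels)


def build_library(
    synthetic: Dict[str, Split],
    actual: Dict[str, Split],
    task_features: Dict[str, str],
    learner_threshold: float = 0.7,
    aggregate_threshold: float = 0.75,
):
    """Train on synthetic data, validate on actual data, then gate the whole library."""
    library: Dict[str, Tuple[Learner, float, float]] = {}  # task -> (learner, weight, performance)
    for task, feature in task_features.items():
        # Learner construction only ever sees the synthetic (second) data set.
        learner = fit_midpoint_learner(*synthetic[task], feature)
        # Validation is the only step that touches the actual (first) data set.
        performance = labeling_performance(learner, *actual[task])
        if performance >= learner_threshold:
            weight = performance  # assumed weighting rule: weight equals validated performance
            library[task] = (learner, weight, performance)
    if not library:
        return library, 0.0, False
    total_weight = sum(w for _, w, _ in library.values())
    aggregate = sum(w * p for _, w, p in library.values()) / total_weight
    return library, aggregate, aggregate >= aggregate_threshold
```

Keeping construction and validation in separate calls mirrors the two-environment split of claim 1: only a score computed on the actual data crosses back, never the actual rows themselves.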
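Claims 3 and 4 approve the synthetic data (or a learner) only after comparing a representation of the actual data against it to decide whether sensitive data has leaked. The claims do not define the latent representation or the comparison rule. This sketch assumes min-max scaled feature vectors as a stand-in for a learned latent representation and substitutes a distance-to-closest-record test; `leaked_indices`, `approve_synthetic`, and the `min_distance` cutoff are all hypothetical.

```python
import math
from typing import List, Sequence

Row = Sequence[float]


def fit_scaler(actual: Sequence[Row]):
    """Per-column minimum and range from the actual data (range 1.0 for constant columns)."""
    cols = list(zip(*actual))
    mins = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return mins, spans


def represent(row: Row, mins, spans) -> List[float]:
    """Stand-in 'latent representation': the min-max scaled feature vector."""
    return [(v - m) / s for v, m, s in zip(row, mins, spans)]


def leaked_indices(actual: Sequence[Row], synthetic: Sequence[Row], min_distance: float = 0.05):
    """Indices of actual rows whose nearest synthetic row is closer than min_distance."""
    mins, spans = fit_scaler(actual)
    syn_vecs = [represent(r, mins, spans) for r in synthetic]
    leaked = []
    for i, row in enumerate(actual):
        vec = represent(row, mins, spans)
        nearest = min(math.dist(vec, s) for s in syn_vecs)
        if nearest < min_distance:
            leaked.append(i)
    return leaked


def approve_synthetic(actual: Sequence[Row], synthetic: Sequence[Row], min_distance: float = 0.05) -> bool:
    """Approve the synthetic set only if no actual record is (nearly) reproduced."""
    return not leaked_indices(actual, synthetic, min_distance)
```

Returning the offending indices, rather than only a yes/no flag, would let a caller regenerate or drop just the synthetic rows that sit too close to real records.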
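Claims 13 and 14 select a feature by its importance and set the feature's threshold by maximizing the difference between a value computed for each of two classifications. One plausible reading, assumed here, takes each value to be the fraction of that class lying above the threshold, and reuses the best achievable difference as the feature's importance (a Kolmogorov-Smirnov-style separation score). Only the "above the threshold means class 1" direction is searched, and the function names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Row = Dict[str, float]


def best_threshold(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, gap) maximizing P(x > t | class 1) - P(x > t | class 0)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    best_t, best_gap = values[0], float("-inf")
    for t in sorted(set(values)):
        first = sum(1 for v, y in zip(values, labels) if y == 1 and v > t) / n_pos
        second = sum(1 for v, y in zip(values, labels) if y == 0 and v > t) / n_neg
        if first - second > best_gap:  # keep the first threshold reaching the maximum
            best_t, best_gap = t, first - second
    return best_t, best_gap


def select_feature(
    rows: Sequence[Row], labels: Sequence[int], min_importance: float = 0.5
) -> Optional[Tuple[str, float, float]]:
    """Score each feature by its best separation; keep the top one if it is important enough."""
    scores = {f: best_threshold([r[f] for r in rows], labels) for f in rows[0]}
    feature, (threshold, importance) = max(scores.items(), key=lambda kv: kv[1][1])
    if importance < min_importance:
        return None
    return feature, threshold, importance
```

In the scheme of claim 2 these rows would come from the synthetic second data set, so the chosen feature and threshold are fixed before the learner is ever scored on actual data.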
Protecting personal data, e.g. for financial or medical purposes · CPC title
by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title