Classification and non-parametric regression framework with reduction of trained models
US-9501749-B1 · Nov 22, 2016 · US
US11521106B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11521106-B2 |
| Application number | US-201515521441-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 23, 2015 |
| Priority date | Oct 24, 2014 |
| Publication date | Dec 6, 2022 |
| Grant date | Dec 6, 2022 |
Official abstract text for this publication.
This disclosure relates to learning with transformed data such as determining multiple training samples from multiple data samples. Each of the multiple data samples comprises one or more feature values and a label that classifies that data sample. A processor determines each of the multiple training samples by randomly selecting a subset of the multiple data samples, and combining the feature values of the data samples of the subset based on the label of each of the data samples of the subset. Since the training samples are combinations of randomly chosen data samples, the training samples can be provided to third parties without disclosing the actual training data. This is an advantage over existing methods in cases where the data is confidential and should therefore not be shared with a learner of a classifier, for example.
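The construction described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `make_training_sample` and `make_training_samples` are hypothetical, the uniform choice of subset size is an assumption, and the label-weighted sum is one plausible reading of "combining the feature values ... based on the label".

```python
import random
from typing import List, Sequence, Tuple

# One data sample: (feature values, label). Labels are assumed to be -1 or +1.
Sample = Tuple[Sequence[float], int]


def make_training_sample(data: Sequence[Sample], rng: random.Random) -> List[float]:
    """Combine a random subset of labelled data samples into one training sample."""
    n_features = len(data[0][0])
    # Assumption: the subset size is drawn uniformly and is always at least 2,
    # so that no training sample exposes a single raw data sample.
    k = rng.randint(2, len(data))
    subset = rng.sample(list(data), k)
    combined = [0.0] * n_features
    for features, label in subset:
        for j, value in enumerate(features):
            # Each feature value is weighted by the label of its data sample.
            combined[j] += label * value
    return combined


def make_training_samples(data: Sequence[Sample], count: int, seed: int = 0) -> List[List[float]]:
    """Repeat the random combination to obtain several training samples."""
    rng = random.Random(seed)
    return [make_training_sample(data, rng) for _ in range(count)]
```

Only the combined vectors returned by `make_training_samples` would leave the data holder; the individual `(features, label)` pairs stay local.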
Opening claim text (preview).
The invention claimed is:
1. A computer implemented method for training a classifier that corresponds to confidential data, the method comprising collecting the confidential data as multiple data samples on a user device by way of a user interface, each of the multiple data samples comprising one or more feature values in a vector $x_i$, where i is an index of the data sample, and a label $y_i$ that classifies that data sample i; creating non-confidential training data by determining a training sample as a vector of feature values $\pi$ from the multiple data samples, the training sample $\pi$ preserving the privacy of the confidential data by: randomly selecting a subset of the multiple data samples by defining a masking variable $\sigma_i$ for each data sample i, the subset comprising more than one data sample, and combining the feature values of the data samples of the subset based on the label of each of the data samples of the subset, by determining a weighted sum of the feature values of the data samples of the subset wherein the feature value of a feature of the training sample is the weighted sum of the feature values of that feature of the data samples of the subset, wherein the weighted sum comprises a sum of feature values of that feature multiplied by the respective labels of each of the data samples of the subset by calculating $\pi = \sum_i (\sigma_i + y_i)\, x_i$, wherein feature j of the training sample is a weighted sum of values of feature j of the data samples: repeating the step of determining the training sample to determine multiple training samples; sending the non-confidential training data including the multiple training samples including the combined feature values to a computer system for determining the classifier weight while maintaining privacy of the confidential data by preventing access to the confidential data from the computer system; training, by the computer system, a classifier that corresponds to the confidential data, without access to the confidential data and with access to the non-confidential training data by calculating, by the computer system, a classifier weight associated with a feature index from the multiple training samples; and classifying, by the computer system using the trained classifier, a test value by determining, by the computer system, a classification of the test values based on the classifier weight.
2. The method of claim 1, wherein randomly selecting the subset of the multiple data samples comprises multiplying each of the multiple data samples by a random selection value that is unequal to zero to select that data sample or equal to zero to deselect that data sample.
3. The method of claim 1, wherein determining the sum comprises determining a weighted sum that is weighted based on the number of data samples in the subset of the multiple data samples.
4. The method of claim 1, wherein the weighted sum is weighted based on a random number such that randomly selecting the subset of the multiple data samples is performed simultaneously with combining the feature values.
5. The method of claim 1, wherein randomly selecting a subset of multiple data samples comprises randomly selecting a subset of multiple data samples based on a non-uniform distribution.
6. The method of claim 1, wherein the data samples have signed real values as features values, and the label is one of ‘−1’ and ‘+1’.
7. The method of claim 1, wherein determining each of the multiple training samples comprises determining each of the multiple training samples such that each of the multiple training samples is based on at least a predetermined number of data samples.
8. The method of claim 7, wherein randomly selecting a subset of the multiple data samples comprises randomly selecting a subset of the multiple data samples that comprises at least a predetermined number of data samples.
9. The method of claim 1, wherein determining multiple training samples further comprises: determining for each feature value of the training sample a random value and adding the random value to that feature value to determine a modified training sample.
10. A non-transitory computer readable medium comprising computer-executable instructions stored thereon, that when executed by a processor, causes the processor to perform the method of claim 1.
11. A system for training a classifier that corresponds to confidential data, the system comprising a data collection device for determining multiple training samples from multiple data samples, and a computer system for receiving and processing the training samples, the data collection device comprising: an input port to receive the multiple data samples; and a processor configured to collect the confidential data as the multiple data samples by way of a user interface, each of the multiple data samples comprising one or more feature values in a vector $x_i$, where i is an index of the data sample, and a label $y_i$ that classifies that data sample i; create non-confidential training data by determining a training sample $\pi$ from the multiple data samples, the training sample $\pi$ preserving the privacy of the confidential data by randomly selecting a subset of the multiple data samples by defining a masking variable $\sigma_i$ for each data sample i, the subset comprising more than one data sample, and combining the feature values of the data samples of the subset based on the label of each of the data samples of the subset by determining a weighted sum of the feature values of the data samples of the subset wherein the feature value of a feature of the training sample is the weighted sum of the feature values of that feature of the data samples of the subset, wherein the weighted sum comprises a sum of feature values of that feature multiplied by the respective labels of each of the data samples of the subset by calculating $\pi = \sum_i (\sigma_i + y_i)\, x_i$, wherein feature j of the training sample is a weighted sum of values of feature j of the data samples; repeating the step of determining the training sample to determine multiple training samples; and to send the non-confidential training data including the multiple training samples including the combined feature values to the computer system for determining the classifier weight while maintaining privacy of the confidential data by preventing access to the confidential data from the computer system; the computer system comprising a processor configured to: train, by the computer system, a classifier that corresponds to the confidential data without access to the confidential data and with access to the non-confidential training data by calculating a classifier weight associated with a feature index from the multiple training samples, and classify, by the computer system using the trained classifier, a test value by determining a classification of the test values based on the classifier weight.
12. The computer implemented method of claim 1 comprising: receiving, by the computer system, multiple training values associated with a feature index, each training value being based on a combination of a subset of multiple data values that are kept securely on a data collection device by preventing access to the multiple data samples from the computer system, based on multiple data labels, each of the multiple data labels being associated with one of the multiple data values; determining, by the computer system, a correlation value based on the multiple training values, such that the correlation value is indicative of a correlation between each of the multiple data values and the data label associated with that data value; and determining, by the computer system, the classifier coefficient bas
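To make the claimed pipeline concrete, here is a hedged sketch of the steps in claims 1, 7 to 9, and 12: building a training sample with a masking variable, optionally adding noise, computing a weight per feature, and classifying a test value. All function names are hypothetical. Several choices are assumptions not fixed by the claim text shown here: the masking variable is drawn uniformly from {-1, +1}, the noise is Gaussian, and the weight for each feature is simply the mean of that feature over the training samples (a correlation-style estimate in the spirit of claim 12, not necessarily the training rule of the full description).

```python
import random
from typing import List, Sequence, Tuple

# One data sample: (feature vector x_i, label y_i). Labels assumed in {-1, +1} (claim 6).
Sample = Tuple[Sequence[float], int]


def training_sample(data: Sequence[Sample], rng: random.Random,
                    min_subset: int = 2) -> List[float]:
    """Compute pi = sum_i (sigma_i + y_i) * x_i for a random masking vector sigma."""
    if len(data) < min_subset:
        raise ValueError("not enough data samples for the requested subset size")
    while True:
        # Assumption: each sigma_i is drawn uniformly from {-1, +1}.
        sigma = [rng.choice((-1, 1)) for _ in data]
        # (sigma_i + y_i) is zero when sigma_i != y_i, so sample i is
        # selected exactly when sigma_i equals its label.
        selected = sum(1 for s, (_, y) in zip(sigma, data) if s == y)
        if selected >= min_subset:  # redraw until the subset is large enough
            break
    pi = [0.0] * len(data[0][0])
    for s, (x, y) in zip(sigma, data):
        weight = s + y
        for j, value in enumerate(x):
            pi[j] += weight * value
    return pi


def add_noise(pi: Sequence[float], rng: random.Random, scale: float) -> List[float]:
    """Add a random value to every feature (claim 9); Gaussian noise is an assumption."""
    return [value + rng.gauss(0.0, scale) for value in pi]


def train_weights(samples: Sequence[Sequence[float]]) -> List[float]:
    """Assumed training rule: weight j is the mean of feature j over the training samples."""
    n = len(samples)
    return [sum(s[j] for s in samples) / n for j in range(len(samples[0]))]


def classify(weights: Sequence[float], x: Sequence[float]) -> int:
    """Label a test value by the sign of its inner product with the weights."""
    score = sum(w * v for w, v in zip(weights, x))
    return 1 if score >= 0 else -1
```

In this sketch the data holder would call `training_sample` (and optionally `add_noise`) repeatedly and send only the resulting vectors; `train_weights` and `classify` run on the receiving computer system, which never sees an individual `(x_i, y_i)` pair.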
Clustering or classification · CPC title
Machine learning · CPC title
Learning methods · CPC title
Supervised learning · CPC title