Attack detection and prevention using global device fingerprinting
US-9106693-B2 · Aug 11, 2015 · US
US12014253B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12014253-B2 |
| Application number | US-202217967147-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 17, 2022 |
| Priority date | Dec 12, 2013 |
| Publication date | Jun 18, 2024 |
| Grant date | Jun 18, 2024 |
Official abstract text for this publication.
Systems and methods for constructing sets of synthetic data. A single data record is identified from a first set of data. The first set of data comprises a first plurality of data records, each of the data records including multiple items of data describing an entity. Using pattern recognition, the single data record is processed to identify a group of records from within the first set that have corresponding characteristics equivalent to the single data record. The identified group of records comprises a target set of variables and the group of records from the first set that are not identified comprises a control set of variables. The target set of variables and the control set of variables are processed, using probability estimation and optimization constraints, to determine a score for each of the records in the first set. The score describes how similar each of the records in the first set is to the single data record. The records associated with a percentage of the highest scores are identified. The data associated with the single data record is replaced with data associated with the identified records, item-by-item.
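The abstract's pipeline (split the first set into target and control groups, score every record, keep the top percentage, replace the single record item-by-item) can be illustrated with the minimal Python sketch below. It is not the patented method: exact matching on a few hypothetical `match_fields` stands in for "pattern recognition", a naive-Bayes log-odds score is swapped in for the unspecified "probability estimation", and the "optimization constraints" are omitted entirely. The function names and parameters (`split_target_control`, `nb_scores`, `synthesize_record`, `fraction`, `alpha`, `seed`) are illustrative assumptions.

```python
import math
import random
from collections import Counter


def split_target_control(records, single, match_fields):
    # Hypothetical stand-in for "pattern recognition": records that agree with
    # the single record on every match field form the target group; all other
    # records form the control group.
    target, control = [], []
    for r in records:
        group = target if all(r[f] == single[f] for f in match_fields) else control
        group.append(r)
    return target, control


def nb_scores(records, target, control, fields, alpha=1.0):
    # Naive-Bayes log-odds that each record belongs to the target group rather
    # than the control group; Laplace smoothing (alpha) keeps every log finite.
    t_counts = {f: Counter(r[f] for r in target) for f in fields}
    c_counts = {f: Counter(r[f] for r in control) for f in fields}
    scores = []
    for r in records:
        s = math.log((len(target) + alpha) / (len(control) + alpha))  # prior odds
        for f in fields:
            k = len(set(t_counts[f]) | set(c_counts[f]))  # distinct values of f
            p_t = (t_counts[f][r[f]] + alpha) / (len(target) + alpha * k)
            p_c = (c_counts[f][r[f]] + alpha) / (len(control) + alpha * k)
            s += math.log(p_t / p_c)
        scores.append(s)
    return scores


def synthesize_record(single, records, match_fields, fields, fraction=0.2, seed=0):
    # Score every other record against the single record, keep the top
    # `fraction` as donors, then replace the listed fields item-by-item.
    rng = random.Random(seed)
    others = [r for r in records if r is not single]  # never donate to itself
    target, control = split_target_control(others, single, match_fields)
    scores = nb_scores(others, target, control, fields)
    n_top = max(1, math.ceil(len(others) * fraction))
    ranked = sorted(range(len(others)), key=lambda i: scores[i], reverse=True)
    donors = [others[i] for i in ranked[:n_top]]
    replacement = dict(single)
    for f in fields:
        # Each field is copied from an independently drawn donor.
        replacement[f] = rng.choice(donors)[f]
    return replacement, donors
```

Drawing each field from an independently chosen donor, rather than copying one donor wholesale, is one plausible reading of "item-by-item": every value stays typical of records similar to the original, while the output is less likely to reproduce any single source record.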
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method performed by one or more processors, the computer-implemented method comprising: identifying a single data record from a first set of data records, each of the data records of the first set of data records including fields to store variables, respectively, describing an entity, at least one of the variables being associated with personal information; using pattern recognition, processing the single data record and identifying a group of records from within the first set of data records that have a target set of variables corresponding to the variables in the single data record, wherein a second group of the data records from the first set of data records that are not identified have a control set of variables that are different than the variables of the single data record; determining scores for the data records of the first set of data records based on the variables of the data records of the first set of data records, the target set of variables, and the control set of variables, the scores corresponding to comparisons of the data records of the first set of data records, respectively, to the single data record; identifying ones of the data records having scores that are greater than a threshold; and replacing data in the single data record that is representative of the personal information with data associated with one or more of the ones of the data records having scores that are greater than the threshold under constraints of (a) maintaining one or more statistical characteristics of the fields and (b) removing the personal information; and training a predictive model using the ones of the data records having scores that are greater than the threshold, wherein, once trained, the predictive model generates a synthetic dataset that describes an original dataset without a possibility of matching an entry of the synthetic dataset back to the original dataset.
2. The computer-implemented method of claim 1 wherein determining the scores includes determining the scores using probability estimation.
3. The computer-implemented method of claim 1 wherein determining the scores further includes determining the scores using optimization constraints.
4. The computer-implemented method of claim 1 wherein identifying the ones of the data records having scores that are greater than a threshold includes identifying a percentage of the data records having a predetermined percentage of the highest scores that are greater than the threshold.
5. The computer-implemented method of claim 1 wherein the variables include at least age, gender, income, credit limit information.
6. The computer-implemented method of claim 1 wherein the synthetic dataset satisfies predetermined statistical characteristics relative to the original dataset.
7. The computer-implemented method of claim 1 wherein the original dataset includes data regarding financial information of users and the synthetic dataset includes data regarding insurance information for users.
8. The computer-implemented method of claim 1 further comprising obtaining the first set of data records from a second set of data records that includes a greater number of data records than the first set of data records.
9. The computer-implemented method of claim 8 wherein obtaining the first set of data records from the second set of data records includes removing ones of the data records of the second set of data records that include a variable that is different than a mean of values of the variable of the second set of data records.
10. The computer-implemented method of claim 8 wherein obtaining the first set of data records from the second set of data records includes removing ones of the data records of the second set of data records that include a variable that is at least a predetermined number of standard deviations from a mean of the values of the variable of the second set of data records.
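Claims 4 and 10 describe two concrete filtering steps: trimming outliers from the larger second set before scoring, and keeping a predetermined percentage of the highest above-threshold scores. The Python sketch below shows one reading of each. The use of the population standard deviation, the strict `>` comparison against the threshold, and the helper names `drop_outliers` and `select_top` are assumptions, since the claims do not fix them.

```python
import math
import statistics


def drop_outliers(records, fields, k=3.0):
    # One reading of claim 10: remove any record in which some numeric field
    # lies at least k population standard deviations from that field's mean.
    stats = {}
    for f in fields:
        values = [r[f] for r in records]
        stats[f] = (statistics.mean(values), statistics.pstdev(values))
    return [
        r for r in records
        # sd == 0 means every value equals the mean, so nothing is an outlier.
        if all(sd == 0 or abs(r[f] - mu) < k * sd for f, (mu, sd) in stats.items())
    ]


def select_top(scored, threshold, pct):
    # One reading of claim 4: keep only (score, record) pairs whose score
    # exceeds the threshold, then return the records with the top `pct` of them.
    above = sorted((p for p in scored if p[0] > threshold),
                   key=lambda p: p[0], reverse=True)
    n = math.ceil(len(above) * pct)
    return [record for _, record in above[:n]]
```

For example, with incomes `[50, 52, 48, 51, 49, 200]` and `k=2.0`, the value 200 lies about 2.24 standard deviations from the mean of 75 and is removed, while `k=3.0` keeps all six rows.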
11. A system comprising: one or more processors; and memory including instructions that, when executed by the one or more processors, perform to: identify a single data record from a first set of data records, each of the data records of the first set of data records including fields to store variables, respectively, describing an entity, at least one of the variables being associated with personal information; using pattern recognition, process the single data record and identify a group of records from within the first set of data records that have a target set of variables corresponding to the variables in the single data record, wherein a second group of the data records from the first set of data records that are not identified have a control set of variables that are different than the variables of the single data record; determine scores for the data records of the first set of data records based on the variables of the data records of the first set of data records, the target set of variables, and the control set of variables, the scores corresponding to comparisons of the data records of the first set of data records, respectively, to the single data record; identify ones of the data records having scores that are greater than a threshold; and replace data in the single data record that is representative of the personal information with data associated with one or more of the ones of the data records having scores that are greater than the threshold under constraints of (a) maintaining one or more statistical characteristics of the fields and (b) removing the personal information; and train a predictive model using the ones of the data records having scores that are greater than the threshold, wherein, once trained, the predictive model generates a synthetic dataset that describes an original dataset without a possibility of matching an entry of the synthetic dataset back to the original dataset.
12. The system of claim 11 wherein the instructions include instructions that, when executed by the one or more processors, perform to determine the scores includes determining the scores using probability estimation and optimization constraints.
13. The system of claim 11 wherein the instructions include instructions that, when executed by the one or more processors, perform to identify the ones of the data records having scores that are greater than a threshold by identifying a percentage of the data records having a predetermined percentage of the highest scores that are greater than the threshold.
14. The system of claim 11 wherein the synthetic dataset satisfies predetermined statistical characteristics relative to the original dataset.
15. The system of claim 11 wherein the original dataset includes data regarding financial information of users and the synthetic dataset includes data regarding insurance information for users.
16. The system of claim 11 wherein the instructions further include instructions that, when executed by the one or more processors, perform to obtain the first set of data records from a second set of data records that includes a greater number of data records than the first set of data records.
17. The system of claim 16 wherein the instructions include instructions that, when executed by the one or more processors, perform to obtain the first set of data records from the second set of data records by removing ones of the data records of the second set of data records that include a variable that is different than a mean of the values of the variable of the second set of data records.
18. The system of claim 16 wherein the instructi
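Claims 1 and 11 end by training a predictive model on the high-scoring records and using it to generate a synthetic dataset, but the claim text does not say what kind of model. The Python sketch below substitutes the simplest possible generator, independent per-field empirical distributions, and adds a crude guard that rejects any sampled row identical to a row of the original dataset. The class `MarginalModel`, its `fit` and `sample` methods, and the `max_tries` limit are hypothetical.

```python
import random
from collections import Counter


class MarginalModel:
    # Hypothetical stand-in for the claimed predictive model: it learns the
    # empirical distribution of each field independently of the others.

    def fit(self, training_records, original_records):
        self.fields = list(training_records[0])
        self.dists = {f: Counter(r[f] for r in training_records) for f in self.fields}
        # Rows of the original dataset that a synthetic row must not reproduce.
        self.forbidden = {tuple(r[f] for f in self.fields) for r in original_records}
        return self

    def sample(self, n, seed=0, max_tries=10_000):
        # Draw up to n rows, each field sampled in proportion to its counts.
        rng = random.Random(seed)
        out = []
        for _ in range(max_tries):
            if len(out) == n:
                break
            row = {f: rng.choices(list(d), weights=list(d.values()))[0]
                   for f, d in self.dists.items()}
            # Crude guard: reject any sample identical to an original row.
            if tuple(row[f] for f in self.fields) not in self.forbidden:
                out.append(row)
        return out
```

Independent marginals discard correlations between fields, and rejecting exact matches alone does not establish the claimed property that no synthetic entry can be matched back to the original dataset; the sketch only shows where such a model would sit in the pipeline.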