Policy based data aggregation
US-10333901-B1 · Jun 25, 2019 · US
US10810317B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10810317-B2 |
| Application number | US-201815892802-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 9, 2018 |
| Priority date | Feb 13, 2017 |
| Publication date | Oct 20, 2020 |
| Grant date | Oct 20, 2020 |
Abstract
A gateway device includes a network interface connected to data sources, and computer instructions that, when executed, cause a processor to access data portions from the data sources. The processor accesses classification rules, each configured to classify a data portion as sensitive data in response to the data portion satisfying the rule. Each rule is associated with a significance factor representative of the accuracy of the classification rule. The processor applies each rule in the set of classification rules to a data portion to obtain an output indicating whether the data portion is sensitive data. The outputs are weighted by the significance factors to produce a set of weighted outputs. The processor determines whether the data portion is sensitive data by aggregating the set of weighted outputs, and presents the determination in a user interface. Security operations may also be performed on the data portion.
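The abstract describes a weighted vote: every rule produces an output, each output is scaled by that rule's significance factor, and the weighted outputs are aggregated into a single decision. The text shown here does not fix the output scale or the aggregation function, so the sketch below assumes binary (0/1) rule outputs, a plain sum as the aggregate, and a threshold of 0.5. `ClassificationRule`, `weighted_score`, `is_sensitive`, the two example rules, and their weights (0.9 and 0.3) are all hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ClassificationRule:
    """Hypothetical rule record: a predicate plus its significance factor."""
    name: str
    is_satisfied: Callable[[str], bool]
    significance: float  # assumed scale: larger means the rule is more trustworthy


def weighted_score(portion: str, rules: Sequence[ClassificationRule]) -> float:
    """Apply every rule, weight each 0/1 output by its significance, and sum."""
    outputs = [1.0 if rule.is_satisfied(portion) else 0.0 for rule in rules]
    weighted_outputs = [out * rule.significance for out, rule in zip(outputs, rules)]
    return sum(weighted_outputs)


def is_sensitive(portion: str, rules: Sequence[ClassificationRule], threshold: float = 0.5) -> bool:
    """Assumed decision step: sensitive when the aggregate reaches the threshold."""
    return weighted_score(portion, rules) >= threshold


# Illustrative rules: a precise SSN-format pattern and a noisy "exactly nine digits" check.
EXAMPLE_RULES = [
    ClassificationRule("ssn_format", lambda s: re.fullmatch(r"\d{3}-\d{2}-\d{4}", s) is not None, 0.9),
    ClassificationRule("nine_digits", lambda s: sum(ch.isdigit() for ch in s) == 9, 0.3),
]

print(is_sensitive("123-45-6789", EXAMPLE_RULES))  # True  (0.9 + 0.3 = 1.2)
print(is_sensitive("123456789", EXAMPLE_RULES))    # False (only 0.3)
```

With these assumed weights, the noisy nine-digit rule alone cannot mark a value as sensitive, while a value that also matches the stricter SSN format crosses the threshold.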
Claims (preview)
What is claimed is:
1. A gateway device, comprising: a network interface communicatively coupled with a plurality of data sources; a hardware processor; and a non-transitory computer readable storage medium storing computer readable instructions that, when executed by the hardware processor, cause the hardware processor to: access data from one or more of the plurality of data sources, the accessed data comprising a plurality of data portions; access a set of classification rules, each of the set of classification rules configured to classify a data portion of the plurality of data portions as sensitive data in response to the data portion satisfying the classification rule, and each of the set of classification rules further associated with a significance factor representative of an accuracy of the classification rule in classifying data portions as sensitive, wherein the significance factor associated with a classification rule is based on 1) a type of sensitive data that the classification rule is configured to detect and 2) an expected rate of false positives associated with the type of sensitive data; apply each of the set of classification rules to a data portion to obtain an output representative of whether the data portion is sensitive data; weigh the output from each application of a classification rule by the significance factor associated with the classification rule to produce a set of weighted outputs; determine if the data portion is sensitive by aggregating the set of weighted outputs; in response to determining that the data portion is sensitive, modify a user interface presented to a user to indicate that the data portion is determined to be sensitive and present a set of security operations that can be taken in response to the determination that the data portion is sensitive; and in response to a selection of a presented security operation, perform the security operation to reduce a security risk associated with the data portion.
2. The device of claim 1, wherein a data portion is at least one of: a cell in a table, a non-delimited string of characters, and a file.
3. The device of claim 1, wherein sensitive data is at least one of: an address component, a date of birth, a telephone number, an email address, a social security number, a financial account number, a password, and a username.
4. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when the data portion matches a pre-defined pattern associated with the classification rule.
5. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when one or more data parsing rules associated with the classification rule, when applied to the data portion, return a true value.
6. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when one or more contextual data requirements specified by the classification rule are satisfied by one or both of the data portion and associated data sources of the plurality of data sources.
7. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when the data portion matches an entry in a reference table specified by the classification rule.
8. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when a trained machine learning model associated with the classification rule returns a score for the data portion beyond a threshold value, the score computed by the machine learning model based on one or more features extracted from the data portion and used as input for the machine learning model.
9. The device of claim 1, wherein the one or more security operations include at least one of encryption, tokenization, and obfuscation, and wherein the one or more security operations that are performed are selected based on a desired security level for the data portion.
10. The device of claim 1, wherein the significance factor associated with a classification rule is further based on an accuracy of the classification rule determined based on a number of false positives generated by the classification rule when applied to a training data set.
11. A computer-implemented method, comprising: accessing data from one or more of a plurality of data sources, the accessed data comprising a plurality of data portions; accessing a set of classification rules, each of the set of classification rules configured to classify a data portion of the plurality of data portions as sensitive data in response to the data portion satisfying the classification rule, and each of the set of classification rules further associated with a significance factor representative of an accuracy of the classification rule in classifying data portions as sensitive, wherein the significance factor associated with a classification rule is based on 1) a type of sensitive data that the classification rule is configured to detect and 2) an expected rate of false positives associated with the type of sensitive data; applying each of the set of classification rules to a data portion to obtain an output representative of whether the data portion is sensitive data; weighting the output from each application of a classification rule by the significance factor associated with the classification rule to produce a set of weighted outputs; determining if the data portion is sensitive by aggregating the set of weighted outputs; and in response to determining that the data portion is sensitive, performing one or more security operations on the data portion to reduce a security risk associated with the data portion.
12. The method of claim 11, wherein a data portion is at least one of: a cell in a table, a non-delimited string of characters, and a file.
13. The method of claim 11, wherein sensitive data is at least one of: an address component, a date of birth, a telephone number, an email address, a social security number, a financial account number, a password, and a username.
14. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when the data portion matches a pre-defined pattern associated with the classification rule.
15. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when one or more data parsing rules associated with the classification rule, when applied to the data portion, return a true value.
16. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when one or more contextual data requirements specified by the classification rule are satisfied by one or both of the data portion and associated data sources of the plurality of data sources.
17. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when the data portion matches an entry in a reference table specified by the classification rule.
18. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when a trained machine learning model associated with the classification rule returns a score for the data portion beyond a threshold value, the score computed by the machine learning model based on one or more features extracted from the data portion and used as input for the machine learning model.
19. The method of claim 11, wherein the one or more security operations include at least one of encryption, tokenization, and obfuscation, and
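Claims 4, 7, 8, 9, and 10 name concrete rule kinds, a way to derive the significance factor, and a menu of security operations. The sketch below assumes each rule is a Python predicate over a string and computes the significance factor as one minus the false-positive share of a rule's hits on a labeled data set. That is one plausible reading of claim 10, not a formula stated in the claims. It also maps two hypothetical security levels to tokenization and masking (a form of obfuscation); encryption is omitted because it needs key management outside a short example. `pattern_rule`, `reference_table_rule`, `model_rule`, `significance_factor`, `TokenVault`, `obfuscate`, `protect`, and the level names are illustrative assumptions.

```python
import re
import secrets
from typing import Callable, Dict, Iterable, Tuple

Predicate = Callable[[str], bool]


# Rule kinds from claims 4, 7, and 8, each modeled as a string predicate.
def pattern_rule(regex: str) -> Predicate:
    """Satisfied when the whole data portion matches a pre-defined pattern."""
    compiled = re.compile(regex)
    return lambda portion: compiled.fullmatch(portion) is not None


def reference_table_rule(table: Iterable[str]) -> Predicate:
    """Satisfied when the data portion equals a reference-table entry (case-insensitive)."""
    entries = {entry.lower() for entry in table}
    return lambda portion: portion.lower() in entries


def model_rule(score: Callable[[str], float], threshold: float) -> Predicate:
    """Satisfied when a (hypothetical) model score for the portion exceeds a threshold."""
    return lambda portion: score(portion) > threshold


# Claim 10: significance factor from false positives on a labeled training set.
def significance_factor(rule: Predicate, labeled: Iterable[Tuple[str, bool]]) -> float:
    """Assumed formula: 1 - (false positives / rule hits), i.e. the rule's precision."""
    hit_labels = [label for portion, label in labeled if rule(portion)]
    if not hit_labels:
        return 0.0  # assumption: a rule that never fires earns no weight
    false_positives = sum(1 for label in hit_labels if not label)
    return 1.0 - false_positives / len(hit_labels)


# Claim 9: security operation selected by a desired security level.
class TokenVault:
    """Hypothetical in-memory vault mapping random tokens back to original values."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)
        self._values[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._values[token]


def obfuscate(value: str, keep_last: int = 4) -> str:
    """Mask all but the last `keep_last` characters; mask everything if the value is short."""
    keep = keep_last if len(value) > keep_last else 0
    return "*" * (len(value) - keep) + value[len(value) - keep:]


def protect(value: str, level: str, vault: TokenVault) -> str:
    """Hypothetical levels: "high" tokenizes, "medium" masks, anything else leaves the value as-is."""
    if level == "high":
        return vault.tokenize(value)
    if level == "medium":
        return obfuscate(value)
    return value


# Usage: a nine-digit pattern fires on one sensitive and one non-sensitive example.
nine_digits = pattern_rule(r"\d{9}")
labeled = [("123456789", True), ("987654321", False), ("hello", False)]
print(significance_factor(nine_digits, labeled))  # 0.5
print(protect("123-45-6789", "medium", TokenVault()))  # *******6789
```

Under this reading, a rule that fires on many non-sensitive values receives a small factor, so it contributes little when the weighted outputs are aggregated.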
by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title
Machine learning · CPC title
Tools and structures for managing or administering access control systems · CPC title
Clustering or classification · CPC title
Protecting personal data, e.g. for financial or medical purposes · CPC title