Sensitive data classification

US10810317B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10810317-B2
Application number: US-201815892802-A
Country: US
Kind code: B2
Filing date: Feb 9, 2018
Priority date: Feb 13, 2017
Publication date: Oct 20, 2020
Grant date: Oct 20, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A gateway device includes a network interface connected to data sources, and computer instructions, that when executed cause a processor to access data portions from the data sources. The processor accesses classification rules, which are configured to classify a data portion of the plurality of data portions as sensitive data in response to the data portion satisfying the rule. Each rule is associated with a significance factor representative of an accuracy of the classification rule. The processor applies each of the set of classification rules to a data portion to obtain an output of whether the data is sensitive data. The output are weighed by significance factors to produce a set of weighted outputs. The processor determines if the data portion is sensitive data by aggregating the set of weighted outputs, and presents the determination in a user interface. Security operations may also be performed on the data portion.
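The weigh-and-aggregate step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Rule` record, the two example rules, their significance values, the normalized weighted average, and the 0.5 threshold are all assumptions made for the example, since the abstract does not specify how the weighted outputs are combined.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical rule record: a predicate plus its significance factor."""
    name: str
    matches: Callable[[str], bool]
    significance: float  # assumed to lie in (0, 1]; higher means more trusted


def sensitivity_score(portion: str, rules: List[Rule]) -> float:
    """Aggregate weighted rule outputs into a normalized score in [0, 1]."""
    total_weight = sum(rule.significance for rule in rules)
    if total_weight == 0:
        return 0.0
    # Each rule outputs 1.0 (satisfied) or 0.0, weighted by its significance.
    weighted = [rule.significance * float(rule.matches(portion)) for rule in rules]
    return sum(weighted) / total_weight


def is_sensitive(portion: str, rules: List[Rule], threshold: float = 0.5) -> bool:
    """Assumed decision rule: the normalized score must reach the threshold."""
    return sensitivity_score(portion, rules) >= threshold


# Illustrative rules only; the significance values are made up.
RULES = [
    Rule("ssn_pattern", lambda s: re.fullmatch(r"\d{3}-\d{2}-\d{4}", s) is not None, 0.9),
    Rule("nine_digits", lambda s: sum(c.isdigit() for c in s) == 9, 0.3),
]

for sample in ["123-45-6789", "123456789", "hello"]:
    print(sample, round(sensitivity_score(sample, RULES), 2), is_sensitive(sample, RULES))
```

In this sketch a low-significance rule firing on its own (a bare nine-digit string) does not push the score past the threshold, while agreement with a high-significance rule does. That is one plausible reading of why the outputs are weighted before aggregation.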

First claim

Opening claim text (preview).

What is claimed is:

1. A gateway device, comprising: a network interface communicatively coupled with a plurality of data sources; a hardware processor; and a non-transitory computer readable storage medium storing computer readable instructions, that when executed by the hardware processor, cause the hardware processor to: access data from one or more of the plurality of data sources, the accessed data comprising a plurality of data portions; access a set of classification rules, each of the set of classification rules configured to classify a data portion of the plurality of data portions as sensitive data in response to the data portion satisfying the classification rule, and each of the set of classification rules further associated with a significance factor representative of an accuracy of the classification rule in classifying data portions as sensitive, wherein the significance factor associated with a classification rule is based on 1) a type of sensitive data that the classification rule is configured to detect and 2) an expected rate of false positives associated with the type of sensitive data; apply each of the set of classification rules to a data portion to obtain an output representative of whether the data portion is sensitive data; weigh the output from each application of a classification rule by the significance factor associated with the classification rule to produce a set of weighted outputs; determine if the data portion is sensitive by aggregating the set of weighted outputs; in response to determining that the data portion is sensitive, modify a user interface presented to a user to indicate that the data portion is determined to be sensitive and presenting a set of security operations that can be taken in response to the determination that the data portion is sensitive; and in response to a selection of a presented security operation, perform the security operation to reduce a security risk associated with the data portion.

2. The device of claim 1, wherein a data portion is at least one of: a cell in a table, a non-delimited string of characters, and a file.

3. The device of claim 1, wherein sensitive data is at least one of: an address component, a date of birth, a telephone number, an email address, a social security number, a financial account number, a password, and a username.

4. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when the data portion matches a pre-defined pattern associated with the classification rule.

5. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when a one or more data parsing rules associated with the classification rule, when applied to the data portion, return a true value.

6. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when a one or more contextual data requirements specified by the classification rule are satisfied by one or both of the data portion and associated data sources of the plurality of data source.

7. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when the data portion matches an entry in a reference table specified by the classification rule.

8. The device of claim 1, wherein a classification rule of the set of classification rules is satisfied when a trained machine learning model associated with the classification rule returns a score for the data portion beyond a threshold value, the score computed by the machine learning model based on one or more features extracted from the data portion and used as input for the machine learning model.

9. The device of claim 1, wherein the one or more security operations include at least one of encryption, tokenization, and obfuscation, and wherein the one or more security operations that are performed are selected based on a desired security level for the data portion.

10. The device of claim 1, wherein the significance factor associated with a classification rule is further based on an accuracy of the classification rule determined based on a number of false positives generated by the classification rule when applied to a training data set.

11. A computer-implemented method, comprising: accessing data from one or more of a plurality of data sources, the accessed data comprising a plurality of data portions; accessing a set of classification rules, each of the set of classification rules configured to classify a data portion of the plurality of data portions as sensitive data in response to the data portion satisfying the classification rule, and each of the set of classification rules further associated with a significance factor representative of an accuracy of the classification rule in classifying data portions as sensitive wherein the significance factor associated with a classification rule is based on 1) a type of sensitive data that the classification rule is configured to detect and 2) an expected rate of false positives associated with the type of sensitive data; applying each of the set of classification rules to a data portion to obtain an output representative of whether the data portion is sensitive data; weighting the output from each application of a classification rule by the significance factor associated with the classification rule to produce a set of weighted outputs; determining if the data portion is sensitive by aggregating the set of weighted outputs; and in response to determining that the data portion is sensitive, performing one or more security operations on the data portion to reduce a security risk associated with the data portion.

12. The method of claim 11, wherein a data portion is at least one of: a cell in a table, a non-delimited string of characters, and a file.

13. The method of claim 11, wherein sensitive data is at least one of: an address component, a date of birth, a telephone number, an email address, a social security number, a financial account number, a password, and a username.

14. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when the data portion matches a pre-defined pattern associated with the classification rule.

15. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when a one or more data parsing rules associated with the classification rule, when applied to the data portion, return a true value.

16. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when a one or more contextual data requirements specified by the classification rule are satisfied by one or both of the data portion and associated data sources of the plurality of data source.

17. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when the data portion matches an entry in a reference table specified by the classification rule.

18. The method of claim 11, wherein a classification rule of the set of classification rules is satisfied when a trained machine learning model associated with the classification rule returns a score for the data portion beyond a threshold value, the score computed by the machine learning model based on one or more features extracted from the data portion and used as input for the machine learning model.

19. The method of claim 11, wherein the one or more security operations include at least one of encryption, tokenization, and obfuscation, and
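Claim 10 ties the significance factor to the number of false positives a rule produces on a training data set. One way to picture that, purely as an assumption for illustration, is to use the rule's precision on a small labelled sample as its significance factor. The function name, the phone-number pattern, and the sample values below are hypothetical and do not come from the patent.

```python
import re
from typing import Callable, Iterable, Tuple


def estimate_significance(
    rule: Callable[[str], bool],
    labelled: Iterable[Tuple[str, bool]],
) -> float:
    """Assumed estimator: share of the rule's hits that are truly sensitive."""
    true_pos = 0
    false_pos = 0
    for portion, truly_sensitive in labelled:
        if rule(portion):
            if truly_sensitive:
                true_pos += 1
            else:
                false_pos += 1
    fired = true_pos + false_pos
    # A rule that never fires gets no weight in this sketch.
    return true_pos / fired if fired else 0.0


def looks_like_phone(portion: str) -> bool:
    """Pattern-style rule (compare claim 4): NNN-NNN-NNNN."""
    return re.fullmatch(r"\d{3}-\d{3}-\d{4}", portion) is not None


# Made-up labelled sample: (data portion, is it actually sensitive?)
SAMPLE = [
    ("555-867-5309", True),
    ("212-555-0147", True),
    ("800-100-2000", False),  # e.g. a part number that happens to fit the pattern
    ("hello", False),
]

print(round(estimate_significance(looks_like_phone, SAMPLE), 3))
```

Here the rule fires three times and one hit is a false positive, so more false positives on the sample translate directly into a lower weight for that rule.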

Assignees

Protegrity Corp

Inventors

Classifications

  • by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title

  • G06N20/00 (Primary)

    Machine learning · CPC title

  • G06F21/604 (Primary)

    Tools and structures for managing or administering access control systems · CPC title

  • Clustering or classification · CPC title

  • Protecting personal data, e.g. for financial or medical purposes · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10810317B2 cover?
A gateway device includes a network interface connected to data sources, and computer instructions, that when executed cause a processor to access data portions from the data sources. The processor accesses classification rules, which are configured to classify a data portion of the plurality of data portions as sensitive data in response to the data portion satisfying the rule. Each rule is as…
Who is the assignee on this patent?
Protegrity Corp
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 20, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).