Systems and methods for tracking sensitive data in a big data environment

US10445324B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10445324-B2
Application number  US-201514944898-A
Country             US
Kind code           B2
Filing date         Nov 18, 2015
Priority date       Nov 18, 2015
Publication date    Oct 15, 2019
Grant date          Oct 15, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system may retrieve a pattern from a pattern database with the pattern identifying a type of sensitive data. The system may also retrieve data identified by a variable from a big data management system. The system may then match the data to the pattern to detect the type of sensitive data in the data. An output may be generated in response to the data matching the pattern. A variable access permission may be retrieved for the variable from a permissions repository, a sensitive data permission may be retrieved for the type of sensitive data from the permissions repository, and the variable access permission may be compared to the sensitive data permission to detect a discrepancy.
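The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the in-memory dictionaries standing in for the pattern database and permissions repository, the role names, the two regular expressions, and the function names `scan` and `find_discrepancy` are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for the pattern database:
# sensitive data type -> collection of regular expressions.
PATTERN_DB = {
    "ssn": [r"\b\d{3}-\d{2}-\d{4}\b"],
    "email": [r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"],
}

# Hypothetical stand-in for the permissions repository.
# Roles allowed to read each variable (collection of data).
VARIABLE_PERMISSIONS = {
    "customer_notes": {"analyst", "support", "compliance"},
}
# Roles allowed to read each type of sensitive data.
SENSITIVE_PERMISSIONS = {
    "ssn": {"compliance"},
    "email": {"support", "compliance"},
}


def scan(variable, rows):
    """Match each row against every pattern and record where matches occur.

    Returns tuples of (variable, sensitive type, row index, character offset).
    """
    findings = []
    for row_number, text in enumerate(rows):
        for sensitive_type, regexes in PATTERN_DB.items():
            for regex in regexes:
                for match in re.finditer(regex, text):
                    findings.append(
                        (variable, sensitive_type, row_number, match.start())
                    )
    return findings


def find_discrepancy(variable, sensitive_type):
    """Return roles that may read the variable but not the sensitive type."""
    allowed_on_variable = VARIABLE_PERMISSIONS.get(variable, set())
    allowed_on_type = SENSITIVE_PERMISSIONS.get(sensitive_type, set())
    return allowed_on_variable - allowed_on_type


rows = ["call back tomorrow", "SSN 123-45-6789 on file"]
for variable, sensitive_type, row_number, offset in scan("customer_notes", rows):
    extra_roles = find_discrepancy(variable, sensitive_type)
    print(variable, sensitive_type, row_number, offset, sorted(extra_roles))
```

A non-empty set from `find_discrepancy` corresponds to the discrepancy the abstract mentions: the variable is readable by roles that the sensitive data permission does not cover. Modeling permissions as sets of roles is one simple choice; the patent text does not specify a representation.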

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: retrieving, by a processor, a pattern from a pattern database, wherein the pattern includes a collection of regular expressions written to match sensitive data, wherein the pattern includes a format of a text pattern matching utility, and wherein the pattern identifies a type of the sensitive data; retrieving, by the processor, data from a big data management system, wherein the data is identified by a variable that identifies a collection of the data; matching, by the processor, the data to the pattern; detecting, by the processor and based on the matching, text that matches the data; matching, by the processor, the data to a known list of the sensitive data; determining, by the processor and based on the matching, that a subset of the data is the sensitive data; determining, by the processor, the type of the sensitive data in the subset of the data; logging, by the processor, a location within a file of the subset of the data; retrieving, by the processor, a variable access permission for the variable from a permissions repository; retrieving, by the processor, a sensitive data permission for the type of sensitive data from the permissions repository; comparing, by the processor, the variable access permission to the sensitive data permission to detect a discrepancy; generating, by the processor, an output of the type of sensitive data and the discrepancy; automatically editing, by the processor, the variable access permission to correct the discrepancy; and restricting, by the processor, access to the subset of the data in the big data management system, based on the editing.

2. The method of claim 1, further comprising scheduling, by the processor, the retrieving the pattern from the pattern database to execute automatically at a predetermined time.

3. The method of claim 2, wherein the output comprises at least one of a log, an output file, an email, or a message that flags at least one of the data, the variable access permission, or the sensitive data permission for review.

4. The method of claim 3, further comprising adding, by the processor, a new pattern to the pattern database to identify new types of sensitive data.

5. The method of claim 4, wherein the pattern further comprises a string.

6. The method of claim 5, further comprising assigning, by the processor, a priority of execution to the comparing the data to the pattern to run the comparing the data to the pattern as a background process.

7. A computer-based system, comprising: a processor; and a tangible, non-transitory memory configured to communicate with the processor, the tangible, non-transitory memory having instructions stored thereon that, in response to execution by the processor, cause a data scanning system to perform operations comprising: retrieving, by the processor, a pattern from a pattern database, wherein the pattern includes a collection of regular expressions written to match sensitive data, wherein the pattern includes a format of a text pattern matching utility, and wherein the pattern identifies a type of the sensitive data; retrieving, by the processor, data from a big data management system, wherein the data is identified by a variable that identifies a collection of the data; matching, by the processor, the data to the pattern; detecting, by the processor and based on the matching, text that matches the data; matching, by the processor, the data to a known list of the sensitive data; determining, by the processor and based on the matching, that a subset of the data is the sensitive data; determining, by the processor, the type of the sensitive data in the subset of the data; logging, by the processor, a location within a file of the subset of the data; retrieving, by the processor, a variable access permission for the variable from a permissions repository; retrieving, by the processor, a sensitive data permission for the type of sensitive data from the permissions repository; comparing, by the processor, the variable access permission to the sensitive data permission to detect a discrepancy; generating, by the processor, an output of the type of sensitive data and the discrepancy; automatically editing, by the processor, the variable access permission to correct the discrepancy; and restricting, by the processor, access to the subset of the data in the big data management system, based on the editing.

8. The method of claim 1, further comprising retrieving, by the processor, an action code associated with the pattern.

9. The method of claim 6, further comprising: retrieving, by the processor, an action code associated with the pattern; and conducting, by the processor, an action associated with the action code.

10. The method of claim 9, wherein the action code is associated with at least one of quarantining the data, flagging the data for review, locking down a user account lacking permission to view the data, incrementing a counter for a number of matches in a column or modifying permissions to access the data.

11. The computer-based system of claim 7, wherein the pattern further comprises a string.

12. The computer-based system of claim 7, further comprising assigning, by the processor, a priority of execution to the comparing the data to the pattern to run the comparing the data to the pattern as a background process.

13. An article of manufacture including a non-transitory, tangible computer readable storage medium having instructions stored thereon that, in response to execution by a processor, cause a data scanning system to perform operations comprising: retrieving, by the processor, a pattern from a pattern database, wherein the pattern includes a collection of regular expressions written to match sensitive data, wherein the pattern includes a format of a text pattern matching utility, and wherein the pattern identifies a type of the sensitive data; retrieving, by the processor, data from a big data management system, wherein the data is identified by a variable that identifies a collection of the data; matching, by the processor, the data to the pattern; detecting, by the processor and based on the matching, text that matches the data; matching, by the processor, the data to a known list of the sensitive data; determining, by the processor and based on the matching, that a subset of the data is the sensitive data; determining, by the processor, the type of the sensitive data in the subset of the data; logging, by the processor, a location within a file of the subset of the data; retrieving, by the processor, a variable access permission for the variable from a permissions repository; retrieving, by the processor, a sensitive data permission for the type of sensitive data from the permissions repository; comparing, by the processor, the variable access permission to the sensitive data permission to detect a discrepancy; generating, by the processor, an output of the type of sensitive data and the discrepancy; automatically editing, by the processor, the variable access permission to correct the discrepancy; and restricting, by the processor, access to the subset of the data in the big data management system, based on the editing.

14. The article of claim 13, further comprising scheduling, by the processor, the retrieving the pattern from the pattern database to execute automatically at a predetermined time.

15. The article of claim 13, wherein the output comprises at least one of a log, an output file, an email, or a message that flags at least one of the data, the variable acces
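
Claims 8 to 10 associate each pattern with an action code that selects a follow-up action. A minimal Python sketch of that dispatch is below. The code names (`QUARANTINE`, `FLAG`, `COUNT`), the in-memory lists and counter, the finding dictionary layout, and the fallback to flagging when a type has no code are all assumptions for illustration; the claims do not define any of them.

```python
from collections import Counter

# Hypothetical in-memory state standing in for real side effects.
quarantined = []
flagged = []
column_match_counts = Counter()


def quarantine(finding):
    """Record that the variable's data should be quarantined."""
    quarantined.append(finding["variable"])


def flag_for_review(finding):
    """Record that the variable's data should be reviewed."""
    flagged.append(finding["variable"])


def count_match(finding):
    """Increment the number of matches seen in the finding's column."""
    column_match_counts[finding["column"]] += 1


# Hypothetical action codes mapped to the functions that carry them out.
ACTIONS = {
    "QUARANTINE": quarantine,
    "FLAG": flag_for_review,
    "COUNT": count_match,
}

# Hypothetical association of sensitive data types with action codes.
PATTERN_ACTION_CODES = {
    "ssn": "QUARANTINE",
    "email": "COUNT",
}


def conduct_action(finding):
    """Look up the action code for a finding's type and run its action.

    Falling back to "FLAG" for unmapped types is an assumption of this sketch.
    """
    code = PATTERN_ACTION_CODES.get(finding["sensitive_type"], "FLAG")
    ACTIONS[code](finding)
    return code


finding = {"variable": "customer_notes", "column": "notes", "sensitive_type": "ssn"}
print(conduct_action(finding))
```

Keeping the code-to-action mapping in a table means a new action can be added without touching the scanning logic, which fits the claims' separation between matching a pattern and conducting the action tied to it.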

Assignees

American Express Travel Related Services Co Inc

Inventors

Classifications

  • G06F16/2457 · with adaptation to user needs (CPC title)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10445324B2 cover?
A system may retrieve a pattern from a pattern database with the pattern identifying a type of sensitive data. The system may also retrieve data identified by a variable from a big data management system. The system may then match the data to the pattern to detect the type of sensitive data in the data. An output may be generated in response to the data matching the pattern. A variable access p…
Who is the assignee on this patent?
American Express Travel Related Services Co Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2457. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 15, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).