Scrubbe to remove personally identifiable information

US9582680B2 · US · B2

Patent metadata
Publication number: US-9582680-B2
Application number: US-201414168532-A
Country: US
Kind code: B2
Filing date: Jan 30, 2014
Priority date: Jan 30, 2014
Publication date: Feb 28, 2017
Grant date: Feb 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A personally identifiable information (PII) scrubbing system. The PII scrubbing system surgically scrubs PII from a log based on a scrubber configuration corresponding to the log. The scrubber configuration includes context information about locations and types of PII in the log and rules specifying how to locate and protect the PII. Scrubber configurations are quickly and easily created or modified as scrubbing requirements change or new scenarios are encountered. The flexibility provided by the scrubber configurations allows only the PII to be scrubbed, even from unstructured data, without having to include surrounding data. Many consumers can use the scrubbed data without needing to expose the PII because less non-personal data is obscured. Surgical scrubbing also retains the usefulness of the underlying PII even while protecting the PII. Consumers can correlate the protected PII to locate specific information without having to expose additional PII.
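The abstract describes a per-log scrubber configuration whose rules say where PII sits and how to protect it, so that only the PII is touched. Below is a minimal Python sketch of that idea, assuming a comma-delimited log format. The names (`ScrubberConfig`, `ProcessingRule`, `scrub_message`, `protect`), the field layout, and the type-tag protection are hypothetical illustrations, not the patent's implementation; the split into a root delimiter rule, a filtering rule, and per-field processing rules loosely mirrors the terms used in claims 2 to 8.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class ProcessingRule:
    """Hypothetical child rule: protect the field at `field_index`."""
    field_index: int
    pii_type: str  # metadata naming the kind of PII held in the field


@dataclass
class ScrubberConfig:
    """Hypothetical per-log configuration: a root rule plus child rules."""
    delimiter: str                                # root parsing rule: split on this
    filter_field: Optional[int] = None            # filtering rule: field to test
    filter_values: FrozenSet[str] = frozenset()   # values marking PII-bearing messages
    processing_rules: List[ProcessingRule] = field(default_factory=list)


def protect(pii_type: str) -> str:
    # Placeholder protection: swap the value for a type tag.
    return f"<{pii_type}>"


def scrub_message(message: str, config: ScrubberConfig) -> str:
    fields = message.split(config.delimiter)  # apply the root parsing rule
    if config.filter_field is not None:
        # Filtering rule (assumption): messages whose selected field is not
        # listed are treated as PII-free and returned unchanged.
        idx = config.filter_field
        selected = fields[idx] if idx < len(fields) else None
        if selected not in config.filter_values:
            return message
    for rule in config.processing_rules:
        if rule.field_index < len(fields):
            # Surgical scrubbing: only the named field is replaced.
            fields[rule.field_index] = protect(rule.pii_type)
    return config.delimiter.join(fields)


config = ScrubberConfig(
    delimiter=",",
    filter_field=0,
    filter_values=frozenset({"Login"}),
    processing_rules=[ProcessingRule(1, "user"), ProcessingRule(2, "ip")],
)
print(scrub_message("Login,alice,10.0.0.1,200", config))  # Login,<user>,<ip>,200
print(scrub_message("Heartbeat,node7,ok,200", config))    # Heartbeat,node7,ok,200
```

With this configuration, a login record has only its user and IP fields replaced, while the event name and status code stay readable. The type tag is only a placeholder: it hides the value but, unlike the stored replacement values described in claim 1, it does not let consumers correlate repeated occurrences of the same PII.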

First claim

Opening claim text (preview).

What is claimed is:

1. A method of scrubbing a data set having messages containing both non-personal data and personally identifiable information, the method comprising: loading a message containing both non-personal data and personally identifiable information; loading a scrubber configuration containing a rule set for scrubbing the data set; parsing the message into fields based on the rule set, wherein unstructured data fields are formatted and delimiters are added to the unstructured data field such that personally identifiable information is identifiable from unlabeled data; scrubbing only the personally identifiable information in the message based on the rule set to produce a scrubbed message, the personally identifiable information being associated with metadata that identifies a type of personally identifiable information, and applying a corresponding scrubbing rule to the type of personally identifiable information, the corresponding scrubbing rule including: generating replacement values for the personally identifiable information in the message based on the rule set, including generating a replacement value for a first instance of specific personally identifiable information in the message based on the corresponding scrubbing rule and storing a reference to the replacement value associated with the specific personally identifiable information; and substituting replacement values for the personally identifiable information in the message to create the scrubbed message, including retrieving the replacement value associated with the specific personally identifiable information using the reference when additional instances of the specific personally identifiable information are encountered, and using the retrieved replacement value for the additional instances of the specific personally identifiable information; and saving the scrubbed message.

2. The method of claim 1 wherein the rule set comprises a root parsing rule and child rules for scrubbing the data set.

3. The method of claim 2 wherein the act of scrubbing the personally identifiable information in the message based on the rule set to produce a scrubbed message further comprises: parsing the message into fields based on the root parsing rule; and scrubbing the personally identifiable information in selected fields of the message based on the child rules.

4. The method of claim 3 wherein the act of parsing the message into fields based on the root parsing rule further comprises splitting the message into fields based on a delimiter specified in the root parsing rule.

5. The method of claim 3 wherein the act of parsing the message into fields based on the root parsing rule further comprises splitting the message into a predefined set of fields based on a message type specified in the root parsing rule.

6. The method of claim 3 wherein the act of protecting the personally identifiable information in selected fields of the message based on the child rules further comprises the act of applying a filtering rule specified in the child rules to include messages having fields containing personally identifiable information based on a value of a selected field.

7. The method of claim 3 wherein the act of protecting the personally identifiable information in selected fields of the message based on the child rules further comprises the act of applying a filtering rule specified in the child rules to exclude messages not having any fields containing personally identifiable information based on a value of a selected field.

8. The method of claim 3 wherein the act of protecting the personally identifiable information in selected fields of the message based on the child rules further comprises the act of applying a processing rule specified in the child rules to protect personally identifiable information in a selected field specified in the processing rule.

9. The method of claim 8 wherein the act of applying a processing rule specified in the child rules to protect personally identifiable information in a selected field specified in the processing rule further comprises: applying a parsing rule specified in the child rules to search the selected field for personally identifiable information of a type specified the parsing rule; and protecting the personally identifiable information of the specified type found in the selected field.

10. The method of claim 3 wherein the act of protecting the personally identifiable information in selected fields of the message based on the child rules further comprises the act of applying a parsing rule specified in the child rules to separate a selected field specified in the processing rule into sub-fields.

11. The method of claim 10 wherein the act of protecting the personally identifiable information in selected fields of the message based on the child rules further comprises the act of: separating the value of a field into name fields and value fields based on a delimiter pair specified in the child rules; and protecting the personally identifiable information in the value field if found in the selected field; applying a parsing rule specified in the child rules to separate the selected field specified in the parsing rule into sub-fields.

12. The method of claim 1 wherein the act of scrubbing only the personally identifiable information in the message based on the rule set to produce a scrubbed message further comprising: storing a replacement value for each unique instance of personally identifiable information in the message based on the rules from the scrubber configuration; and re-using the replacement value for when duplicate instances of the personally identifiable information.

13. A system for scrubbing personally identifiable information from a message, the system comprising: a processing unit; and a memory including computer executable instructions which, when executed by a processing unit, cause the system to provide: a scrubber configuration including a root parsing rule and a processing rule specifying how to locate and replace the personally identifiable information appearing in the message, the scrubber configuration corresponding to a log containing messages; a scrubbing agent loading the scrubber configuration, the scrubber agent comprising a parsing engine executing the root parsing to separate the message into fields, wherein unstructured data fields are formatted and delimiters are added to the unstructured data field such that personally identifiable information is identifiable from unlabeled data, and a processing engine executing the processing rule to replace the personally identifiable information in a selected field with a replacement value preventing the personally identifiable information from being exposed but allowing specific personally identifiable information to be located by correlation, the personally identifiable information being associated with metadata that identifies a type of personally identifiable information, and applying a corresponding scrubbing rule to the type of personally identifiable information, the corresponding scrubbing rule generating replacement values for the personally identifiable information in the message based on the rule set by generating a replacement value for a first instance of specific personally identifiable information in the message based on the corresponding scrubbing rule and storing a reference to the replacement value associated with the specific personally identifiable information, and substituting replacement values for the personally identifiable information in the message to create the scrubbed message by retrieving the replacement value associated with the specific p…
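Claim 1 turns on two mechanics: adding delimiters to unstructured fields so PII in unlabeled text can be located, and generating a replacement value on the first instance of a given PII value, storing a reference to it, and reusing it for every later instance, which is what lets consumers correlate scrubbed records. The Python sketch below illustrates both under stated assumptions: the regex detectors, the `TYPE_n` replacement format, and every name (`ReplacementStore`, `delimit_unstructured`, `scrub_text`, `scrub_name_value`) are hypothetical, since the claims do not say how PII is detected or what replacement values look like. `scrub_name_value` loosely follows the name/value delimiter pair of claim 11.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical detectors for PII in unlabeled text (illustrative only).
DETECTORS: Dict[str, re.Pattern] = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


class ReplacementStore:
    """Generates a replacement on the first instance of a PII value, then reuses it."""

    def __init__(self) -> None:
        self._refs: Dict[Tuple[str, str], str] = {}  # (type, value) -> replacement
        self._counters: Dict[str, int] = {}          # per-type sequence numbers

    def replacement_for(self, pii_type: str, value: str) -> str:
        key = (pii_type, value)
        if key not in self._refs:  # first instance: generate and store
            n = self._counters.get(pii_type, 0) + 1
            self._counters[pii_type] = n
            self._refs[key] = f"{pii_type.upper()}_{n}"
        return self._refs[key]     # later instances: reuse the stored value


def delimit_unstructured(text: str) -> List[Tuple[str, str]]:
    """Split free text into (pii_type, segment) pieces; '' marks non-personal text."""
    matches = []
    for pii_type, pattern in DETECTORS.items():
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), pii_type))
    matches.sort()
    pieces, pos = [], 0
    for start, end, pii_type in matches:
        if start < pos:
            continue  # skip a match overlapping one already taken
        pieces.append(("", text[pos:start]))
        pieces.append((pii_type, text[start:end]))
        pos = end
    pieces.append(("", text[pos:]))
    return pieces


def scrub_text(text: str, store: ReplacementStore) -> str:
    """Replace only the detected PII segments; everything else is kept verbatim."""
    return "".join(
        store.replacement_for(t, seg) if t else seg
        for t, seg in delimit_unstructured(text)
    )


def scrub_name_value(text: str, pair_delim: str, kv_delim: str,
                     pii_names: Dict[str, str], store: ReplacementStore) -> str:
    """Split 'name=value' pairs and scrub values whose name maps to a PII type."""
    out = []
    for pair in text.split(pair_delim):
        name, sep, value = pair.partition(kv_delim)
        if sep and name in pii_names:
            value = store.replacement_for(pii_names[name], value)
        out.append(f"{name}{sep}{value}")
    return pair_delim.join(out)


store = ReplacementStore()
print(scrub_text("login from 10.0.0.1 by alice@example.com", store))
print(scrub_text("logout from 10.0.0.1 by bob@example.com", store))
print(scrub_name_value("user=alice;ip=10.0.0.1;status=200", ";", "=",
                       {"user": "user", "ip": "ipv4"}, store))
```

Because the store is shared, 10.0.0.1 maps to the same replacement whether it appears in free text or in an `ip=` field, while two different email addresses get distinct replacements. The stored mapping can reverse the scrubbing, so in practice it would need the same protection as the original PII.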

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title

  • Protecting personal data, e.g. for financial or medical purposes · CPC title

  • Search customisation based on user profiles and personalisation · CPC title

  • G06F21/552 (Primary)

    involving long-term monitoring or reporting · CPC title

  • Applying rules; Deductive queries · CPC title

Patent family

Related publications grouped by family.


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9582680B2 cover?
A personally identifiable information (PII) scrubbing system. The PII scrubbing system surgically scrubs PII from a log based on a scrubber configuration corresponding to the log. The scrubber configuration includes context information about locations and types of PII in the log and rules specifying how to locate and protect the PII. Scrubber configurations are quickly and easily created or mod…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F21/6254. Mapped technology areas include Physics.
When was this patent published?
Published Feb 28, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).