Rule-based document scrubbing of sensitive data

US11449635B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11449635-B2
Application number  US-201916408143-A
Country             US
Kind code           B2
Filing date         May 9, 2019
Priority date       May 16, 2018
Publication date    Sep 20, 2022
Grant date          Sep 20, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A rule-based attribution mechanism analyzes documents having different types of data in different formats through the application of script-based rules that apply a tag to the document identifying the type of sensitive data that is contained in the document. Documents having similar tags are aggregated so that the sensitive data is scrubbed from the document leaving the telemetric data available for downstream processing. The scrubbing entails different actions, such as, eliminating the sensitive data, obfuscating the sensitive data, and converting the sensitive data into a non-sensitive value.
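The abstract describes three stages: tag each document by rule, aggregate documents with similar tags, and apply one of three scrubbing actions (eliminate, obfuscate, convert). The Python below is a minimal sketch of that flow under stated assumptions. Documents are flat dicts. Rules are field-name regexes paired with value conditions. The tag names (`remove`, `hash`, `convert_ip`), the `Rule` class, the helper functions and the prefix-to-provider table are hypothetical illustrations and are not the patent's script-rule language. "Similar tags" is read here as "identical tag sets".

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

# Hypothetical rule format: a field-name regex plus a value condition; a field
# that satisfies both is classified as sensitive and receives the rule's tag.
@dataclass(frozen=True)
class Rule:
    field_pattern: str
    condition: Callable[[str], bool]
    tag: str  # names the scrubbing action: "remove", "hash" or "convert_ip"

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

RULES = [
    Rule(r"(?i)^email$", lambda v: bool(EMAIL_RE.match(v)), "remove"),
    Rule(r"(?i)^machine_?name$", lambda v: v != "", "remove"),
    Rule(r"(?i)^project_?id$", lambda v: v != "", "hash"),
    Rule(r"(?i)^ip(_address)?$", lambda v: v.count(".") == 3, "convert_ip"),
]

# Hypothetical prefix-to-provider table (203.0.113.0/24 is a documentation range).
PROVIDER_BY_PREFIX = {"203.0.113.": "ExampleNet"}

def tag_document(doc: dict) -> dict[str, str]:
    """Return {field: tag} for every field that satisfies some rule."""
    tags = {}
    for field, value in doc.items():
        for rule in RULES:
            if re.search(rule.field_pattern, field) and rule.condition(str(value)):
                tags[field] = rule.tag
                break  # first matching rule wins
    return tags

def ip_to_provider(ip: str) -> str:
    """Convert an IP address to a (hypothetical) service-provider name."""
    for prefix, provider in PROVIDER_BY_PREFIX.items():
        if ip.startswith(prefix):
            return provider
    return "unknown-provider"

def scrub(doc: dict, tags: dict[str, str]) -> dict:
    """Apply the tagged action to each sensitive field; other fields pass through."""
    out = dict(doc)
    for field, tag in tags.items():
        value = str(out[field])
        if tag == "remove":        # eliminate the sensitive data
            del out[field]
        elif tag == "hash":        # obfuscate with a one-way hash
            out[field] = hashlib.sha256(value.encode()).hexdigest()
        elif tag == "convert_ip":  # convert to a non-sensitive value
            out[field] = ip_to_provider(value)
    return out

def scrub_batch(docs: list[dict]) -> dict[frozenset, list[dict]]:
    """Group documents that carry identical tag sets, then scrub each group."""
    groups = defaultdict(list)
    for doc in docs:
        tags = tag_document(doc)
        groups[frozenset(tags.items())].append((doc, tags))
    return {key: [scrub(d, t) for d, t in members] for key, members in groups.items()}
```

Fields that match no rule, such as a `latency_ms` telemetry value, are never tagged. They pass through unchanged, so the telemetric data stays available for downstream analysis.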

First claim

Claim text as published. Claims 1 (method) and 10 (system) are the independent claims; the rest depend on them.

What is claimed:

1. A method, including: receiving a current document at a computing device, the computing device having at least one processor communicatively coupled to a memory, the current document containing telemetric data and unscrubbed sensitive data; applying a tag to the current document to denote an attribute that identifies a scrubbing action to be performed to the unscrubbed sensitive data, the tag based on a field in the current document satisfying a rule and condition for being classified as sensitive data; generating a first obfuscated value for the unscrubbed sensitive data; searching a table of obfuscated values for the first obfuscated value; upon the applied tag identifying a rolling hash scrubbing action and upon a search of the table finding the first obfuscated value, replacing the unscrubbed sensitive data in the current document with a second obfuscated value, the first obfuscated value differs from the second obfuscated value; and analyzing the telemetric data without the unscrubbed sensitive data.

2. The method of claim 1, further comprising: upon the applied tag identifying a field in the current document as an identifier associated with a software product, replacing the identifier with a one-way hash.

3. The method of claim 1, further comprising: upon the applied tag identifying a field in the current document as a geolocation, converting the field to a non-sensitive location.

4. The method of claim 1, further comprising: upon the applied tag identifying a field in the current document as an IP address, converting the field to a name of a service provider.

5. The method of claim 1, further comprising: upon the applied tag identifying a field in the current document as an email address, removing the email address from the current document.

6. The method of claim 1, further comprising: upon the applied tag identifying a field in the current document as a machine name, removing the machine name from the current document.

7. The method of claim 1, further comprising: upon the applied tag identifying a field in the current document as a project identifier, obfuscating the value of the project identifier with a hashed value.

8. The method of claim 1, further comprising: upon the applied tag identifying a field in the current document as a correlation identifier, obfuscating the value of the correlation identifier with a hashed value.

9. The method of claim 1, wherein the sensitive data that was previously replaced includes a MAC address hash.

10. A system, comprising: a processor and a memory; wherein the memory includes instructions that when executed on the processor perform acts that: obtain a current document including telemetric data and unscrubbed sensitive data; apply a tag to the current document to denote an attribute that identifies a scrubbing action to be performed to the unscrubbed sensitive data, the tag based on a field in the current document satisfying a rule and condition for being classified as sensitive data; generate a first obfuscated value for the unscrubbed sensitive data; search a table of obfuscated values for the first obfuscated value; upon the applied tag identifying a rolling hash scrubbing action and upon a search of the table finding the first obfuscated value, replace the unscrubbed sensitive data in the current document with a second obfuscated value, the first obfuscated value differs from the second obfuscated value; and analyze the telemetric data without the unscrubbed sensitive data.

11. The system of claim 10, wherein the memory includes further instructions that when executed on the processor perform acts that: upon the applied tag identifying a field in the current document as an identifier associated with a software product, replace the identifier with a one-way hash.

12. The system of claim 10, wherein the memory includes further instructions that when executed on the processor perform acts that: upon the applied tag identifying a field in the current document as a geolocation, convert the field to a non-sensitive location.

13. The system of claim 10, wherein the memory includes further instructions that when executed on the processor perform acts that: upon the applied tag identifying a field in the current document as an IP address, convert the field to a name of a service provider.

14. The system of claim 10, wherein the memory includes further instructions that when executed on the processor perform acts that: upon the applied tag identifying a field in the current document as an email address, remove the email address from the current document.

15. The system of claim 10, wherein the memory includes further instructions that when executed on the processor perform acts that: upon the applied tag identifying a field in the current document as a machine name, remove the machine name from the current document.

16. The system of claim 10, wherein the memory includes further instructions that when executed on the processor perform acts that: upon the applied tag identifying a field in the current document as a project identifier, obfuscate the value of the project identifier with a hashed value.

17. The system of claim 10, wherein the memory includes further instructions that when executed on the processor perform acts that: upon the applied tag identifying a field in the current document as a correlation identifier, obfuscate the value of the correlation identifier with a hashed value.

18. The system of claim 10, wherein the sensitive data that was previously replaced includes a MAC address hash.
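Claims 1 and 10 describe the rolling-hash action only as a series of steps. The Python below sketches one possible reading under assumptions that the claims do not spell out:

- The first obfuscated value is an HMAC-SHA256 of the raw field. Claim 9 suggests that raw value may itself already be a MAC address hash.
- The table maps each first obfuscated value to the second value that is actually written into the document.
- `rotate()` empties the table at a period boundary, so replacement values "roll" over time.

The class name, the HMAC choice, the random-token second value, the handling of values not yet in the table, and the rotation policy are all hypothetical. Callers are assumed to invoke `scrub` only for fields already tagged with the rolling-hash action.

```python
import hashlib
import hmac
import secrets

class RollingHashScrubber:
    """Hypothetical sketch of the rolling-hash scrubbing action in claims 1 and 10."""

    def __init__(self, key: bytes):
        self._key = key
        # Table of obfuscated values: first obfuscated value -> second obfuscated value.
        self._table: dict[str, str] = {}

    def first_value(self, raw: str) -> str:
        """First obfuscated value: keyed one-way hash of the raw field (assumed HMAC-SHA256)."""
        return hmac.new(self._key, raw.encode(), hashlib.sha256).hexdigest()

    def scrub(self, doc: dict, field: str) -> dict:
        """Replace doc[field] with the second obfuscated value found via the table."""
        out = dict(doc)
        first = self.first_value(str(out[field]))
        second = self._table.get(first)  # search the table for the first value
        if second is None:
            # Assumption: the claims only cover the "found" case; an unseen value
            # here gets a fresh random token that later lookups will find.
            # token_hex(16) is 32 hex chars and a SHA-256 hexdigest is 64, so
            # the second value always differs from the first.
            second = secrets.token_hex(16)
            self._table[first] = second
        out[field] = second
        return out

    def rotate(self) -> None:
        """Start a new period: forget old mappings so pseudonyms change."""
        self._table.clear()
```

The table is consulted before a new value is minted. Within one period, repeated documents carrying the same raw value therefore receive the same pseudonym, so downstream telemetry can still be correlated without the raw value ever reaching the analysis step.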

Assignees

Microsoft Technology Licensing, LLC

Classifications (CPC titles)

  • Hash functions, e.g. MD5, SHA, HMAC or f9 MAC

  • by anonymising data, e.g. decorrelating personal data from the owner's identification

  • Anonymous communication, i.e. the party's identifiers are hidden from the other party or parties, e.g. using an anonymizer

  • Providing cryptographic facilities or services

  • Protecting personal data, e.g. for financial or medical purposes

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11449635B2 cover?
A rule-based attribution mechanism analyzes documents having different types of data in different formats through the application of script-based rules that apply a tag to the document identifying the type of sensitive data that is contained in the document. Documents having similar tags are aggregated so that the sensitive data is scrubbed from the document leaving the telemetric data availabl…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F21/6254. Mapped technology areas include Physics.
When was this patent published?
Published September 20, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).