Rule-based document scrubbing of sensitive data

US11775684B2 · US · B2

Patent metadata
Publication number: US-11775684-B2
Application number: US-202217888908-A
Country: US
Kind code: B2
Filing date: Aug 16, 2022
Priority date: May 16, 2018
Publication date: Oct 3, 2023
Grant date: Oct 3, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A rule-based attribution mechanism analyzes documents having different types of data in different formats through the application of script-based rules that apply a tag to the document identifying the type of sensitive data that is contained in the document. Documents having similar tags are aggregated so that the sensitive data is scrubbed from the document leaving the telemetric data available for downstream processing. The scrubbing entails different actions, such as, eliminating the sensitive data, obfuscating the sensitive data, and converting the sensitive data into a non-sensitive value.
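The pipeline in the abstract can be sketched in Python: apply script rules, tag each document with the scrubbing actions it needs, aggregate documents by tag, then scrub. This is a minimal illustration under stated assumptions, not the patented implementation. Documents are modeled as plain dicts whose `format` key names their layout. `Rule`, `tag_document`, `scrub`, and the action names `"delete"` and `"convert"` are all hypothetical. The optional `event` and `condition` rule fields loosely mirror claims 2 and 3.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical document model: a dict with a "format" key naming its layout,
# an "event" key, and format-specific fields.
Document = Dict[str, object]
# A tag is the sorted tuple of (field, action) pairs that the rules matched.
Tag = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Rule:
    """Hypothetical script rule: a sensitive field of one format plus an action."""
    doc_format: str
    field: str
    action: str  # "delete" or "convert" in this sketch
    event: Optional[str] = None  # only match documents with this event (cf. claim 2)
    condition: Optional[Callable[[Document], bool]] = None  # cf. claim 3


def tag_document(doc: Document, rules: List[Rule]) -> Tag:
    """Apply every rule to one document and return the tag it earns."""
    matched = set()
    for rule in rules:
        if doc.get("format") != rule.doc_format or rule.field not in doc:
            continue
        if rule.event is not None and doc.get("event") != rule.event:
            continue
        if rule.condition is not None and not rule.condition(doc):
            continue
        matched.add((rule.field, rule.action))
    return tuple(sorted(matched))


def scrub(docs: List[Document], rules: List[Rule]) -> List[Document]:
    # Aggregate documents that received the same tag.
    groups: Dict[Tag, List[Document]] = defaultdict(list)
    for doc in docs:
        groups[tag_document(doc, rules)].append(doc)

    scrubbed: List[Document] = []
    for tag, members in groups.items():
        # Perform the tag's scrubbing actions on every document in the group,
        # working on copies so the input documents are not modified.
        for doc in members:
            clean = dict(doc)
            for field, action in tag:
                if action == "delete":
                    clean.pop(field, None)
                elif action == "convert":
                    clean[field] = "<redacted>"  # fixed non-sensitive placeholder
            scrubbed.append(clean)
    return scrubbed


rules = [
    Rule("A", "user_email", "delete"),
    Rule("A", "file_path", "convert",
         condition=lambda d: "/home/" in str(d.get("file_path", ""))),
    Rule("B", "userName", "convert", event="login"),
]
docs = [
    {"format": "A", "event": "crash", "user_email": "a@example.com",
     "file_path": "/home/alice/report.txt", "duration_ms": 12},
    {"format": "B", "event": "login", "userName": "alice", "latency": 40},
    {"format": "B", "event": "logout", "userName": "bob", "latency": 7},
]
# `clean` holds the telemetric data that downstream processing would receive.
clean = scrub(docs, rules)
```

Because documents are aggregated by tag, the output is ordered by group rather than by input position. Here every document earns a distinct tag, so the input order happens to survive. The third document passes through unchanged because the `userName` rule only fires on `login` events.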

First claim

Claim text (claims 1 to 15). Claims 1 and 9 are independent; the others depend on them.

What is claimed:

  1. A system comprising: one or more processors; and a memory; one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs include instructions that: obtain a plurality of documents having telemetric data and sensitive data, a first set of the plurality of documents having a plurality of fields arranged in a first format, a second set of the plurality of documents having a plurality of fields arranged in a second format, wherein the first format and the second format differ; access a script having a plurality of rules, a rule identifying a select one of the plurality of fields of a select one of the plurality of documents of a specific format as sensitive data and including a scrubbing action; apply the plurality of rules of the script to each of the plurality of documents to identify sensitive data and to associate a scrubbing action for the identified sensitive data; tag each of the plurality of documents with a tag indicating the scrubbing action from the application of the plurality of rules; aggregate select ones of the plurality of documents tagged with a similar tag; perform a select scrubbing action associated with the similar tag to each of the selected aggregated documents; and process the telemetric data without the sensitive data.

  2. The system of claim 1, wherein the telemetric data includes an event field that identifies an event that triggered collection of the telemetric data; and wherein at least one of the plurality of rules identifies the sensitive data based on the event field.

  3. The system of claim 1, wherein the telemetric data includes a condition that specifies circumstances in which the tag is applied; and wherein at least one of the plurality of rules identifies the sensitive data based on the condition being satisfied.

  4. The system of claim 1, wherein the scrubbing action deletes the identified sensitive data.

  5. The system of claim 1, wherein the scrubbing action obfuscates the identified sensitive data using a simple hash value.

  6. The system of claim 1, wherein the scrubbing action converts the identified sensitive data into a non-sensitive value.

  7. The system of claim 1, wherein the scrubbing action obfuscates the identified sensitive data using a rolling hash value.

  8. The system of claim 1, wherein the telemetric data is collected during engagement of a software product.

  9. A computer-implemented method, comprising: accessing a plurality of documents including telemetric data generated from events occurring during execution of one or more software products, wherein the telemetric data includes a plurality of fields, a select one of the fields containing an event triggering collection of the telemetric data; obtaining a rule-based script having a plurality of rules, a rule identifying sensitive data in at least one field of the plurality of fields of the plurality of documents and a scrubbing action for the identified sensitive data; applying the rule-based script to each of the plurality of documents to identify fields containing the sensitive data; tagging select ones of the plurality of documents with a tag based on the applied rule-based script, wherein the tag identifies a scrubbing action; aggregating the selected ones of the plurality of documents having a common tag; performing the scrubbing action of the common tag to the sensitive data; and processing the telemetric data without the scrubbed sensitive data.

  10. The computer-implemented method of claim 9, wherein the scrubbing action deletes the identified sensitive data.

  11. The computer-implemented method of claim 9, wherein the scrubbing action obfuscates the identified sensitive data using a simple hash value.

  12. The computer-implemented method of claim 9, wherein the scrubbing action converts the identified sensitive data into a non-sensitive value.

  13. The computer-implemented method of claim 9, wherein the scrubbing action obfuscates the identified sensitive data using a rolling hash value.

  14. The computer-implemented method of claim 9, wherein at least one of the plurality of rules identifies the sensitive data based on the event field.

  15. The computer-implemented method of claim 9, wherein at least one of the plurality of rules identifies the sensitive data based on a condition in the telemetric data being satisfied.
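The claims name four scrubbing actions: delete (claims 4 and 10), obfuscate with a simple hash (5 and 11), convert to a non-sensitive value (6 and 12), and obfuscate with a rolling hash (7 and 13). They do not define the two hash variants. The sketch below assumes one reading of each: "simple hash" as an unsalted SHA-256 digest, and "rolling hash" as the polynomial hash used in Rabin-Karp rolling hashing. It illustrates "convert" by generalizing an IPv4 address to its /24 network. The function names and the `client_ip` field are hypothetical.

```python
import hashlib
import ipaddress

# Hypothetical action functions; each returns a scrubbed copy of `doc`.


def delete(doc: dict, field: str) -> dict:
    """Delete action: remove the sensitive field entirely."""
    out = dict(doc)
    out.pop(field, None)
    return out


def simple_hash(doc: dict, field: str) -> dict:
    """Assumed reading of "simple hash": unsalted SHA-256 hex digest of the value."""
    out = dict(doc)
    out[field] = hashlib.sha256(str(doc[field]).encode("utf-8")).hexdigest()
    return out


def rolling_hash(doc: dict, field: str, base: int = 257,
                 mod: int = (1 << 61) - 1) -> dict:
    """Assumed reading of "rolling hash": the Rabin-Karp polynomial hash
    h = sum(b_i * base**(n-1-i)) mod `mod` over the UTF-8 bytes b_i,
    evaluated with Horner's rule."""
    h = 0
    for byte in str(doc[field]).encode("utf-8"):
        h = (h * base + byte) % mod
    out = dict(doc)
    out[field] = h
    return out


def convert(doc: dict, field: str) -> dict:
    """Convert action, illustrated by generalizing an IPv4 address to its /24
    network (assumes the field holds an IPv4 address)."""
    out = dict(doc)
    out[field] = str(ipaddress.ip_network(f"{doc[field]}/24", strict=False))
    return out


ACTIONS = {"delete": delete, "simple_hash": simple_hash,
           "rolling_hash": rolling_hash, "convert": convert}

doc = {"event": "login", "client_ip": "203.0.113.57", "latency_ms": 40}
for name, action in ACTIONS.items():
    print(name, action(doc, "client_ip"))
```

Neither hash is a privacy guarantee on its own. An unsalted digest of a low-entropy value such as an IPv4 address can be reversed by enumerating candidates, and a polynomial rolling hash is not cryptographic.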

Assignees

Microsoft Technology Licensing LLC

Classifications

  • by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title

  • Protecting personal data, e.g. for financial or medical purposes · CPC title

  • Hash functions, e.g. MD5, SHA, HMAC or f9 MAC · CPC title

  • Anonymous communication, i.e. the party's identifiers are hidden from the other party or parties, e.g. using an anonymizer · CPC title

  • Providing cryptographic facilities or services · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11775684B2 cover?
Rule-based scrubbing of sensitive data from telemetry documents in different formats. Script rules tag each document with the scrubbing action its sensitive data needs, and documents with similar tags are aggregated. The sensitive data is then deleted, obfuscated, or converted into a non-sensitive value, so the remaining telemetric data can be processed downstream.
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F21/6254. Mapped technology areas include Physics.
When was this patent published?
Published Oct 3, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).