Anonymizing user identifiable information

US9910902B1 · US · B1

Patent metadata
Publication number: US-9910902-B1
Application number: US-201313774828-A
Country: US
Kind code: B1
Filing date: Feb 22, 2013
Priority date: Feb 22, 2013
Publication date: Mar 6, 2018
Grant date: Mar 6, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosed techniques provide systems and methods for anonymizing various portions of information, action logs, end-user information, and/or other data sets that are stored in non-indexed storage systems. More specifically, various anonymization procedures are described for redacting UII and/or replacing UII in raw data with randomly generated information (RGI). The anonymization process is performed on a rolling basis as raw data is received. An anonymization mapping table maps (or associates) the replaced UII in the anonymized data to the RGI, and eventually all raw data can be deleted.
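As an illustration of the mapping idea in the abstract, here is a minimal sketch, not the patented implementation. It assumes the UII is a user identifier (UID) and the RGI is a random identifier (RID), as in the claims. The class `AnonymizationMap`, the function `anonymize_rows`, the `uid` field name, and the sample rows are all hypothetical names chosen for this example.

```python
import secrets


class AnonymizationMap:
    """Hypothetical in-memory stand-in for the anonymization mapping table."""

    def __init__(self):
        self._uid_to_rid = {}

    def rid_for(self, uid):
        # The RID is drawn at random, so it is independent of the UID
        # (it is not a hash or any other function of the UID).
        if uid not in self._uid_to_rid:
            self._uid_to_rid[uid] = secrets.token_hex(8)
        return self._uid_to_rid[uid]

    def forget(self, uid):
        # Disassociate the UID from its RID, for example on account deletion.
        self._uid_to_rid.pop(uid, None)

    def knows(self, uid):
        return uid in self._uid_to_rid


def anonymize_rows(rows, id_map, uid_field="uid"):
    """Return copies of the rows with the UID field replaced by its RID."""
    return [{**row, uid_field: id_map.rid_for(row[uid_field])} for row in rows]


# Made-up raw data for illustration only.
raw_rows = [
    {"uid": "1001", "action": "like"},
    {"uid": "1002", "action": "share"},
    {"uid": "1001", "action": "comment"},
]

id_map = AnonymizationMap()
anonymized = anonymize_rows(raw_rows, id_map)

# Simulate deleting the account of user 1001: only the map entry is removed.
id_map.forget("1001")
```

In this sketch, rows from the same user still share one RID after anonymization, so aggregate analysis remains possible. Once the map entry is removed, nothing in the sketch links that RID back to the original UID.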

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by a computing system, comprising: identifying, by the computing system, a non-indexed raw data set that is not indexed based on user identifiable information (UII) from computer memory in a data warehouse, wherein the raw data set meets an anonymization criteria and includes one or more instances of the UII; generating randomly generated information (RGI) to be associated with the UII, wherein the RGI is generated to be independent of the UII; associating, by the computing system, the UII with the RGI in an anonymization identification map; generating, by the computing system, an anonymized data set using the anonymization identification map, wherein the anonymized data set is an anonymized version of the raw data set, the generating the anonymized data set includes: determining that a portion of the non-indexed raw data set has a specified data structure, identifying a key in the non-indexed raw data set that is associated with a primary UII associated with the specified data structure, and replacing a value of the key with the RGI associated with the primary UII; receiving, by the computing system, an indication to delete an account associated with a user of a social networking system; identifying, by the computing system, a user identifier (UID) associated with the user, the UID being a specified UII of the user; and disassociating, by the computing system in the anonymization identification map, the specified UII from a specified RGI associated with the specified UII to delete the account associated with the user.

2. The method of claim 1, further comprising: storing, by the computing system, the anonymized data set in the data warehouse.

3. The method of claim 1, wherein generating the anonymized data set using the anonymization identification map comprises: replacing, by the computing system, at least one of the one or more instances of UII in the raw data set with an associated RGI.

4. The method of claim 1, wherein the UII comprises user identifiers (UIDs) and the RGI comprises randomly generated user identifiers (RIDs).

5. The method of claim 4, wherein each UID uniquely identifies a user of a social networking system.

6. The method of claim 1, wherein generating the anonymized data set using the anonymization identification map comprises: scanning, by the computing system, the raw data set to identify a complex structure; determining, by the computing system, a primary UII associated with the complex structure; parsing, by the computing system, the complex structure to identify a key associated with the UII; identifying, by the computing system, a value associated with the key; and replacing, by the computing system, the value with RGI associated with the primary UII.

7. The method of claim 6, further comprising: determining, by the computing system, that the key is another complex structure; parsing, by the computing system, the key to identify an additional key if a max depth threshold is not exceeded; identifying, by the computing system, an additional value associated with the additional key; and replacing, by the computing system, the additional value with RGI associated with the primary UII.

8. The method of claim 1, wherein generating the anonymized data set using the anonymization identification map comprises: identifying, by the computing system, a type of data in a column of the raw data set based on a metadata tag associated with the column; determining, by the computing system, an action associated with the metadata tag; and performing, by the computing system, the action to anonymize the data in the column.

9. The method of claim 8, wherein performing the action to anonymize the data in the column comprises replacing the one or more instances of UII in the column with an associated RGI.

10. The method of claim 8, wherein performing the action to anonymize the data in the column comprises executing a computer script to sanitize the data.

11. The method of claim 1, wherein the non-indexed raw data includes a plurality of tables and the raw data set meets the anonymization criteria if one or more of the plurality of tables meet or exceed a first age as determined from a date of origination in the data warehouse.

12. The method of claim 1, further comprising: removing, by the computing system, the raw data set at a second time subsequent to a first time, wherein the anonymized data set is generated at the first time.

13. The method of claim 1, further comprising maintaining, by the computing system, the anonymization identification map.

14. The method of claim 13, wherein maintaining the anonymization identification map comprises: accessing, by the computing system, a new data set upon occurrence of a triggering event; scanning, by the computing system, the new data set for instances of UII including a list of one or more scanned UIDs; accessing, by the computing system, a list of active UIDs, wherein an active UID is associated with a corresponding RID in the anonymization identification map; comparing, by the computing system, the list of scanned UIDs to the list of active UIDs to identify a list of new UIDs, wherein new UIDs are included in the list of scanned UIDs but not the list of active UIDs; generating, by the computing system, an RID for each of the new UIDs; associating, by the computing system, each generated RID with the corresponding new UID; and adding, by the computing system, the new UIDs to the list of active UIDs.

15. The method of claim 1, wherein generating the anonymized data set is initiated upon occurrence of a triggering event.

16. A system, comprising: a processor; a memory storing instructions, which when executed by the processor causes the processor to: access a non-indexed raw data set that is not indexed based on user identifiable information (UII) from a data warehouse and an anonymization identification map, wherein the non-indexed raw data set meets an anonymization criteria and includes one or more instances of UII and the anonymization identification map associates the UII with randomly generated information (RGI), wherein the RGI is generated to be independent of the UII; process the anonymization identification map and generate an anonymized data set using the anonymization identification map, wherein the anonymized data set is an anonymized version of the raw data set, wherein the anonymized data set is generated by: determining that a portion of the non-indexed raw data set has a specified data structure, identifying a key in the non-indexed raw data set that is associated with a primary UII associated with the specified data structure, and replacing a value of the key with the RGI associated with the primary UII; receive an indication to delete an account associated with a user of a social networking system; identify a user identifier (UID) associated with the user, the UID being a specified UII of the user; and disassociate, in the anonymization identification map, the specified UII from a specified RGI associated with the specified UII to delete the account associated with the user.

17. A non-transitory computer-readable storage medium storing computer-readable instructions, which when executed by a processor, causes the processor to perform a method comprising: identifying a non-indexed raw data set that is not indexed based on user identifiable information (UII) from computer memory in a data warehouse, wherein the raw data set meets an anonymization criteria and includes one or more instan
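The nested-structure handling in claims 6 and 7 can be sketched as follows. This is one loose reading, not the patent's own code: it assumes a "complex structure" is a nested dictionary and that UII-bearing keys are known in advance. The function `anonymize_structure`, the key names, the sample record, and the RID value `rid-7f3a` are hypothetical.

```python
def anonymize_structure(node, rid, uii_keys, max_depth=3, depth=0):
    """Replace values stored under UII-bearing keys with the given RID.

    Dictionaries down to max_depth are copied; anything nested deeper
    is returned as-is, without being parsed.
    """
    if depth > max_depth or not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key in uii_keys:
            # This key holds UII: swap its value for the primary UII's RID.
            result[key] = rid
        elif isinstance(value, dict):
            # The value is itself a complex structure: parse one level deeper.
            result[key] = anonymize_structure(
                value, rid, uii_keys, max_depth, depth + 1
            )
        else:
            result[key] = value
    return result


# Made-up record for illustration only.
record = {
    "uid": "1001",
    "event": "login",
    "context": {
        "actor_id": "1001",
        "device": {"owner_id": "1001", "os": "linux"},
    },
}
uii_keys = {"uid", "actor_id", "owner_id"}

shallow = anonymize_structure(record, "rid-7f3a", uii_keys, max_depth=1)
deep = anonymize_structure(record, "rid-7f3a", uii_keys, max_depth=2)
```

With `max_depth=1` the innermost `device` dictionary is not parsed, so its `owner_id` keeps the raw value. With `max_depth=2` all three UII-bearing keys are replaced. In a sketch like this, the depth threshold trades scan cost against coverage of deeply nested UII.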

Assignees

Facebook Inc

Inventors

Classifications

  • Management thereof · CPC title

  • by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title

  • Usage protection of distributed data files · CPC title

  • Business processes related to social networking or social networking services · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9910902B1 cover?
The disclosed techniques provide systems and methods for anonymizing various portions of information, action logs, end-user information, and/or other data sets that are stored in non-indexed storage systems. More specifically, various anonymization procedures are described for redacting UII and/or replacing UII in raw data with randomly generated information (RGI). The anonymization process is …
Who is the assignee on this patent?
Facebook Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/6254. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 6, 2018 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).