Generating anonymous data from web data

US9866454B2 · US · B2

Patent metadata
Publication number: US-9866454-B2
Application number: US-201414224482-A
Country: US
Kind code: B2
Filing date: Mar 25, 2014
Priority date: Mar 25, 2014
Publication date: Jan 9, 2018
Grant date: Jan 9, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device receives web data, associated with user devices, that is generated based on interactions of the user devices with a network and one or more content provider devices. The device removes erroneous or objectionable web data from the web data to generate a subset of the web data, and categorizes the subset of the web data by assigning categories to the subset of the web data. The device performs an empirical estimation of the categorized subset of the web data to generate empirical estimations. The device performs a simulation of the empirical estimations to generate synthetic data that corresponds to the web data and removes private information relating to the user devices and users of the user devices, and stores the synthetic data in a storage device.
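
The steps named in the abstract (remove bad records, categorize, estimate, simulate) can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the patented implementation: the record fields (`user_id`, `url`, `hour`), the blocklist, and the categorization rules are hypothetical, and the empirical estimation is reduced to a relative-frequency table over (content, time-of-day) pairs that is then sampled at random.

```python
import random
from collections import Counter

# Hypothetical raw web data; the field names are illustrative assumptions.
RAW = [
    {"user_id": "u1", "url": "news.example/a", "hour": 9},
    {"user_id": "u2", "url": "shop.example/b", "hour": 21},
    {"user_id": "u3", "url": "", "hour": 9},                    # erroneous: empty URL
    {"user_id": "u1", "url": "news.example/c", "hour": 10},
    {"user_id": "u4", "url": "blocked.example/x", "hour": 3},   # objectionable host
    {"user_id": "u2", "url": "shop.example/d", "hour": 20},
]
BLOCKLIST = {"blocked.example"}  # assumed list of objectionable hosts


def clean(records):
    """Drop erroneous (empty URL, invalid hour) and objectionable records."""
    kept = []
    for r in records:
        host = r["url"].split("/")[0]
        if not host or not 0 <= r["hour"] < 24 or host in BLOCKLIST:
            continue
        kept.append(r)
    return kept


def categorize(record):
    """Assign a (content, daypart) category pair; the rules are assumptions."""
    content = record["url"].split(".")[0]
    daypart = "day" if 6 <= record["hour"] < 18 else "night"
    return (content, daypart)


def estimate(records):
    """Empirical joint distribution over category pairs (relative frequencies)."""
    counts = Counter(categorize(r) for r in records)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def simulate(distribution, n, seed=0):
    """Draw n synthetic records at random; no user_id or URL is carried over."""
    rng = random.Random(seed)
    pairs = list(distribution)
    weights = [distribution[p] for p in pairs]
    return [
        {"content": c, "daypart": d}
        for c, d in rng.choices(pairs, weights=weights, k=n)
    ]


subset = clean(RAW)
joint = estimate(subset)
synthetic = simulate(joint, n=1000)
```

Because the synthetic records are drawn from the estimated distribution rather than copied from the input, they carry the category frequencies of the cleaned data but none of the per-user identifiers.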

First claim

Opening claim text (preview).

What is claimed is:

1. A method to provide synthetic data, when statistical properties and privacy information, associated with web data, are preserved in the synthetic data, comprising: receiving, by a device and from user devices, the web data, the web data being associated with the user devices, the web data being generated based on interactions of the user devices with one or more content provider devices via a network, and the web data including one or more of: clickstream data that includes information associated with portions of content, provided by the one or more content provider devices, that are selected via the user devices, location data that includes information associated with locations of the user devices when the content is accessed by the user devices, time data that includes information associated with times when the user devices access the content, or network data that includes information associated with network resources utilized by the user devices to access the content; removing, by the device, erroneous or objectionable web data from the web data to generate a subset of the web data; categorizing, by the device, the subset of the web data by assigning categories to the subset of the web data; performing, by the device, an empirical estimation of the categorized subset of the web data to generate empirical estimations that include information that provides a representation of behaviors associated with users of the user devices; receiving, by the device, a selection of an anonymity level associated with generating the synthetic data; performing, by the device, a simulation of the empirical estimations to generate the synthetic data, the synthetic data including information associated with the empirical estimations, and the synthetic data removing private information, relating to the user devices and the users of the user devices, in accordance to the anonymity level; determining, by the device, whether the statistical properties and the privacy information, associated with the web data, are preserved in the synthetic data; and selectively: storing, by the device, the synthetic data in a storage device and providing the synthetic data when the statistical properties and the privacy information, associated with the web data, are preserved in the synthetic data, or re-performing, by the device, the simulation of the empirical estimations to generate other synthetic data when the statistical properties or the privacy information, associated with the web data, is not preserved in the synthetic data.

2. The method of claim 1, further comprising: presenting, for display, the synthetic data to a device associated with at least one of the one or more content provider devices.

3. The method of claim 1, where the synthetic data includes the statistical properties, associated with the web data, without the private information from the web data.

4. The method of claim 1, where the simulation of the empirical estimations includes a Monte Carlo simulation of the empirical estimations.

5. The method of claim 1, where the empirical estimation of the categorized subset of the web data includes an empirical estimation of joint distributions of the categorized subset of the web data.

6. A device for providing synthetic data, when statistical properties and privacy information, associated with web data, are preserved in the synthetic data, comprising: one or more processors to: receive, from user devices, the web data, the web data being generated based on interactions of the user devices with a plurality of content provider devices via a network, and the web data including one or more of: clickstream data that includes information associated with portions of content, provided by the plurality of content provider devices, that are selected via the user devices, location data that includes information associated with locations of the user devices when the content is accessed by the user devices, time data that includes information associated with times when the user devices access the content, or network data that includes information associated with network resources utilized by the user devices to access the content; remove erroneous or objectionable web data from the web data to generate a subset of the web data; categorize the subset of the web data by assigning categories to the subset of the web data; perform an empirical estimation of the categorized subset of the web data to generate empirical estimations that include information that provides a representation of behaviors associated with users of the user devices; receive preference information for an anonymity level associated with generating synthetic data; perform a simulation of the empirical estimations to generate the synthetic data, the synthetic data including properties of the empirical estimations, and the synthetic data removing private information, relating to the user devices and the users of the user devices, in accordance with the preference information; determine whether the statistical properties and the privacy information, associated with the web data, are preserved in the synthetic data; and selectively: store the synthetic data in a storage device, and provide the synthetic data when the statistical properties and the privacy information, associated with the web data, are preserved in the synthetic data, or re-perform the simulation of the empirical estimations to generate other synthetic data when the statistical properties or the privacy information, associated with the web data, is not preserved in the synthetic data.

7. The device of claim 6, where, when providing the synthetic data, the one or more processors are to: present, for display, the synthetic data to a user of the device or to a particular device associated with the plurality of content provider devices.

8. The device of claim 6, where the synthetic data includes statistical properties, associated with the web data, without the private information from the web data.

9. The device of claim 6, where the simulation of the empirical estimations includes a Monte Carlo simulation of the empirical estimations.

10. The device of claim 6, where the empirical estimation of the categorized subset of the web data includes an empirical estimation of joint distributions of the categorized subset of the web data.

11. A computer-readable medium for storing instructions, the instructions comprising: one or more instructions that, when executed by one or more processors of a device for providing synthetic data, when statistical properties and privacy information, associated with web data, are preserved in the synthetic data, cause the one or more processors to: receive, from user devices, the web data, the web data being generated based on interactions of the user devices with one or more content provider devices via a network, the web data including private information regarding at least one of the user devices or one or more users of the user devices, and the web data including at least one of: clickstream data that includes information associated with portions of content, provided by the one or more content provider devices, that are selected via the user devices, location data that includes information associated with locations of the user devices when the content is accessed by the user devices, time data that includes information associated with times when the user devices access the content, or network data that includes information associated with network resources utilized by the user devices to access the content; remove erroneous or objectionable web data from the web data to generat
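
The claims describe a loop: simulate, check whether statistical properties and privacy are preserved, then either keep the synthetic data or re-run the simulation. The sketch below shows one way such a loop could look. The claim text does not say how the check is performed, so everything specific here is an assumption: the anonymity level is read as a minimum record count `k` per category cell, statistical preservation is measured by total variation distance against a `tolerance`, and the function names and sample counts are hypothetical.

```python
import random
from collections import Counter


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def simulate(distribution, n, rng):
    """Monte Carlo draw of n category cells from the given distribution."""
    cells = list(distribution)
    weights = [distribution[c] for c in cells]
    return rng.choices(cells, weights=weights, k=n)


def generate(counts, anonymity_level, n=2000, tolerance=0.05,
             max_attempts=10, seed=0):
    """Return (synthetic sample, attempts used).

    counts: observed record counts per category cell.
    anonymity_level: assumed to mean the minimum count k a cell needs
    before it may appear in the synthetic data.
    """
    # Privacy step (assumption): suppress cells backed by fewer than k records.
    allowed = {c: v for c, v in counts.items() if v >= anonymity_level}
    if not allowed:
        raise ValueError("anonymity level suppresses every cell")
    total = sum(allowed.values())
    target = {c: v / total for c, v in allowed.items()}

    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        sample = simulate(target, n, rng)
        freq = {c: v / n for c, v in Counter(sample).items()}
        stats_ok = total_variation(target, freq) <= tolerance
        # Holds by construction here; kept to mirror the claimed check.
        privacy_ok = set(freq) <= set(allowed)
        if stats_ok and privacy_ok:
            return sample, attempt  # caller would store and provide this
        # Otherwise fall through and re-perform the simulation.
    raise RuntimeError("no acceptable synthetic data within max_attempts")


# Hypothetical counts: the rare ("health", "night") cell is suppressed at k=5.
counts = {("news", "day"): 120, ("shop", "night"): 75, ("health", "night"): 2}
sample, attempts = generate(counts, anonymity_level=5)
```

Raising the anonymity level suppresses more rare cells, which trades statistical detail for privacy; a real system would need a much stronger privacy test than the placeholder shown here.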

Assignees

Verizon Patent & Licensing Inc

Inventors

Classifications

  • H04L43/04 (Primary)

    Processing captured monitoring data, e.g. for logfile generation · CPC title

  • Electricity · mapped topic

  • H04L67/535 (Primary)

    Tracking the activity of the user (network monitoring arrangements H04L43/00; recording of computer activity G06F11/34) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9866454B2 cover?
A device receives web data, associated with user devices, that is generated based on interactions of the user devices with a network and one or more content provider devices. The device removes erroneous or objectionable web data from the web data to generate a subset of the web data, and categorizes the subset of the web data by assigning categories to the subset of the web data. The device pe…
Who is the assignee on this patent?
Verizon Patent & Licensing Inc
What technology area does this patent fall under?
Primary CPC classification H04L43/04. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Jan 9, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).