Data detection using intelligent sampling

US12536323B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12536323-B2
Application number  US-202318467484-A
Country             US
Kind code           B2
Filing date         Sep 14, 2023
Priority date       Sep 14, 2023
Publication date    Jan 27, 2026
Grant date          Jan 27, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method includes determining that a specific type of information is to be identified in a set of data. The method further includes sampling the set of data according to various sampling criteria to identify the specified type of information. The sampling criteria include at least a recency criterion indicating that the data to be sampled has been updated within a specified timeframe and a lineage criterion indicating that the data to be sampled is within a maximum hierarchical distance from a source data structure. The method also includes identifying, from the data that was sampled according to the sampling criteria, one or more data structures that include the specified type of information. The method further includes applying security policies to the identified data structures based on the type of information that was identified in the set of data. Various other methods, systems, and computer-readable media are also disclosed.
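The flow described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the `DataStructure` class, the `depth` field (standing in for hierarchical distance from the source), the email regex used as the "specified type of information", and the `restricted` label used as the security policy are all assumptions introduced here.

```python
from __future__ import annotations

import random
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DataStructure:
    """Hypothetical stand-in for a table or file in a data hierarchy."""
    name: str
    updated_at: datetime
    depth: int  # assumed measure of hierarchical distance; 0 = source
    rows: list[str]
    labels: set[str] = field(default_factory=set)


# Assumption: email addresses stand in for the "specified type of information".
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def meets_criteria(ds, now, max_age, max_depth):
    """Recency criterion (updated within max_age) and lineage criterion
    (within max_depth of the source)."""
    return now - ds.updated_at <= max_age and ds.depth <= max_depth


def detect_and_protect(datasets, now, max_age, max_depth, rows_per_structure=2, seed=0):
    """Sample eligible structures, flag those containing emails, label them."""
    rng = random.Random(seed)
    identified = []
    for ds in datasets:
        if not meets_criteria(ds, now, max_age, max_depth):
            continue  # stale or too far from the source: never sampled
        sample = rng.sample(ds.rows, min(rows_per_structure, len(ds.rows)))
        if any(EMAIL.search(row) for row in sample):
            ds.labels.add("restricted")  # hypothetical security policy: apply a label
            identified.append(ds)
    return identified


now = datetime(2024, 1, 10)
datasets = [
    DataStructure("users_raw", datetime(2024, 1, 9), 0, ["alice@example.com"]),
    DataStructure("users_copy_old", datetime(2023, 6, 1), 1, ["bob@example.com"]),
    DataStructure("deep_derived", datetime(2024, 1, 9), 5, ["carol@example.com"]),
    DataStructure("metrics", datetime(2024, 1, 8), 1, ["views=10"]),
]
found = detect_and_protect(datasets, now, max_age=timedelta(days=30), max_depth=2)
print([ds.name for ds in found])
```

In this example only `users_raw` passes both criteria and contains a match: `users_copy_old` fails the recency check, `deep_derived` fails the lineage check, and `metrics` is sampled but holds no email address.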

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: determining that a specific type of information is to be identified in a set of data comprising a hierarchical structure; sampling the set of data according to one or more sampling criteria to identify the specified type of information, the sampling criteria including at least: a recency criterion specifying a timeframe, in which data has been updated, from which the data is to be sampled; and a lineage criterion specifying a maximum hierarchical distance, from a source data structure within the hierarchical structure, that the data is to be sampled within; from the data that was sampled according to the sampling criteria, identifying one or more data structures that include the specified type of information; and applying, to the identified data structures, one or more security policies that transform the identified data structures from a less secure state to a more secure state.

2. The computer-implemented method of claim 1, wherein the data structures that include the specified type of information are further classified according to one or more data classification rules.

3. The computer-implemented method of claim 2, wherein the data classification rules further define which data structures qualify as including the specified type of information.

4. The computer-implemented method of claim 2, wherein the data classification rules filter the data structures that include the specified type of information into one or more groups that include subtypes of the specified type of information.

5. The computer-implemented method of claim 2, wherein one or more of the data classification rules are defined by a user.

6. The computer-implemented method of claim 1, wherein the data set is randomly sampled according to at least the recency criterion and the lineage criterion until a statistically significant number of samples have been taken from the set of data.

7. The computer-implemented method of claim 1, wherein the one or more security policies comprise a policy to: encrypt the identified data structures; restrict access to the identified data structures; relocate the identified data structures; apply a label to the identified data structures; or quarantine the identified data structures.

8. The computer-implemented method of claim 1, wherein the lineage criterion indicates a relative importance of sampling the set of data.

9. The computer-implemented method of claim 8, wherein data that is hierarchically closer to the source data structure has a higher relative importance, and wherein data that is hierarchically further from the source data structure has a lower relative importance.

10. The computer-implemented method of claim 1, further comprising providing a recommendation to an owner or manager of the identified data structures indicating which data structures are identified as including the specified type of information.

11. The computer-implemented method of claim 1, wherein sampling is avoided for datasets that are outside of the specified timeframe.

12. The computer-implemented method of claim 1, wherein the specified type of information comprises personally identifiable information.

13. A system comprising: at least one physical processor; and physical memory comprising computer-executable instructions that, when executed by the physical processor, cause the physical processor to: determine that a specific type of information is to be identified in a set of data comprising a hierarchical structure; sample the set of data according to one or more sampling criteria to identify the specified type of information, the sampling criteria including at least: a recency criterion specifying a timeframe, in which data has been updated, from which the data is to be sampled; and a lineage criterion specifying a maximum hierarchical distance, from a source data structure within the hierarchical structure, that the data is to be sampled within; from the data that was sampled according to the sampling criteria, identify one or more data structures that include the specified type of information; and apply, to the identified data structures, one or more security policies that transform the identified data structures from a less secure state to a more secure state.

14. The system of claim 13, wherein the lineage criterion is given higher weighting during the sampling, such that source data structures are prioritized when performing the sampling.

15. The system of claim 13, wherein identifying the one or more data structures that include the specified type of information comprises identifying at least one new subtype of the specified type of information.

16. The system of claim 15, wherein the at least one new subtype of the specified type of information is implemented as feedback when identifying other instances of the specified type of information.

17. The system of claim 16, wherein the feedback includes a mapping between the at least one newly identified subtype and the sampled data.

18. The system of claim 17, wherein one or more classification rules are automatically generated based on the mapping.

19. The system of claim 18, wherein the automatically generated classification rules are refined over time as new subtypes of the specified type of information are identified in the set of data or in other sets of data.

20. A non-transitory computer-readable medium comprising one or more computer-executable instructions that, when executed by at least one processor of a computing device, cause the computing device to: determine that a specific type of information is to be identified in a set of data comprising a hierarchical structure; sample the set of data according to one or more sampling criteria to identify the specified type of information, the sampling criteria including at least: a recency criterion specifying a timeframe, in which data has been updated, from which the data is to be sampled; and a lineage criterion specifying a maximum hierarchical distance, from a source data structure within the hierarchical structure, that the data is to be sampled within; from the data that was sampled according to the sampling criteria, identify one or more data structures that include the specified type of information; and apply, to the identified data structures, one or more security policies that transform the identified data structures from a less secure state to a more secure state.
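The claims on random sampling up to a statistically significant number of samples, and on weighting data closer to the source more heavily, could be illustrated along these lines. The claims do not define "statistically significant" or any weighting formula, so everything here is an assumption: Cochran's sample-size formula with a finite-population correction serves as the stopping rule, `1 / (1 + depth)` serves as the lineage weight, and Efraimidis-Spirakis keys are used for weighted sampling without replacement.

```python
import math
import random


def required_sample_size(population, z=1.96, p=0.5, margin=0.05):
    """Assumed stopping rule: Cochran's formula with finite-population correction."""
    if population <= 0:
        return 0
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    return min(population, math.ceil(n0 / (1 + (n0 - 1) / population)))


def lineage_weight(depth):
    """Hypothetical weight: data closer to the source (small depth) matters more."""
    return 1.0 / (1 + depth)


def weighted_sample(candidates, n, rng):
    """Weighted sampling without replacement using Efraimidis-Spirakis keys.

    Each candidate is a (name, depth) pair; the key is u ** (1 / weight)
    with u uniform in [0, 1), and the n largest keys are kept.
    """
    keyed = [
        (rng.random() ** (1.0 / lineage_weight(depth)), name, depth)
        for name, depth in candidates
    ]
    keyed.sort(reverse=True)
    return [(name, depth) for _, name, depth in keyed[:n]]


rng = random.Random(0)
candidates = [(f"src_{i}", 0) for i in range(500)]
candidates += [(f"derived_{i}", 3) for i in range(500)]

n = required_sample_size(len(candidates))  # 278 for 1,000 candidates
picked = weighted_sample(candidates, n, rng)
near = sum(1 for _, depth in picked if depth == 0)
print(n, near, len(picked) - near)
```

With these weights a source-level candidate is four times as likely to be drawn at each step as one three levels away, so the sample skews toward the source without excluding derived data entirely.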

Assignees

Netflix Inc

Inventors

Classifications

  • Protecting personal data, e.g. for financial or medical purposes

  • Extracting rules from data

  • Updating

  • Protecting data

  • Protecting privacy or anonymity, e.g. protecting personally identifiable information [PII]

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12536323B2 cover?
A computer-implemented method includes determining that a specific type of information is to be identified in a set of data. The method further includes sampling the set of data according to various sampling criteria to identify the specified type of information. The sampling criteria include at least a recency criterion indicating that the data to be sampled has been updated within a specified…
Who is the assignee on this patent?
Netflix Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/6245. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 27, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).