Scanning for information according to scan objectives
US-2021374163-A1 · Dec 2, 2021 · US
| Field | Value |
|---|---|
| Publication number | US-12536323-B2 |
| Application number | US-202318467484-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 14, 2023 |
| Priority date | Sep 14, 2023 |
| Publication date | Jan 27, 2026 |
| Grant date | Jan 27, 2026 |
Official abstract text for this publication.
A computer-implemented method includes determining that a specific type of information is to be identified in a set of data. The method further includes sampling the set of data according to various sampling criteria to identify the specified type of information. The sampling criteria include at least a recency criterion indicating that the data to be sampled has been updated within a specified timeframe and a lineage criterion indicating that the data to be sampled is within a maximum hierarchical distance from a source data structure. The method also includes identifying, from the data that was sampled according to the sampling criteria, one or more data structures that include the specified type of information. The method further includes applying security policies to the identified data structures based on the type of information that was identified in the set of data. Various other methods, systems, and computer-readable media are also disclosed.
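The flow in the abstract (filter by recency and lineage, sample, identify, apply a policy) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `DataStructure` record, the `eligible` and `scan` helpers, the use of an email regex as a stand-in for detecting the "specified type of information", and the `"restricted"` label as the applied security policy.

```python
import random
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DataStructure:
    """Hypothetical record for one table, file, or other data structure."""
    name: str
    last_updated: datetime
    distance_from_source: int  # hierarchical hops from the source structure
    content: str
    labels: set = field(default_factory=set)


# Assumption: an email address stands in for the specified type of information.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def eligible(items, now, timeframe, max_distance):
    """Keep only items that satisfy both sampling criteria."""
    return [
        d for d in items
        if now - d.last_updated <= timeframe          # recency criterion
        and d.distance_from_source <= max_distance    # lineage criterion
    ]


def scan(items, now, timeframe, max_distance, sample_size, seed=0):
    """Sample eligible items, flag those with a match, and label them."""
    rng = random.Random(seed)
    pool = eligible(items, now, timeframe, max_distance)
    sampled = rng.sample(pool, min(sample_size, len(pool)))
    hits = [d for d in sampled if EMAIL_RE.search(d.content)]
    for d in hits:
        # Assumption: applying a label is the chosen security policy.
        d.labels.add("restricted")
    return hits
```

Items updated outside the timeframe, or further from the source than the maximum distance, are never read by `scan`, which is where a scheme like this would save scanning cost.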
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: determining that a specific type of information is to be identified in a set of data comprising a hierarchical structure; sampling the set of data according to one or more sampling criteria to identify the specified type of information, the sampling criteria including at least: a recency criterion specifying a timeframe, in which data has been updated, from which the data is to be sampled; and a lineage criterion specifying a maximum hierarchical distance, from a source data structure within the hierarchical structure, that the data is to be sampled within; from the data that was sampled according to the sampling criteria, identifying one or more data structures that include the specified type of information; and applying, to the identified data structures, one or more security policies that transform the identified data structures from a less secure state to a more secure state.
2. The computer-implemented method of claim 1, wherein the data structures that include the specified type of information are further classified according to one or more data classification rules.
3. The computer-implemented method of claim 2, wherein the data classification rules further define which data structures qualify as including the specified type of information.
4. The computer-implemented method of claim 2, wherein the data classification rules filter the data structures that include the specified type of information into one or more groups that include subtypes of the specified type of information.
5. The computer-implemented method of claim 2, wherein one or more of the data classification rules are defined by a user.
6. The computer-implemented method of claim 1, wherein the data set is randomly sampled according to at least the recency criterion and the lineage criterion until a statistically significant number of samples have been taken from the set of data.
7. The computer-implemented method of claim 1, wherein the one or more security policies comprise a policy to: encrypt the identified data structures; restrict access to the identified data structures; relocate the identified data structures; apply a label to the identified data structures; or quarantine the identified data structures.
8. The computer-implemented method of claim 1, wherein the lineage criterion indicates a relative importance of sampling the set of data.
9. The computer-implemented method of claim 8, wherein data that is hierarchically closer to the source data structure has a higher relative importance, and wherein data that is hierarchically further from the source data structure has a lower relative importance.
10. The computer-implemented method of claim 1, further comprising providing a recommendation to an owner or manager of the identified data structures indicating which data structures are identified as including the specified type of information.
11. The computer-implemented method of claim 1, wherein sampling is avoided for datasets that are outside of the specified timeframe.
12. The computer-implemented method of claim 1, wherein the specified type of information comprises personally identifiable information.
13. A system comprising: at least one physical processor; and physical memory comprising computer-executable instructions that, when executed by the physical processor, cause the physical processor to: determine that a specific type of information is to be identified in a set of data comprising a hierarchical structure; sample the set of data according to one or more sampling criteria to identify the specified type of information, the sampling criteria including at least: a recency criterion specifying a timeframe, in which data has been updated, from which the data is to be sampled; and a lineage criterion specifying a maximum hierarchical distance, from a source data structure within the hierarchical structure, that the data is to be sampled within; from the data that was sampled according to the sampling criteria, identify one or more data structures that include the specified type of information; and apply, to the identified data structures, one or more security policies that transform the identified data structures from a less secure state to a more secure state.
14. The system of claim 13, wherein the lineage criterion is given higher weighting during the sampling, such that source data structures are prioritized when performing the sampling.
15. The system of claim 13, wherein identifying the one or more data structures that include the specified type of information comprises identifying at least one new subtype of the specified type of information.
16. The system of claim 15, wherein the at least one new subtype of the specified type of information is implemented as feedback when identifying other instances of the specified type of information.
17. The system of claim 16, wherein the feedback includes a mapping between the at least one newly identified subtype and the sampled data.
18. The system of claim 17, wherein one or more classification rules are automatically generated based on the mapping.
19. The system of claim 18, wherein the automatically generated classification rules are refined over time as new subtypes of the specified type of information are identified in the set of data or in other sets of data.
20. A non-transitory computer-readable medium comprising one or more computer-executable instructions that, when executed by at least one processor of a computing device, cause the computing device to: determine that a specific type of information is to be identified in a set of data comprising a hierarchical structure; sample the set of data according to one or more sampling criteria to identify the specified type of information, the sampling criteria including at least: a recency criterion specifying a timeframe, in which data has been updated, from which the data is to be sampled; and a lineage criterion specifying a maximum hierarchical distance, from a source data structure within the hierarchical structure, that the data is to be sampled within; from the data that was sampled according to the sampling criteria, identify one or more data structures that include the specified type of information; and apply, to the identified data structures, one or more security policies that transform the identified data structures from a less secure state to a more secure state.
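Claims 8, 9, and 14 describe lineage as a relative importance, with data closer to the source prioritized during sampling. One way this might be realized is weighted random sampling without replacement. The sketch below is an illustration under stated assumptions, not the claimed method: the inverse-distance weight `1 / (1 + distance)`, the function names, and the choice of the Efraimidis-Spirakis key technique (draw `u` uniformly, rank by `u ** (1 / weight)`) are all swapped in here and do not appear in the patent.

```python
import random


def lineage_weight(distance: int) -> float:
    """Assumed weighting: closer to the source means a larger weight."""
    return 1.0 / (1 + distance)


def weighted_sample(distances: dict, max_distance: int, k: int, seed: int = 0) -> list:
    """Pick up to k names, favoring structures near the source.

    `distances` maps a data structure name to its hierarchical distance
    from the source. Structures beyond `max_distance` are never sampled.
    """
    rng = random.Random(seed)
    keyed = []
    for name, dist in distances.items():
        if dist > max_distance:
            continue  # lineage criterion: outside the maximum distance
        u = rng.random()
        # Efraimidis-Spirakis key: larger weights tend to yield larger keys.
        keyed.append((u ** (1.0 / lineage_weight(dist)), name))
    keyed.sort(reverse=True)
    return [name for _, name in keyed[:k]]
```

With this weighting, a source structure (distance 0) has weight 1.0 and a structure two hops away has weight about 0.33, so the source is more likely, though not guaranteed, to be chosen when `k` is smaller than the eligible pool.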
Protecting personal data, e.g. for financial or medical purposes · CPC title
Extracting rules from data · CPC title
Updating · CPC title
Protecting data · CPC title
Protecting privacy or anonymity, e.g. protecting personally identifiable information [PII] · CPC title