System and method of reducing data in a storage system

US2016232159A1 · US · A1

Patent metadata
Publication number: US-2016232159-A1
Application number: US-201514616975-A
Country: US
Kind code: A1
Filing date: Feb 9, 2015
Priority date: Feb 9, 2015
Publication date: Aug 11, 2016
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection; read this to see what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The system and method of the present disclosure relates to technology for reducing the amount of data stored in a storage system by processing subsets of data stored in data sources using advanced analytics. The process generally includes extracting data from data sources for analysis by ranking the data, marking the data, identifying pattern changes in the data, comparing pattern changes in the data and purging and/or masking the data for storage. The system also includes databases for storing and defining rules, patterns, policies and classification data to be applied to the data from the data sources and analytics to apply the rules, patterns, policies and classification information on the data. As a result, the data stored in the data sources is reduced, and processing efficiency is increased.
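The abstract's rank, mark, and purge/mask steps can be illustrated with a small Python sketch. It is not the patented implementation: the `Record` type, the 365-day `PURGE_MAX_AGE_DAYS` rule, and the SSN-like `PRIVATE_PATTERN` classification are hypothetical stand-ins for the rule, policy, and classification databases the abstract describes, and the pattern-change comparison is left out here.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-ins for the rule and classification databases;
# the names and values are assumptions, not taken from the patent.
PURGE_MAX_AGE_DAYS = 365  # rule: records older than this are purge-eligible
PRIVATE_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # "private": SSN-like strings


@dataclass
class Record:
    key: str
    text: str
    age_days: int
    marks: set = field(default_factory=set)


def rank_and_mark(records):
    """Mark records for purging (too old) and/or masking (private content),
    then return the marked records ranked oldest first."""
    for r in records:
        if r.age_days > PURGE_MAX_AGE_DAYS:
            r.marks.add("purge")
        if PRIVATE_PATTERN.search(r.text):
            r.marks.add("mask")
    # Simple stand-in for the patent's ranking step: older data ranks higher.
    return sorted((r for r in records if r.marks), key=lambda r: -r.age_days)


def apply_marks(records):
    """Drop records marked for purging and redact private text in the rest."""
    kept = []
    for r in records:
        if "purge" in r.marks:
            continue
        text = PRIVATE_PATTERN.sub("***-**-****", r.text) if "mask" in r.marks else r.text
        kept.append(Record(r.key, text, r.age_days))
    return kept
```

For example, given records aged 10, 400, and 5 days where only the first contains an SSN-like string, `rank_and_mark` ranks the 400-day record first, and `apply_marks` keeps the first record (redacted) and the third record unchanged.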

First claim

Opening claim text (preview).

What is claimed is:

1. A method of reducing data in a storage system, comprising: accessing the data stored in the storage system by a processor; parsing the data accessed from the storage system into subsets of data using the processor, the parsing comprising categorizing the subsets of data using key identifiers, each of the categorical subsets of data analyzed based on a rule set associated with a respective category for each of the subsets of data; for each of the analyzed subsets of data, using the processor to detect the subsets of data to be purged based on a threshold condition having been satisfied, and ranking the subsets of data for which the threshold condition has been satisfied, and detect the subsets of data to be masked based on a policy having been satisfied, and ranking the subsets of data for which the policy has been satisfied; individually marking the subsets of data based on the ranking for purging using the processor when the threshold condition has been satisfied, and individually marking the subsets of data for masking based on the ranking using the processor when the policy has been satisfied; identifying pattern changes using the processor between the subsets of data prior to analysis and the marked subsets of data for purging and between the subsets of data prior to analysis and the marked subsets of data for masking; and processing the subsets of data for permanent change by reducing the amount of data using the processor when pattern changes satisfying a predetermined criteria have been identified, and providing the permanently changed subsets of data with the reduced amount of data to the storage system for storage.

2. The method of claim 1, further comprising: during the categorization of each of the subsets of data, applying the key identifiers by the processor to identify characteristics of the content in each of the subsets of data that are associated with a specific category; and during the analysis of the categorized subsets of data, applying one of the rule sets by the processor comprising search parameters to a respective one of the categorized subsets of data to parse for data matching the search parameters.

3. The method of claim 2, wherein the analyzing of the rules sets by the processor further comprises: determining patterns in the subsets of data by querying a pattern repository storing patterns, each pattern stored in the pattern repository defining a discernable regularity of a known element that repeats in a predictable manner, and analyzing the data in each of the subsets of data using the search parameters to determine a pattern exists when the query is satisfied; computing a value using the search parameters corresponding to the persistence of the data in the subsets of data and comparing the value to a predetermined threshold, and analyzing the computed value and predetermined threshold to determine the persistence of the data satisfies the predetermined threshold when the value exceeds the predetermined threshold; classifying data, using the search parameters, in each of the subsets of data as private or public based on secure classification criteria, and classifying the data in each of the subsets of data as private when the data satisfies the secure classification criteria; and querying the data in each of the subsets of data based on the search parameters having been manually input as conditions, and identifying the data satisfying the conditions.

4. The method of claim 3, wherein the ranking by the processor to analyze the subsets of data comprises: marking the data in the subsets of data for purging when any one of the rule sets is satisfied and the threshold condition is not exceeded as a result of the rule set being satisfied; and marking the data in the subsets of data for masking when any one of the rule sets is satisfied and the policy has been satisfied and the threshold condition is not exceeded as a result of the policy being satisfied.

5. The method of claim 4, wherein the detection of pattern changes by the processor comprises identifying pattern changes, using the processor, in the subsets of data having marked data, without analyzing the marked data, by querying the pattern repository; and determining a pattern exits when a pattern in the pattern repository matches a pattern in the subset of data without the marked data.

6. The method of claim 5, wherein the processor is configured to compare identified pattern changes in the analysis of subsets of data to pattern changes in the analysis of subsets of data having marked data, when results of the comparison fall within the predetermined criteria, purging the marked data from the subsets of data marked to be purged and masking the marked data from the subsets of data marked to be masked; and when the results of the comparison fall outside of the predetermined criteria, modify the threshold condition such that ranking of the subsets of data is modified for re-processing and analysis.

7. The method of claim 6, wherein the processor accesses the storage system to update the data stored therein to reflect the purged and masked subsets of data when the comparison results fall within the predetermined criteria.

8. The method of claim 4, wherein the policy is stored in a policy database and comprises a policy for sensitive data, access to the data, privacy of the data, copying of the data, or encryption of the data.

9. An apparatus to reduce storage of data, comprising: a data source to store data for processing; and a processor configured to categorize and analyze subsets of the data accessed from the data source, for each of the analyzed subsets of data the processor is configured to detect and rank the subsets of data to be one of purged or masked data based on range, the processor is configured to mark the subsets of data based on the ranking for purging when the range is satisfied, and mark the subsets of data for masking based on the ranking of private data, the processor configured to identify pattern changes between the subsets of data prior to analysis and the marked subsets of data for purging and between the subsets of data prior to analysis and the marked subsets of data for masking, the processor configured to process the subsets of data for updating by reducing the amount of data using the processor when pattern changes satisfying a predetermined criteria have been identified and providing the updated subsets of data with the reduced amount of data to the storage system for storage.

10. The apparatus of claim 9, wherein: the processor is configured to use key identifiers to categorize the subsets of data, each of the categorical subsets of data analyzed based on a rule set associated with a respective category for each of the subsets of data, and determine the ranking when the range has been satisfied.

11. The apparatus of claim 10, wherein: the processor is configured to, during the categorization of each of the subsets of data, apply the key identifiers to identify characteristics of the content in each of the subsets of data that are associated with a specific category; and the processor is configured to, during the analysis of the categorized subsets of data, apply rule sets comprising search parameters to a respective one of the categorized subsets of data to parse for data matching the search parameters.

12. The apparatus of claim 11, wherein: the rules sets are stored in a rule set database such that the processor is configured to access and apply the rule sets; the processor is configured to determine patterns in the subsets of data by querying a pattern repository sto…
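Claims 3 through 6 add a safety check: marked data is removed only if the patterns found in the data survive the removal; otherwise the threshold condition is modified and the data is re-processed. The Python sketch below illustrates that loop under stated assumptions, all hypothetical: a regex-based `PATTERN_REPOSITORY`, record age (`age_days`) as the persistence value, "the same set of patterns is still present" as the predetermined criteria, and a fixed `step` for raising the threshold. Masking and explicit ranking are omitted for brevity.

```python
import re

# Hypothetical pattern repository (cf. claim 3): each entry is a regular
# expression for a "known element that repeats in a predictable manner".
PATTERN_REPOSITORY = {
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "order_id": re.compile(r"ORD-\d+"),
}


def patterns_present(rows):
    """Return the names of repository patterns found anywhere in `rows`."""
    return {
        name
        for name, rx in PATTERN_REPOSITORY.items()
        if any(rx.search(row["text"]) for row in rows)
    }


def reduce_with_pattern_check(rows, threshold, step=30, max_rounds=10):
    """Purge rows whose persistence value exceeds `threshold`, but only if the
    unpurged rows still contain every pattern found before marking.

    Assumptions: persistence value = row["age_days"]; "predetermined
    criteria" = unchanged pattern set; on failure the threshold is raised
    by `step` so fewer rows are marked, and the data is re-processed.
    """
    baseline = patterns_present(rows)
    for _ in range(max_rounds):
        # Mark rows for purging when the persistence value exceeds the threshold.
        marked = {i for i, row in enumerate(rows) if row["age_days"] > threshold}
        # Check patterns on the data without the marked rows (cf. claim 5).
        remaining = [row for i, row in enumerate(rows) if i not in marked]
        if patterns_present(remaining) == baseline:
            return remaining, threshold  # criteria met: commit the reduction
        threshold += step  # criteria not met: modify threshold and re-process
    return list(rows), threshold  # no acceptable threshold found: keep all data
```

With rows aged 400 days ("ORD-1 shipped 2015-01-02"), 500 days ("heartbeat ok"), and 10 days ("ORD-2 pending") and a starting threshold of 365, the first two rounds would drop the only ISO date, so the threshold rises to 425, at which point only the "heartbeat ok" row is purged.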

Assignees

CA, Inc.

Inventors

Not listed on this page.

Classifications

  • Protecting personal data, e.g. for financial or medical purposes · CPC title

  • by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title

  • Query processing support for facilitating data mining operations in structured databases · CPC title

  • Clustering or classification · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016232159A1 cover?
It covers reducing the amount of data held in a storage system: subsets of data extracted from data sources are ranked and marked, pattern changes are identified and compared, and the data is purged and/or masked before storage, using stored rules, patterns, policies, and classification data.
Who is the assignee on this patent?
CA, Inc.
What technology area does this patent fall under?
Primary CPC classification G06F16/2465. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 11, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).