System and method of reducing data in a storage system

US10140343B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10140343-B2
Application number	US-201514616975-A
Country	US
Kind code	B2
Filing date	Feb 9, 2015
Priority date	Feb 9, 2015
Publication date	Nov 27, 2018
Grant date	Nov 27, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The system and method of the present disclosure relates to technology for reducing the amount of data stored in a storage system by processing subsets of data stored in data sources using advanced analytics. The process generally includes extracting data from data sources for analysis by ranking the data, marking the data, identifying pattern changes in the data, comparing pattern changes in the data and purging and/or masking the data for storage. The system also includes databases for storing and defining rules, patterns, policies and classification data to be applied to the data from the data sources and analytics to apply the rules, patterns, policies and classification information on the data. As a result, the data stored in the data sources is reduced, and processing efficiency is increased.
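As a rough illustration of the rank, mark, and purge/mask steps named in the abstract, here is a minimal Python sketch. It is not the patented implementation: the `Record` type, the `age_days` ranking measure, the `max_age_days` threshold, and the asterisk masking are all assumptions made for this example, and the pattern-change comparison is left out.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: str
    value: str
    age_days: int          # hypothetical staleness measure used for ranking
    private: bool = False  # hypothetical stand-in for a masking policy match
    mark: str = ""         # "", "purge", or "mask"


def mark_records(records, max_age_days):
    """Rank records oldest first, then mark each one for purging or masking."""
    ranked = sorted(records, key=lambda r: r.age_days, reverse=True)
    for r in ranked:
        if r.age_days > max_age_days:   # threshold condition (assumed)
            r.mark = "purge"
        elif r.private:                 # policy condition (assumed)
            r.mark = "mask"
    return ranked


def apply_marks(records):
    """Drop purge-marked records and blank out the values of mask-marked ones."""
    kept = []
    for r in records:
        if r.mark == "purge":
            continue
        if r.mark == "mask":
            r = Record(r.key, "*" * len(r.value), r.age_days, r.private, r.mark)
        kept.append(r)
    return kept


records = [
    Record("a", "old log line", age_days=400),
    Record("b", "555-1234", age_days=10, private=True),
    Record("c", "recent metric", age_days=5),
]
reduced = apply_marks(mark_records(records, max_age_days=365))
```

In this toy run, record `a` is purged, record `b` is kept with its value masked, and record `c` is stored unchanged.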

First claim

Opening claim text (preview).

What is claimed is:

1. A method of reducing data in a storage system, comprising: accessing the data stored in the storage system by a processor; parsing the data accessed from the storage system into subsets of data using the processor, the parsing comprising categorizing the subsets of data, in response to a query, into a plurality of categories including a category to determine relevancy, wherein data in the relevancy category is generated by correlating identified patterns to generate a relationship sequence and to correlate each relationship in the relationship sequence, and wherein the relationships are used to determine the relevancy of the data; for each of the categorized subsets of data, using the processor to detect the subsets of data to be purged based on a threshold condition having been satisfied, and ranking the subsets of data for which the threshold condition has been satisfied, and detect the subsets of data to be masked based on a policy having been satisfied, and ranking the subsets of data for which the policy has been satisfied; individually marking each of the subsets of data based on the ranking for purging using the processor when the threshold condition has been satisfied, and individually marking each of the subsets of data for masking based on the ranking using the processor when the policy has been satisfied; identifying pattern changes using the processor between each of the subsets of data prior to a first purging and the marked subsets of data for purging or between each of the subsets of data prior to a first masking and the marked subsets of data for masking; and processing each of the subsets of data, prior to the first purging and prior to the first masking, for permanent changes by reducing the amount of data using the processor when pattern changes satisfying a predetermined criteria have been identified, and providing the permanently changed subsets of data with the reduced amount of data to the storage system for storage.

2. The method of claim 1, further comprising: during the categorization of each of the subsets of data, applying the key identifiers by the processor to identify characteristics of the content in each of the subsets of data that are associated with a specific category; and during the categorizing of the subsets of data, applying one of the rule sets by the processor comprising search parameters to a respective one of the categorized subsets of data to parse for data matching the search parameters.

3. The method of claim 2, wherein the analyzing of the rules sets by the processor further comprises: determining patterns in the subsets of data by querying a pattern repository storing patterns, each pattern stored in the pattern repository defining a discernable regularity of a known element that repeats in a predictable manner, and analyzing the data in each of the subsets of data using the search parameters to determine a pattern exists when the query is satisfied; computing a value using the search parameters corresponding to the persistence of the data in the subsets of data and comparing the value to a predetermined threshold, and analyzing the computed value and predetermined threshold to determine the persistence of the data satisfies the predetermined threshold when the value exceeds the predetermined threshold; classifying data, using the search parameters, in each of the subsets of data as private or public based on secure classification criteria, and classifying the data in each of the subsets of data as private when the data satisfies the secure classification criteria; and querying the data in each of the subsets of data based on the search parameters having been manually input as conditions, and identifying the data satisfying the conditions.

4. The method of claim 3, wherein the ranking by the processor to analyze the subsets of data comprises: marking the data in the subsets of data for purging when any one of the rule sets is satisfied and the threshold condition is not exceeded as a result of the rule set being satisfied; and marking the data in the subsets of data for masking when any one of the rule sets is satisfied and the policy has been satisfied and the threshold condition is not exceeded as a result of the policy being satisfied.

5. The method of claim 4, wherein the detection of pattern changes by the processor comprises identifying pattern changes, using the processor, in the subsets of data having marked data, without analyzing the marked data, by querying the pattern repository; and determining a pattern exists when a pattern in the pattern repository matches a pattern in the subset of data without the marked data.

6. The method of claim 5, wherein the processor is configured to compare identified pattern changes in the analysis of subsets of data to pattern changes in the analysis of subsets of data having marked data, when results of the comparison fall within the predetermined criteria, purging the marked data from the subsets of data marked to be purged and masking the marked data from the subsets of data marked to be masked; and when the results of the comparison fall outside of the predetermined criteria, modify the threshold condition such that ranking of the subsets of data is modified for re-processing and analysis.

7. The method of claim 6, wherein the processor accesses the storage system to update the data stored therein to reflect the purged and masked subsets of data when the comparison results fall within the predetermined criteria.

8. The method of claim 4, wherein the policy is stored in a policy database and comprises a policy for sensitive data, access to the data, privacy of the data, copying of the data, or encryption of the data.

9. An apparatus to reduce storage of data, comprising: a data source to store data for processing; and a processor configured to parse data accessed from the data source into subsets of data, the parsing comprising categorizing the subsets of data, in response to a query, into a plurality of categories including a category to determine relevancy, wherein data in the relevancy category is generated by correlating identified patterns to generate a relationship sequence and to correlate each relationship in the relationship sequence, and wherein the relationships are used to determine the relevancy of the data, for each of the categorized subsets of data the processor is configured to detect and rank the subsets of data to be one of purged or masked data based on range, the processor is configured to mark each of the subsets of data based on the ranking for purging when the range is satisfied, and mark each of the subsets of data for masking based on the ranking of private data, the processor configured to identify pattern changes between each of the subsets of data prior to first purging and the marked subsets of data for purging and between each of the subsets of data prior to first masking and the marked subsets of data for masking, the processor configured to process each of the subsets of data, prior to the first purging and prior to the first masking, for updating by reducing the amount of data using the processor when pattern changes satisfying a predetermined criteria have been identified, providing the updated subsets of data with the reduced amount of data to the storage system for storage, the processor is configured to identify when a pattern in a pattern repository matches a pattern in the marked subsets of data.

10. The apparatus of claim 9, wherein: the processor is configured to use key identifiers to categorize the subsets of data, each of the categorical subsets of data analyzed based on a rule set associated with a respective category for each
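The compare-and-retry behavior described in claims 5 and 6 can be sketched roughly as follows. This is an illustrative reading only: the regex-based `PATTERNS` repository, the use of line age as the threshold condition, the "same set of patterns still present" test standing in for the predetermined criteria, and the `step` adjustment are all assumptions, not details taken from the patent.

```python
import re

# Hypothetical pattern repository (name -> regex); not taken from the patent.
PATTERNS = {
    "error": re.compile(r"ERROR"),
    "login": re.compile(r"login"),
}


def patterns_present(lines):
    """Return the names of repository patterns that match at least one line."""
    return {
        name
        for name, rx in PATTERNS.items()
        if any(rx.search(ln) for ln in lines)
    }


def reduce_with_check(lines, ages, threshold, step=30, max_rounds=10):
    """Mark lines older than `threshold` and purge them only if the patterns
    found without the marked lines equal those found in the full data.
    Otherwise relax the threshold and try again."""
    before = patterns_present(lines)
    for _ in range(max_rounds):
        # Lines at or below the threshold are unmarked; the rest are marked.
        kept = [ln for ln, age in zip(lines, ages) if age <= threshold]
        if patterns_present(kept) == before:  # within the assumed criteria
            return kept, threshold
        threshold += step                     # modify threshold, re-process
    return list(lines), threshold             # give up: purge nothing


lines = ["ERROR disk full", "user login ok", "user login ok", "heartbeat"]
ages = [50, 90, 5, 1]
kept, final_threshold = reduce_with_check(lines, ages, threshold=30)
```

With these inputs the first pass would drop the only `ERROR` line, so the threshold is raised once; the second pass purges just the 90-day-old duplicate login line.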

Assignees

CA, Inc.

Inventors

Parikh Prashant

Classifications

G06F21/6245
Protecting personal data, e.g. for financial or medical purposes · CPC title
G06F16/2465 · Primary
Query processing support for facilitating data mining operations in structured databases · CPC title
G06F21/6254
by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title
G06F16/285
Clustering or classification · CPC title
G06F17/30598
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 56565954

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10140343B2 cover?: The system and method of the present disclosure relates to technology for reducing the amount of data stored in a storage system by processing subsets of data stored in data sources using advanced analytics. The process generally includes extracting data from data sources for analysis by ranking the data, marking the data, identifying pattern changes in the data, comparing pattern changes in th…
Who is the assignee on this patent?: CA, Inc.
What technology area does this patent fall under?: Primary CPC classification G06F16/2465. Mapped technology areas include Physics.
When was this patent published?: Publication date Nov 27, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).