What technology area does this patent fall under?

Primary CPC classification G06F21/6218. Mapped technology areas include Physics.

When was this patent published?

Publication date: Aug 2, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Deriving encryption rules based on file content

US9405928B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9405928-B2
Application number	US-201414489222-A
Country	US
Kind code	B2
Filing date	Sep 17, 2014
Priority date	Sep 17, 2014
Publication date	Aug 2, 2016
Grant date	Aug 2, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Data storage systems are disclosed for automatically generating encryption rules based on a set of training files that are known to include sensitive information. The system may use a number of heuristic algorithms to generate one or more encryption rules for determining whether a file includes sensitive information. Further, the system may apply the heuristic algorithms to the content of the files, as determined by using natural language processing algorithms, to generate the encryption rules. Moreover, systems are disclosed that are capable of automatically determining whether to encrypt a file based on the generated encryption rules. The content of the file may be determined using natural language processing algorithms and then the encryption rules may be applied to the content of the file to determine whether to encrypt the file.
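The decision step the abstract describes (derive a file's content, then apply encryption rules to it) can be pictured with a small sketch. It is illustrative only and rests on assumptions not in the patent: a regular-expression word tokenizer stands in for the natural language processing algorithms, and a rule is modeled as a set of tokens that must all appear in the file. `tokenize`, `EncryptionRule`, and `should_encrypt` are hypothetical names.

```python
import re
from dataclasses import dataclass


def tokenize(text: str) -> set[str]:
    """Hypothetical stand-in for the patent's NLP step: lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass(frozen=True)
class EncryptionRule:
    """Hypothetical rule form: matches when the file contains every required token."""
    required_tokens: frozenset[str]

    def matches(self, file_tokens: set[str]) -> bool:
        # Subset test: all required tokens occur among the file's tokens.
        return self.required_tokens <= file_tokens


def should_encrypt(text: str, rules: list[EncryptionRule]) -> bool:
    """Return True if any rule matches the content tokens of the file."""
    file_tokens = tokenize(text)
    return any(rule.matches(file_tokens) for rule in rules)


rules = [EncryptionRule(frozenset({"ssn", "salary"}))]
print(should_encrypt("Employee SSN and salary table", rules))  # True
print(should_encrypt("Quarterly product roadmap", rules))      # False
```

A real system would replace `tokenize` with whichever NLP tasks it uses and would likely support richer rule forms; the subset match is simply the easiest way to express "a correspondence between portions of the file and data tokens".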

First claim

Opening claim text (preview).

What is claimed is:

1. A data storage system comprising: a content analyzer comprising computer hardware, the content analyzer configured to: access a set of training files that include content designated as sensitive information; and use one or more processing algorithms with respect to the set of training files to obtain a set of data tokens for each training file, each of the data tokens from the set of data tokens comprising a portion of a training file from the set of training files, the portion of the training file comprising content included in the training file, at least some of the training files including at least some of the sensitive information; an encryption rules generator comprising computer hardware, the encryption rules generator configured to: use one or more algorithms to generate a set of encryption rules based on the set of data tokens obtained for each training file, wherein at least some of the set of encryption rules are configured to identify a file to encrypt based at least in part on a correspondence between portions of the file and at least some of the set of data tokens; generate a prospective encryption rule based on an aggregated set of data tokens, the aggregated set of data tokens based on the set of data tokens for each training file; perform the prospective encryption rule using the set of training files: determine a number of training files from the set of training files identified for encryption based on the prospective encryption rule; and responsive, at least in part, to the number of training files identified for encryption satisfying a threshold, adding the prospective encryption rule to the set of encryption rules; and an encryption processor comprising computer hardware, the encryption processor configured to encrypt the file based at least in part on one of the encryption rules from the set of encryption rules.

2. The data storage system of claim 1, further comprising an encryption rules repository configured to store the set of encryption rules, wherein the encryption rules repository is accessible by one or more computing systems.

3. The data storage system of claim 1, wherein the encryption rules generator is further configured to: determine a context condition for an encryption rule of the set of encryption rules, the context condition identifying when to apply the encryption rule to the file; and associate the context condition with the encryption rule.

4. The data storage system of claim 3, wherein the context condition comprises at least one of an identity of a user, an identity of a department that includes the user within an entity, a geographic location of a computing device storing the file, a network location of a computing device storing the file, and a device type of the computing device.

5. The data storage system of claim 1, wherein the encryption rules generator is configured to determine an encryption rule based on the set of data tokens obtained for a plurality of training files.

6. The data storage system of claim 1, wherein the encryption rules generator is further configured to: present the prospective encryption rule to a user; receive an input from the user responsive to presenting the prospective encryption rule to the user; and determine whether to include the prospective encryption rule in the set of encryption rules based at least in part on the input received from the user.

7. The data storage system of claim 1, wherein the content analyzer is further configured to remove a data token from a set of data tokens of a training file based on an identified set of non-sensitive data tokens.

8. The data storage system of claim 1, further comprising: a file monitor configured to monitor creation of the file; and an encryption rules engine configured to determine whether the file satisfies an encryption rule from the set of encryption rules.

9. A method of automatically generating encryption rules using machine learning techniques, the method comprising: accessing, by a rules generation system comprising computer hardware, a set of one or more training files that include content designated as sensitive information; applying, by the rules generation system, one or more processing algorithms to each training file included in the set of training files to obtain a set of data tokens for each training file, wherein each of the set of data tokens for a training file corresponds to a portion of the training file, the portion of the training file comprising content included in the training file, at least some of the training files including at least some of the sensitive information, wherein applying the one or more processing algorithms to the set of data tokens comprises: generating a prospective encryption rule based on the set of data tokens; performing the prospective encryption rule with respect to the set of training files; determining a percentage of training files from the set of training files identified for encryption using the prospective encryption rule; and responsive to the percentage of training files identified for encryption satisfying a threshold, adding the prospective encryption rule to the set of encryption rules; applying, by the rules generation system, one or more algorithms to the set of data tokens for each training file to generate a set of encryption rules for identifying files with sensitive information, wherein at least some of the set of encryption rules are configured to identify a file to encrypt based at least in part on a correspondence between portions of the file and at least some of the set of data tokens; and storing the set of encryption rules in an encryption rules repository accessible for one or more systems for determining whether to encrypt the file.

10. The method of claim 9, wherein the one or more processing algorithms comprise natural language processing algorithms.

11. The method of claim 9, wherein the one or more algorithms comprise heuristic algorithms.

12. The method of claim 9, wherein at least one of the one or more processing algorithms comprises a natural language processing algorithm and wherein applying the one or more processing algorithms comprises performing at least one of the following natural language processing tasks: automatic summarization, coreference resolution, discourse analysis, machine translation, morphological segmentation, named entity recognition, natural language understanding, optical character recognition, part-of-speech tagging, parsing, relationship extraction, sentence boundary disambiguation, sentiment analysis, topic segmentation and recognition, word segmentation, word sense disambiguation, singular value decomposition, latent semantic analysis, latent Dirichlet allocation, pachinko allocation, and probabilistic latent semantic analysis.

13. The method of claim 9, wherein applying the one or more algorithms to the set of data tokens for each training file comprises applying the one or more algorithms on a file-by-file basis, separately to each set of data tokens.

14. The method of claim 9, wherein applying the one or more algorithms to the set of data tokens for each training file comprises applying the one or more algorithms to a cumulative set of data tokens formed by combining the sets of data tokens from a plurality of training files.

15. The method of claim 9, further comprising presenting the set of encryption rules to a user for confirmation, wherein storing the set of encryption rules comprises storing encryption rules from the set of encryption rules confirmed by the user.

16. The method of claim 9, furt…
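The rule-generation loop in claims 1 and 9 (generate a prospective rule from the aggregated tokens, perform it against the training files, and keep it only if enough files are flagged) can be sketched as follows. This is a minimal illustration under assumptions the claims do not state: tokens come from a regex tokenizer rather than NLP, every prospective rule is a single token drawn from the aggregated set, and the threshold is expressed as a fraction of training files (claim 9 speaks of a percentage, claim 1 of a count). `tokenize`, `generate_rules`, `non_sensitive`, and `min_fraction` are hypothetical names.

```python
import re


def tokenize(text: str) -> set[str]:
    """Hypothetical stand-in for NLP tokenization: lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def generate_rules(
    training_texts: list[str],
    non_sensitive: set[str],
    min_fraction: float = 0.5,
) -> list[frozenset[str]]:
    """Keep single-token prospective rules that flag >= min_fraction of training files."""
    # Per-file token sets, minus known non-sensitive tokens (cf. claim 7).
    token_sets = [tokenize(t) - non_sensitive for t in training_texts]
    # Aggregated token set across all training files (cf. claims 1 and 14).
    aggregated = set().union(*token_sets)
    rules = []
    for token in sorted(aggregated):
        # Assumption: each prospective rule is a single aggregated token.
        prospective = frozenset({token})
        # Perform the prospective rule on the training set and count flagged files.
        flagged = sum(1 for tokens in token_sets if prospective <= tokens)
        # Keep the rule if the flagged fraction satisfies the threshold (cf. claim 9).
        if flagged / len(token_sets) >= min_fraction:
            rules.append(prospective)
    return rules


training = [
    "Patient name and SSN: 123",
    "SSN on file for patient",
    "Invoice with SSN",
]
stop = {"and", "on", "for", "with", "name", "file"}
print(generate_rules(training, stop, min_fraction=0.6))
# [frozenset({'patient'}), frozenset({'ssn'})]
```

In this example "ssn" (3 of 3 files) and "patient" (2 of 3) pass the 0.6 threshold, while "123" and "invoice" (1 of 3 each) are rejected, which is the filtering effect the threshold step is meant to provide: tokens that recur across sensitive examples become rules, one-off tokens do not.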

Assignees

Commvault Systems Inc

Classifications

G06N20/00
Machine learning · CPC title
H04L67/10
in which an application is distributed across nodes in the network (software deployment G06F8/60; multiprogramming arrangements G06F9/46) · CPC title
G06F21/602
Providing cryptographic facilities or services · CPC title
G06F21/6218 (primary)
to a system of files or objects, e.g. local or distributed file system or database · CPC title
G06F2221/2107
File encryption · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 55455021

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9405928B2 cover?: Data storage systems are disclosed for automatically generating encryption rules based on a set of training files that are known to include sensitive information. The system may use a number of heuristic algorithms to generate one or more encryption rules for determining whether a file includes sensitive information. Further, the system may apply the heuristic algorithms to the content of the f…
Who is the assignee on this patent?: Commvault Systems Inc