Data leak prevention enforcement based on learned document classification

US9626528B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9626528-B2
Application number	US-201414201107-A
Country	US
Kind code	B2
Filing date	Mar 7, 2014
Priority date	Mar 7, 2014
Publication date	Apr 18, 2017
Grant date	Apr 18, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates generally to the field of automatically learning and automatically adapting to perform classification of protected data. In various examples, learning and adapting to perform classification of protected data may be implemented in the form of systems, methods and/or algorithms.

First claim

Opening claim text (preview).

What is claimed is:

1. An automated method for data leak prevention, the method comprising:
obtaining, by a processor, a plurality of training documents and corresponding meta data associated with each training document from a document management system associated with a party, each of the training documents comprising at least one respective content, the corresponding metadata associated with each training document comprising a security classification set by the party in the document management system, the security classification classifying the training document associated with the corresponding metadata into one of at least two security categories;
in response to obtaining the plurality of training documents from the document management system, converting each training document into a feature set comprising at least one pairing of a feature of the respective content of the respective training document with the security classification of the respective training document found in the corresponding metadata associated with the respective training document;
generating, by the processor, a classification model based at least in part upon the pairings found in the feature sets of each of the training documents, wherein the generated classification model comprises at least one correlation between the features found in the respective content of each training document and the security classification found in the corresponding metadata associated with each training document;
obtaining, by the processor, at least one non-training document, wherein the at least one non-training document comprises at least one respective content;
in response to obtaining the at least one non-training document, applying, by the processor, the generated classification model to the at least one non-training document, the application of the classification model to the at least one non-training document comprising:
correlating the at least one respective content of the at least one non-training document to a security classification of the at least one non-training document based on the at least one correlation in the generated classification model; and
classifying the at least one non-training document into one of the at least two security categories based on the correlation of the at least one respective content of the at least one non-training document to the security classification;
monitoring the at least one non-training document, by the processor, for attempted access to the at least one non-training document;
detecting, by the processor, based on the monitoring, an attempted access to the at least one non-training document;
in response to detecting an attempted access to the at least one non-training document, taking, by the processor, a predetermined action;
wherein the predetermined action that is taken is based upon the one of the at least two categories into which the at least one non-training document has been classified by the application of the generated classification model; and
wherein the predetermined action that is taken comprises one of: (a) denying access to the at least one non-training document to which access is attempted; (b) logging the attempted access to the at least one non-training document to which access is attempted; and (c) a combination thereof.

2. The method of claim 1, wherein the non-training document is obtained from a document management system.

3. The method of claim 1, further comprising generating, by the processor, an enforcement policy, wherein the enforcement policy specifies the at least one action to be taken when the attempt is made to access a document having a predetermined category.

4. The method of claim 1, wherein the action that is taken comprises permitting access to the non-training document to which access is attempted.

5. The method of claim 4, wherein the action that is taken is permitting and logging access to the non-training document to which access is attempted.

6. The method of claim 3, wherein the enforcement policy is enforced on at least one of: (a) an email component; (b) an end user device component; (c) a web component; (d) a network component; and (e) a combination thereof.

7. The method of claim 6, wherein: the end user device component comprises at least one of: (a) a desktop computer; (b) a laptop computer; (c) a tablet; (d) a smartphone; and (e) a combination thereof.

8. A computer readable storage medium, tangibly embodying a program of instructions executable by the computer for automated data leak prevention, the program of instructions, when executing, performing the following steps:
obtaining a plurality of training documents and corresponding meta data associated with each training document from a document management system associated with a party, each of the training documents comprising at least one respective content, the corresponding metadata associated with each training document comprising a security classification set by the party in the document management system, the security classification classifying the training document associated with the corresponding metadata into one of at least two security categories;
in response to obtaining the plurality of training documents from the document management system, converting each training document into a feature set comprising at least one pairing of a feature of the respective content of the respective training document with the security classification of the respective training document found in the corresponding metadata associated with the respective training document;
generating a classification model based at least in part upon the pairings found in the feature sets of each of the training documents, wherein the generated classification model comprises at least one correlation between the features found in the respective content of each training document and the security classification found in the corresponding metadata associated with each training document;
obtaining at least one non-training document, wherein the at least one non-training document comprises at least one respective content;
in response to obtaining the at least one non-training document, applying the generated classification model to the at least one non-training document the application of the classification model to the at least one non-training document comprising:
correlating the at least one respective content of the at least one non-training document to a security classification of the at least one non-training document based on the at least one correlation in the generated classification model;
classifying the at least one non-training document into one of the at least two categories based on the correlation of the at least one respective content of the at least one non-training document to the security classification;
monitoring the at least one non-training document for attempted access to the at least one non-training document;
detecting, based on the monitoring, an attempted access to the at least one non-training document;
in response to detecting an attempted access to the at least one non-training document, taking a predetermined action;
wherein the predetermined action that is taken is based upon the one of the at least two categories into which the at least one non-training document has been classified by the application of the generated classification model; and
wherein the predetermined action that is taken comprises one of: (a) denying access to the at least one non-training document to which access is attempted; (b) logging the attempted access to the at least one non-training document to which access is attempted; and (c) a combination thereof.

9. The computer readable storage medium of claim 8, wherein the pro
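
To make the flow of claim 1 easier to follow, here is a minimal Python sketch of the claimed steps: labeled training documents are turned into feature/classification pairings, a model is built from them, a new document is classified, and an action is chosen on attempted access. The claim text does not name a learning algorithm, so the word-count naive Bayes scoring used here is a swapped-in assumption, not the patented method. The names `features`, `train`, `classify`, `POLICY`, and `on_access_attempt`, the category labels, and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def features(text):
    """Hypothetical feature extractor: lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(documents):
    """Build a simple model from (content, security_classification) pairs.

    Each word is paired with the document's classification and counted,
    which stands in for the claim's feature/classification pairings.
    """
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for content, label in documents:
        doc_counts[label] += 1
        word_counts[label].update(features(content))
    return word_counts, doc_counts


def classify(model, content):
    """Return the category with the highest smoothed log-probability."""
    word_counts, doc_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in features(content):
            # Laplace smoothing so unseen words do not zero out a category.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical enforcement policy: category -> actions on attempted access.
POLICY = {"confidential": ("deny", "log"), "public": ("permit",)}


def on_access_attempt(model, content, user, audit_log):
    """Apply the policy for the document's predicted category.

    Returns True when access is allowed, False when it is denied.
    """
    category = classify(model, content)
    actions = POLICY[category]
    if "log" in actions:
        audit_log.append((user, category))
    return "deny" not in actions


# Assumed sample data standing in for a document management system export.
training = [
    ("quarterly revenue forecast and merger plan", "confidential"),
    ("customer account numbers and salary data", "confidential"),
    ("press release for the public product launch", "public"),
    ("public holiday schedule and cafeteria menu", "public"),
]
model = train(training)
audit_log = []
allowed = on_access_attempt(
    model, "draft merger plan with revenue data", "alice", audit_log
)
```

In this sketch the policy table plays the role of the enforcement policy in claim 3: the action depends only on the category the model assigns, so changing how a category is handled means editing `POLICY` rather than the classifier.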

Assignees

IBM

Inventors

Butler Anthony M

Classifications

G06N5/025
Extracting rules from data · CPC title
G06N99/005
Physics · mapped topic
G06F21/6218 · Primary
to a system of files or objects, e.g. local or distributed file system or database · CPC title
G06N20/00
Machine learning · CPC title

Patent family

Related publications grouped by family.

View patent family 54017640

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9626528B2 cover?

The present disclosure relates generally to the field of automatically learning and automatically adapting to perform classification of protected data. In various examples, learning and adapting to perform classification of protected data may be implemented in the form of systems, methods and/or algorithms.

Who is the assignee on this patent?

IBM

What technology area does this patent fall under?

Primary CPC classification G06F21/6218. Mapped technology areas include Physics.

When was this patent published?

Publication date Apr 18, 2017 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Systems and methods for transparent data loss prevention classifications

Distributed monitoring, evaluation, and response for multiple devices

Systems and methods for end-user initiated data-loss-prevention content analysis
