What technology area does this patent fall under?

Primary CPC classification G06N20/00. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 15, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Classifying user behavior as anomalous

US11727311B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11727311-B2
Application number	US-202217870733-A
Country	US
Kind code	B2
Filing date	Jul 21, 2022
Priority date	Jul 27, 2015
Publication date	Aug 15, 2023
Grant date	Aug 15, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying user behavior as anomalous. One of the methods includes obtaining user behavior data representing behavior of a user in a subject system. An initial model is generated from training data, the initial model having first characteristic features of the training data. A resampling model is generated from the training data and from multiple instances of the first representation for a test time period. A difference between the initial model and the resampling model is computed. The user behavior in the test time period is classified as anomalous based on the difference between the initial model and the resampling model.
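The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract on this page does not say how the two models are compared, how many instances of the test representation are added, or what threshold is used. The function names, the `copies` and `threshold` parameters, the uncentered SVD, and the cosine-based difference are all hypothetical choices made for the example.

```python
import numpy as np


def principal_direction(matrix: np.ndarray) -> np.ndarray:
    """Return the first right singular vector (unit length) of a matrix."""
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return vt[0]


def model_difference(training: np.ndarray, test: np.ndarray, copies: int = 5) -> float:
    """Difference in [0, 1] between an initial and a resampling model.

    Assumption: each "model" is just the dominant direction of the data.
    """
    # Initial model: characteristic direction of the training periods alone.
    initial = principal_direction(training)
    # Resampling model: training periods plus several copies of the test period.
    resampled = np.vstack([training, np.tile(test, (copies, 1))])
    resampling = principal_direction(resampled)
    # The sign of a singular vector is arbitrary, so compare absolute cosine.
    cosine = abs(float(np.dot(initial, resampling)))
    return 1.0 - min(cosine, 1.0)


def is_anomalous(training: np.ndarray, test: np.ndarray,
                 copies: int = 5, threshold: float = 0.05) -> bool:
    """Flag the test period when adding it shifts the model noticeably."""
    return model_difference(training, test, copies) > threshold


# Rows are past time periods, columns are topics (illustrative values only).
training = np.array([
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.00],
    [0.85, 0.15, 0.00],
    [0.90, 0.05, 0.05],
])
typical_week = np.array([0.85, 0.15, 0.00])
unusual_week = np.array([0.00, 0.10, 0.90])
```

In this sketch a test period that resembles the user's history barely moves the dominant direction, while a period concentrated on a topic the user rarely touched pulls it away, which is what the threshold check looks for.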

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining a plurality of topics, each topic being data representing a plurality of file types that frequently co-occur in user behavior data of individual users; obtaining user behavior data representing behavior of a user in a subject system, wherein the user behavior data indicates file types of files accessed by the user in the subject system and when the file was accessed by the user; generating test data from the user behavior data, the test data comprising a first representation of which topics the user accessed during a test time period according to the file types of the user behavior data; generating training data from the user behavior data, the training data comprising respective second representations of which topics the user accessed in each of multiple time periods prior to the test time period; generating an initial SVD model from the test data; generating a resampling model from the training data from multiple instances of the first representation of which topics the user accessed during the test time period; computing a difference between the initial model and the resampling model; and classifying the user behavior in the test time period as anomalous based on the difference between the initial model and the resampling model.

2. The method of claim 1, further comprising generating the plurality of topics from file types of files accessed by multiple users in the subject system.

3. The method of claim 2, further comprising: generating the topics using a topic modeling process including defining each user to be a document and each file type accessed by each user to be a term in the corresponding document.

4. The method of claim 3, wherein generating the topics using the topic modeling process comprises generating a predetermined number K of topics.

5. The method of claim 4, wherein generating the K topics comprises generating a probability distribution for each of the K topics that assigns a likelihood to a particular file type being accessed by a user who accesses file types assigned to the topic.

6. The method of claim 3, further comprising: iterating over a plurality of candidate values of K; and selecting a particular candidate value of K as the predetermined number K.

7. The method of claim 1, wherein computing the difference between the initial model and the resampling model comprises comparing the initial model and the resampling model using singular value decomposition.

8. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: obtaining a plurality of topics, each topic being data representing a plurality of file types that frequently co-occur in user behavior data of individual users; obtaining user behavior data representing behavior of a user in a subject system, wherein the user behavior data indicates file types of files accessed by the user in the subject system and when the file was accessed by the user; generating test data from the user behavior data, the test data comprising a first representation of which topics the user accessed during a test time period according to the file types of the user behavior data; generating training data from the user behavior data, the training data comprising respective second representations of which topics the user accessed in each of multiple time periods prior to the test time period; generating an initial SVD model from the test data; generating a resampling model from the training data from multiple instances of the first representation of which topics the user accessed during the test time period; computing a difference between the initial model and the resampling model; and classifying the user behavior in the test time period as anomalous based on the difference between the initial model and the resampling model.

9. The system of claim 8, wherein the operations further comprise the plurality of topics from file types of files accessed by multiple users in the subject system.

10. The system of claim 9, wherein the operations further comprise: generating the topics using a topic modeling process including defining each user to be a document and each file type accessed by each user to be a term in the corresponding document.

11. The system of claim 10, wherein generating the topics using the topic modeling process comprises generating a predetermined number K of topics.

12. The system of claim 11, wherein generating the K topics comprises generating a probability distribution for each of the K topics that assigns a likelihood to a particular file type being accessed by a user who accesses file types assigned to the topic.

13. The system of claim 10, wherein the operations further comprise: iterating over a plurality of candidate values of K; and selecting a particular candidate value of K as the predetermined number K.

14. The system of claim 8, wherein computing the difference between the initial model and the resampling model comprises comparing the initial model and the resampling model using singular value decomposition.

15. One or more non-transitory computer storage media encoded with computer program instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: obtaining a plurality of topics, each topic being data representing a plurality of file types that frequently co-occur in user behavior data of individual users; obtaining user behavior data representing behavior of a user in a subject system, wherein the user behavior data indicates file types of files accessed by the user in the subject system and when the file was accessed by the user; generating test data from the user behavior data, the test data comprising a first representation of which topics the user accessed during a test time period according to the file types of the user behavior data; generating training data from the user behavior data, the training data comprising respective second representations of which topics the user accessed in each of multiple time periods prior to the test time period; generating an initial SVD model from the test data; generating a resampling model from the training data from multiple instances of the first representation of which topics the user accessed during the test time period; computing a difference between the initial model and the resampling model; and classifying the user behavior in the test time period as anomalous based on the difference between the initial model and the resampling model.

16. The non-transitory computer storage media of claim 15, wherein the operations further comprise generating the plurality of topics from file types of files accessed by multiple users in the subject system.

17. The non-transitory computer storage media of claim 16, wherein the operations further comprise: generating the topics using a topic modeling process including defining each user to be a document and each file type accessed by each user to be a term in the corresponding document.

18. The non-transitory computer storage media of claim 17, wherein generating the topics using the topic modeling process comprises generating a predetermined number K of topics.

19. The non-transitory computer storage media of claim 18, wherein generating the K topics comprises generating a probability distribution for each of the K topics that assigns
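The data-preparation steps recited in the claims (users as documents, file types as terms, and per-period topic representations split into training and test data) might look something like the sketch below. It is illustrative only: the `TOPICS` table is hand-written here as a hypothetical stand-in for topics that the claims say come from a topic modeling process, and the event tuples, function names, and weekly periods are assumptions made for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical topics: each names a set of file types that tend to co-occur.
TOPICS = {
    "source_code": {"py", "java", "c"},
    "documents": {"docx", "pdf"},
    "finance": {"xlsx", "csv"},
}


def user_documents(events):
    """Treat each user as a document whose terms are the file types accessed."""
    docs = {}
    for user, file_type, _day in events:
        docs.setdefault(user, Counter())[file_type] += 1
    return docs


def topic_vector(events, user, start, end):
    """Count, per topic, the user's accesses with start <= day < end."""
    names = sorted(TOPICS)  # fixed column order: documents, finance, source_code
    counts = dict.fromkeys(names, 0)
    for who, file_type, day in events:
        if who == user and start <= day < end:
            for name in names:
                if file_type in TOPICS[name]:
                    counts[name] += 1
    return [counts[name] for name in names]


# (user, file type, access date) records; values are made up for illustration.
EVENTS = [
    ("alice", "py", date(2023, 1, 2)),
    ("alice", "py", date(2023, 1, 3)),
    ("alice", "pdf", date(2023, 1, 9)),
    ("alice", "xlsx", date(2023, 1, 16)),
    ("alice", "csv", date(2023, 1, 17)),
    ("bob", "docx", date(2023, 1, 2)),
]

weeks = [date(2023, 1, 2), date(2023, 1, 9), date(2023, 1, 16), date(2023, 1, 23)]
# Training data: one topic vector per period before the test period.
training = [topic_vector(EVENTS, "alice", weeks[i], weeks[i + 1]) for i in range(2)]
# Test data: the topic vector for the test period itself.
test = topic_vector(EVENTS, "alice", weeks[2], weeks[3])
```

The output of `user_documents` is the kind of corpus a topic model could be fitted on, while `training` and `test` are the per-period representations that a later model-comparison step would consume.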

Assignees

Pivotal Software Inc

Inventors

Classifications

G06N20/00 · Primary
Machine learning · CPC title
G06F16/285
Clustering or classification · CPC title
G06F21/316
by observing the pattern of computer usage, e.g. typical user behaviour · CPC title
G06F21/552
involving long-term monitoring or reporting · CPC title
H04L63/1425
Traffic logging, e.g. anomaly detection · CPC title

Patent family

Related publications grouped by family.

View patent family 56609977

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11727311B2 cover?: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying user behavior as anomalous. One of the methods includes obtaining user behavior data representing behavior of a user in a subject system. An initial model is generated from training data, the initial model having first characteristic features of the training data. A resampling model i…
Who is the assignee on this patent?: Pivotal Software Inc