Classifying user behavior as anomalous

US11727311B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11727311-B2
Application number: US-202217870733-A
Country: US
Kind code: B2
Filing date: Jul 21, 2022
Priority date: Jul 27, 2015
Publication date: Aug 15, 2023
Grant date: Aug 15, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying user behavior as anomalous. One of the methods includes obtaining user behavior data representing behavior of a user in a subject system. An initial model is generated from training data, the initial model having first characteristic features of the training data. A resampling model is generated from the training data and from multiple instances of the first representation for a test time period. A difference between the initial model and the resampling model is computed. The user behavior in the test time period is classified as anomalous based on the difference between the initial model and the resampling model.
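
The comparison the abstract describes can be illustrated with a small sketch. This is one possible reading, not the patented implementation. It assumes each time period is a row of topic counts, that a "model" is the first right singular vector of the row matrix (the claims mention singular value decomposition), that the resampling model is built by appending several copies of the test row to the training rows, and that the "difference" is the angle between the two vectors. The function names, the number of copies, and the 10-degree threshold are hypothetical.

```python
import math

def top_right_singular_vector(rows, iters=200):
    """Approximate the first right singular vector of `rows`
    by power iteration on A^T A (pure Python, no dependencies)."""
    dim = len(rows[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        av = [sum(r[j] * v[j] for j in range(dim)) for r in rows]   # A v
        w = [sum(rows[i][j] * av[i] for i in range(len(rows)))      # A^T (A v)
             for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v
        v = [x / norm for x in w]
    return v

def model_difference(training, test_row, copies=5):
    """Angle in degrees between the initial model (training rows only)
    and the resampling model (training rows plus `copies` test rows).
    The value of `copies` is an assumption made for illustration."""
    initial = top_right_singular_vector(training)
    resampled = top_right_singular_vector(training + [test_row] * copies)
    cos = abs(sum(a * b for a, b in zip(initial, resampled)))
    return math.degrees(math.acos(min(1.0, cos)))

def is_anomalous(training, test_row, threshold_degrees=10.0):
    """Flag the test period when the two models diverge by more than a
    hypothetical threshold."""
    return model_difference(training, test_row) > threshold_degrees

# Rows are past periods; columns are counts for three hypothetical topics.
training = [[9, 1, 0], [8, 2, 0], [10, 1, 0], [9, 2, 0]]
print(is_anomalous(training, [9, 1, 0]))    # test period resembles history
print(is_anomalous(training, [0, 1, 12]))   # test period dominated by a new topic
```

Repeating the test row gives a single period enough weight to pull the dominant direction when it departs from history, while a period that resembles the training rows leaves the direction almost unchanged.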

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising:
    obtaining a plurality of topics, each topic being data representing a plurality of file types that frequently co-occur in user behavior data of individual users;
    obtaining user behavior data representing behavior of a user in a subject system, wherein the user behavior data indicates file types of files accessed by the user in the subject system and when the file was accessed by the user;
    generating test data from the user behavior data, the test data comprising a first representation of which topics the user accessed during a test time period according to the file types of the user behavior data;
    generating training data from the user behavior data, the training data comprising respective second representations of which topics the user accessed in each of multiple time periods prior to the test time period;
    generating an initial SVD model from the test data;
    generating a resampling model from the training data from multiple instances of the first representation of which topics the user accessed during the test time period;
    computing a difference between the initial model and the resampling model; and
    classifying the user behavior in the test time period as anomalous based on the difference between the initial model and the resampling model.

2. The method of claim 1, further comprising generating the plurality of topics from file types of files accessed by multiple users in the subject system.

3. The method of claim 2, further comprising: generating the topics using a topic modeling process including defining each user to be a document and each file type accessed by each user to be a term in the corresponding document.

4. The method of claim 3, wherein generating the topics using the topic modeling process comprises generating a predetermined number K of topics.

5. The method of claim 4, wherein generating the K topics comprises generating a probability distribution for each of the K topics that assigns a likelihood to a particular file type being accessed by a user who accesses file types assigned to the topic.

6. The method of claim 3, further comprising: iterating over a plurality of candidate values of K; and selecting a particular candidate value of K as the predetermined number K.

7. The method of claim 1, wherein computing the difference between the initial model and the resampling model comprises comparing the initial model and the resampling model using singular value decomposition.

8. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
    obtaining a plurality of topics, each topic being data representing a plurality of file types that frequently co-occur in user behavior data of individual users;
    obtaining user behavior data representing behavior of a user in a subject system, wherein the user behavior data indicates file types of files accessed by the user in the subject system and when the file was accessed by the user;
    generating test data from the user behavior data, the test data comprising a first representation of which topics the user accessed during a test time period according to the file types of the user behavior data;
    generating training data from the user behavior data, the training data comprising respective second representations of which topics the user accessed in each of multiple time periods prior to the test time period;
    generating an initial SVD model from the test data;
    generating a resampling model from the training data from multiple instances of the first representation of which topics the user accessed during the test time period;
    computing a difference between the initial model and the resampling model; and
    classifying the user behavior in the test time period as anomalous based on the difference between the initial model and the resampling model.

9. The system of claim 8, wherein the operations further comprise [generating] the plurality of topics from file types of files accessed by multiple users in the subject system.

10. The system of claim 9, wherein the operations further comprise: generating the topics using a topic modeling process including defining each user to be a document and each file type accessed by each user to be a term in the corresponding document.

11. The system of claim 10, wherein generating the topics using the topic modeling process comprises generating a predetermined number K of topics.

12. The system of claim 11, wherein generating the K topics comprises generating a probability distribution for each of the K topics that assigns a likelihood to a particular file type being accessed by a user who accesses file types assigned to the topic.

13. The system of claim 10, wherein the operations further comprise: iterating over a plurality of candidate values of K; and selecting a particular candidate value of K as the predetermined number K.

14. The system of claim 8, wherein computing the difference between the initial model and the resampling model comprises comparing the initial model and the resampling model using singular value decomposition.

15. One or more non-transitory computer storage media encoded with computer program instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
    obtaining a plurality of topics, each topic being data representing a plurality of file types that frequently co-occur in user behavior data of individual users;
    obtaining user behavior data representing behavior of a user in a subject system, wherein the user behavior data indicates file types of files accessed by the user in the subject system and when the file was accessed by the user;
    generating test data from the user behavior data, the test data comprising a first representation of which topics the user accessed during a test time period according to the file types of the user behavior data;
    generating training data from the user behavior data, the training data comprising respective second representations of which topics the user accessed in each of multiple time periods prior to the test time period;
    generating an initial SVD model from the test data;
    generating a resampling model from the training data from multiple instances of the first representation of which topics the user accessed during the test time period;
    computing a difference between the initial model and the resampling model; and
    classifying the user behavior in the test time period as anomalous based on the difference between the initial model and the resampling model.

16. The non-transitory computer storage media of claim 15, wherein the operations further comprise generating the plurality of topics from file types of files accessed by multiple users in the subject system.

17. The non-transitory computer storage media of claim 16, wherein the operations further comprise: generating the topics using a topic modeling process including defining each user to be a document and each file type accessed by each user to be a term in the corresponding document.

18. The non-transitory computer storage media of claim 17, wherein generating the topics using the topic modeling process comprises generating a predetermined number K of topics.

19. The non-transitory computer storage media of claim 18, wherein generating the K topics comprises generating a probability distribution for each of the K topics that assigns
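
The data-preparation steps recited in the independent claims (mapping accessed file types to topics, then building one representation per time period) might look roughly like the sketch below. Everything specific here is an assumption for illustration: the `TOPICS` table and its likelihoods are made up, each file type is assigned to its single most likely topic, a "time period" is taken to be an ISO week, and the "representation" is a vector of per-topic access counts. The claims do not fix any of these choices.

```python
from collections import Counter
from datetime import date

# Hypothetical topics: each maps file types to a likelihood (compare the
# per-topic probability distribution described in the dependent claims).
TOPICS = {
    "source_code": {".py": 0.6, ".java": 0.4},
    "documents": {".docx": 0.7, ".pdf": 0.3},
    "media": {".png": 0.5, ".mp4": 0.5},
}

def topic_for(file_type):
    """Return the topic giving `file_type` the highest likelihood, or None
    if no topic covers it (a simplification: one topic per file type)."""
    best, best_p = None, 0.0
    for name, dist in TOPICS.items():
        p = dist.get(file_type, 0.0)
        if p > best_p:
            best, best_p = name, p
    return best

def period_vectors(accesses):
    """Group (file_type, date) events by ISO week into topic-count vectors.
    Vector positions follow the alphabetical order of the topic names."""
    names = sorted(TOPICS)
    counts = {}
    for file_type, day in accesses:
        topic = topic_for(file_type)
        if topic is None:
            continue  # file type not covered by any topic
        week = tuple(day.isocalendar()[:2])  # (ISO year, ISO week)
        counts.setdefault(week, Counter())[topic] += 1
    return {week: [c[n] for n in names] for week, c in sorted(counts.items())}

def split_train_test(vectors):
    """Treat the most recent period as the test period and all earlier
    periods as training data."""
    weeks = sorted(vectors)
    return [vectors[w] for w in weeks[:-1]], vectors[weeks[-1]]

accesses = [
    (".py", date(2023, 1, 2)), (".java", date(2023, 1, 3)),
    (".pdf", date(2023, 1, 4)), (".py", date(2023, 1, 9)),
    (".py", date(2023, 1, 10)), (".mp4", date(2023, 1, 16)),
    (".png", date(2023, 1, 17)), (".exe", date(2023, 1, 17)),
]
training, test = split_train_test(period_vectors(accesses))
print(training)  # [[1, 0, 2], [0, 0, 2]]
print(test)      # [0, 2, 0]
```

In this toy log the user works with source code and documents for two weeks and then switches entirely to media files, so the test vector points in a direction the training vectors never visit.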

Assignees

  • Pivotal Software Inc

Inventors

Classifications

  • G06N20/00 · Primary

    Machine learning · CPC title

  • Clustering or classification · CPC title

  • by observing the pattern of computer usage, e.g. typical user behaviour · CPC title

  • involving long-term monitoring or reporting · CPC title

  • Traffic logging, e.g. anomaly detection · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11727311B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying user behavior as anomalous. The full abstract appears in the Abstract section above.
Who is the assignee on this patent?
Pivotal Software Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 15, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).