Machine learning classification using Markov modeling

US10652252B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10652252-B2
Application number  US-201715716284-A
Country             US
Kind code           B2
Filing date         Sep 26, 2017
Priority date       Sep 30, 2016
Publication date    May 12, 2020
Grant date          May 12, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and articles of manufacture, including computer program products, are provided for classification systems and methods using modeling. In some example embodiments, there is provided a system that includes at least one processor and at least one memory including program code which when executed by the at least one memory provides operations. The operations can include generating a representation of a sequence of sections of a file and/or determining, from a model including conditional probabilities, a probability for each transition between at least two sequential sections in the representation. The operations can further include classifying the file based on the probabilities for each transition.
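
As a rough illustration of the operations the abstract describes, the Python sketch below encodes a sequence of file sections as a string, looks up a conditional probability for each adjacent pair of sections, and classifies on the combined score. The section names, the token alphabet, the toy probability table, the `floor` value for unseen transitions, the threshold, and the output labels are all assumptions made for illustration; none of them come from the patent.

```python
import math

# Hypothetical mapping from section names to one-character tokens.
SECTION_TOKENS = {
    "dos_header": "D",
    "rich_data": "R",
    "pe_header": "P",
    "code": "C",
    "data": "A",
}

# Toy table of conditional probabilities P(next | previous), assumed to have
# been measured on known-good files. The values are invented for the example.
TOY_MODEL = {
    ("D", "R"): 0.6,
    ("D", "P"): 0.4,
    ("R", "P"): 1.0,
    ("P", "C"): 0.9,
    ("C", "A"): 0.8,
}


def encode_sections(sections):
    """Return a string with one character per section, in sequence order."""
    return "".join(SECTION_TOKENS[name] for name in sections)


def transition_log_score(representation, cond_prob, floor=1e-6):
    """Sum the log probability of every transition between adjacent sections.

    Summing logs is equivalent to taking the log of the product of the
    probabilities and avoids underflow. Transitions missing from the table
    get the small `floor` probability (an assumption of this sketch).
    """
    total = 0.0
    for prev, nxt in zip(representation, representation[1:]):
        total += math.log(cond_prob.get((prev, nxt), floor))
    return total


def classify_file(representation, cond_prob, threshold):
    """Label a file by comparing its transition score with a threshold."""
    score = transition_log_score(representation, cond_prob)
    return "good" if score >= threshold else "suspect"


if __name__ == "__main__":
    rep = encode_sections(["dos_header", "rich_data", "pe_header", "code", "data"])
    print(rep, classify_file(rep, TOY_MODEL, threshold=-2.0))
```

A single threshold against one model is the simplest possible decision rule; it is used here only to keep the example short.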

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for implementation by one or more processors forming part of at least one computing device, the method comprising: generating a representation of a sequence of sections of a file; determining, from a model including conditional probabilities, a probability for each transition between at least two sequential sections in the representation, the model being trained using a recursive neural network; and classifying the file based on the probabilities for each transition; wherein: the representation comprises a string of alphanumeric characters in which each character in the string corresponds to a different section of the file within the sequence; the sequence of the sections of the file are arranged according to a virtual ordering, the virtual ordering representing an order in which the sections of the file will occur in memory, the generated sequence of the sections of the file being different than an order in which at least a portion of the sections occur within the file when not executing.

2. The method of claim 1 further comprising: preventing execution of the file, when the file is classified as a malicious file.

3. The method of claim 2, wherein the malicious file comprises at least one of an adware file, a parasitic file, and a bad file.

4. The method of claim 1, wherein the classifying comprises classifying the file as one or more of an adware file, a parasitic file, a bad file, a packed file, and a good file.

5. The method of claim 1, wherein the conditional probabilities comprise measured probabilities that a first file section will be followed by a second file section.

6. The method of claim 1, wherein the conditional probabilities are generated based on training files.

7. The method of claim 1 further comprising: determining, from a second model including probabilities, a prior probability for a first section of the file occurring first, wherein classifying the file is further based on the prior probability.

8. The method of claim 1, wherein the sequence of characters in the string are generated such that each of the characters occurs in the same order as an order of the sections of the file.

9. The method of claim 1, wherein the conditional probabilities are included in a matrix or dictionary stored in memory, and wherein determining the probabilities for each transition comprises retrieving, for each of the transitions, a corresponding conditional probability from the matrix or dictionary.

10. The method of claim 1, wherein the conditional probabilities are generated based on Markov modeling.

11. The method of claim 1, wherein the sections of the file comprise one or more of a MAC header, a DOS header, rich data, a portable executable header, code, data, import data, export data, an entry point, a beginning indication, and an end indication.

12. The method of claim 1 further comprising: generating a plurality of representations of a plurality of files with a known classification; processing transitions between sections in each of the plurality of files to generate a matrix or dictionary of the conditional probabilities; comparing the plurality of files against the matrix or dictionary to generate a score range for the known classification; and generating a score for the file based on the probabilities for each transition, wherein classifying the file comprises classifying the file as belonging to the known classification when the score falls within the score range.

13. The method of claim 1 further comprising: generating a classification score based on a function of a product of the probabilities for each transition, wherein classifying the file is based on comparing the classification score against a score for one or more file classification types.

14. The method of claim 1 further comprising: comparing each transition against conditional probabilities for a plurality of different classifications to generate a plurality of classification scores; and classifying the file as belonging to one or more of the plurality of different classifications based on the plurality of classification scores.

15. The method of claim 1 further comprising: determining, for each transition between more than two sequential portions, a probability of the transition between more than two sequential portions occurring in training files.

16. The method of claim 1, wherein the representation comprises tokens, and wherein the tokens comprise one or more of a letter, a number, a symbol, and a programmatic class.

17. The method of claim 1, wherein there are: a plurality of models that each include conditional probabilities in a separate and distinct matrix or dictionary and which each model correspond to a different type of file; the file is scored by each of the models; and the file is classified as having a type corresponding to the model scoring the file with a highest or lowest value.

18. The method of claim 1, wherein the sections of the file are selected from a group consisting of: a first header, a rich data section, a second header, a beginning indication, a code section, and an end indication.

19. The method of claim 1, wherein the recursive neural network is a land change modeler (LCM).

20. A method for implementation by one or more processors forming part of at least one computing device, the method comprising: generating a representation of a sequence of sections of a file; determining, from a model including conditional probabilities, a probability for each transition between at least two sequential sections in the representation, the model being trained using a land change modeler (LCM) recursive neural network; and classifying the file based on the probabilities for each transition; wherein: the representation comprise a string of alphanumeric characters in which each character in the string corresponds to a different section of the file within the sequence.

21. The method of claim 20 further comprising: preventing execution of the file, when the file is classified as a malicious file.

22. The method of claim 21, wherein the malicious file comprises at least one of an adware file, a parasitic file, and a bad file.

23. The method of claim 20, wherein the classifying comprises classifying the file as one or more of an adware file, a parasitic file, a bad file, a packed file, and a good file.

24. The method of claim 20, wherein the conditional probabilities comprise measured probabilities that a first file section will be followed by a second file section.

25. The method of claim 20, wherein the conditional probabilities are generated based on training files.

26. The method of claim 20 further comprising: determining, from a second model including probabilities, a prior probability for a first section of the file occurring first, wherein classifying the file is further based on the prior probability.

27. The method of claim 20, wherein the sequence of characters in the string are generated such that each of the characters occurs in the same order as an order of the sections of the file.

28. The method of claim 20, wherein the conditional probabilities are included in a matrix or dictionary stored in memory, and wherein determining the probabilities for each transition comprises retrieving, for each of the transitions, a corresponding conditional probabilit
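
Several of the dependent claims (a dictionary of conditional probabilities built from training files, a prior for the first section, and scoring one file against several per-class models) can be pictured with a small Python sketch. It is only one plausible reading: the count-based estimate, the `floor` probability for unseen events, the log-sum scoring, the highest-score rule, and the toy training strings are assumptions of this example, and the sketch does not attempt the neural-network training that the independent claims recite.

```python
import math
from collections import Counter, defaultdict


def train_model(representations):
    """Estimate first-section priors and first-order transition probabilities.

    `representations` is an iterable of strings, one character per section,
    all taken from files with the same known classification.
    """
    first_counts = Counter()
    pair_counts = defaultdict(Counter)
    for rep in representations:
        if not rep:
            continue
        first_counts[rep[0]] += 1
        for prev, nxt in zip(rep, rep[1:]):
            pair_counts[prev][nxt] += 1

    total_first = sum(first_counts.values())
    priors = {tok: n / total_first for tok, n in first_counts.items()}

    # Dictionary keyed by (previous, next) holding P(next | previous).
    transitions = {}
    for prev, followers in pair_counts.items():
        total = sum(followers.values())
        for nxt, n in followers.items():
            transitions[(prev, nxt)] = n / total
    return priors, transitions


def log_score(rep, model, floor=1e-6):
    """Log of (prior of first section * product of transition probabilities)."""
    priors, transitions = model
    if not rep:
        return float("-inf")
    score = math.log(priors.get(rep[0], floor))
    for prev, nxt in zip(rep, rep[1:]):
        score += math.log(transitions.get((prev, nxt), floor))
    return score


def classify(rep, models):
    """Return the label of the model that gives `rep` the highest score."""
    return max(models, key=lambda label: log_score(rep, models[label]))


if __name__ == "__main__":
    # Invented training strings; real ones would come from labeled files.
    models = {
        "good": train_model(["DPCA", "DPCA", "DPAC"]),
        "packed": train_model(["DPXC", "DXPC"]),
    }
    print(classify("DPCA", models))
```

Working in log space keeps long sequences from underflowing to zero, and the `floor` value stops a single unseen transition from making a score undefined; both are design choices of this sketch and are not stated in the claims.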

Assignees

Cylance Inc

Classifications

  • Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities

  • by checking file integrity

  • Event detection, e.g. attack signature detection

  • the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms

  • Machine learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10652252B2 cover?
Systems, methods, and articles of manufacture, including computer program products, are provided for classification systems and methods using modeling. In some example embodiments, there is provided a system that includes at least one processor and at least one memory including program code which when executed by the at least one memory provides operations. The operations can include generating…
Who is the assignee on this patent?
Cylance Inc
What technology area does this patent fall under?
Primary CPC classification H04L63/1416. Mapped technology areas include Electricity.
When was this patent published?
Publication date May 12, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).