Malware detection

US10635814B2 · US · B2

Patent metadata
Publication number: US-10635814-B2
Application number: US-201816183624-A
Country: US
Kind code: B2
Filing date: Nov 7, 2018
Priority date: Jul 15, 2015
Publication date: Apr 28, 2020
Grant date: Apr 28, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one respect, there is provided a system for training a neural network adapted for classifying one or more scripts. The system may include at least one processor and at least one memory. The memory may include program code which when executed by the at least one memory provides operations including: receiving a disassembled binary file that includes a plurality of instructions; processing the disassembled binary file with a convolutional neural network configured to detect a presence of one or more sequences of instructions amongst the plurality of instructions and determine a classification for the disassembled binary file based at least in part on the presence of the one or more sequences of instructions; and providing, as an output, the classification of the disassembled binary file. Related computer-implemented methods are also disclosed.
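The detection step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the vocabulary `VOCAB`, the hand-set kernel built by `sequence_kernel`, the `threshold` parameter, and the "malicious"/"benign" labels are all assumptions chosen for illustration, and a trained network would learn its kernel weights rather than have them set by hand.

```python
from typing import List, Sequence

# Hypothetical vocabulary of instruction mnemonics (an assumption for illustration).
VOCAB = ["mov", "push", "pop", "call", "xor", "jmp"]


def one_hot(mnemonic: str) -> List[float]:
    """Encode a mnemonic as a one-hot vector over VOCAB."""
    return [1.0 if m == mnemonic else 0.0 for m in VOCAB]


def conv1d(rows: Sequence[Sequence[float]],
           kernel: Sequence[Sequence[float]]) -> List[float]:
    """Slide a kernel (k rows) over the encoded file (n rows).

    Returns one response per window of k consecutive instructions,
    i.e. n - k + 1 values (empty if the file is shorter than the kernel).
    """
    k = len(kernel)
    responses = []
    for start in range(len(rows) - k + 1):
        total = 0.0
        for i in range(k):
            total += sum(w * x for w, x in zip(kernel[i], rows[start + i]))
        responses.append(total)
    return responses


def sequence_kernel(pattern: Sequence[str]) -> List[List[float]]:
    """Hand-set kernel whose response peaks on `pattern` (illustrative only)."""
    return [one_hot(m) for m in pattern]


def classify(mnemonics: Sequence[str],
             kernels: Sequence[Sequence[Sequence[float]]],
             threshold: float) -> str:
    """Label the file by the strongest kernel response found anywhere in it."""
    encoded = [one_hot(m) for m in mnemonics]
    # Max-pooling: a pattern counts as present if it matches at any position.
    peak = max((max(conv1d(encoded, k), default=0.0) for k in kernels),
               default=0.0)
    return "malicious" if peak >= threshold else "benign"
```

With a single kernel built from the pattern `["xor", "call"]` and a threshold of 2.0, a file containing that pair in order reaches the threshold, while a file without it does not.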

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: at least one processor; and at least one memory including program code which when executed by the at least one memory provides operations comprising: receiving a disassembled binary file that includes a plurality of instructions, at least a portion of the instructions being variable in length; generating fixed length representations of the plurality of instructions by at least one of truncating or padding each of the plurality of instructions to a same length; encoding the generated fixed length representations for more efficient processing by a convolutional neural network; processing the disassembled binary file with a trained convolutional neural network configured to detect a presence of one or more sequences of instructions amongst the plurality of instructions and determine a classification for the disassembled binary file based at least in part on the presence of the one or more sequences of instructions; and providing, as an output, the classification of the disassembled binary file to determine whether to execute, open, or access a binary file corresponding to the disassembled binary file; wherein the convolutional neural network is configured to: apply a first plurality of kernels to the disassembled binary file, and wherein each of the first plurality of kernels is adapted to detect a different sequence of two or more instructions; and subsequently apply a second plurality of kernels to the disassembled binary file, and wherein each of the second plurality of kernels is adapted to detect a different sequence of two or more sequences of instructions.

2. The system of claim 1, wherein the fixed length representations of the plurality of instructions includes a mnemonic associated with each instruction.

3. The system of claim 1, wherein the encoding is based on one-hot encoding or binary encoding.

4. The system of claim 1, wherein applying the first plurality of kernels includes applying a first weight matrix to a matrix representation of the disassembled binary file, and wherein the matrix representation of the disassembled binary file comprises encoded fixed length representations of the plurality of instructions included in the disassembled binary file.

5. The system of claim 4, wherein the system is further configured to train the convolutional neural network by at least: receiving a plurality of training files, wherein the plurality of training files comprises a plurality of disassembled binary files; determining a classification of a first training file by at least processing the first training file with the convolutional neural network; back propagating an error associated with the classification of the first training file; and adjusting at least the first weight matrix to minimize the error associated with the classification of the first training file.

6. The system of claim 5, wherein training the convolutional neural network further comprises: determining a classification for a second training file by at least processing the second training file with the convolutional neural network; back propagating an error associated with the classification of the second training file; and readjusting at least the first weight matrix to minimize the error associated with the classification of the second training file.

7. A computer-implemented method, comprising: receiving a disassembled binary file that includes a plurality of instructions, at least a portion of the instructions being variable in length; generating fixed length representations of the plurality of instructions by at least one of truncating or padding each of the plurality of instructions to a same length; encoding the generated fixed length representations for more efficient processing by a convolutional neural network; processing the disassembled binary file with a trained convolutional neural network configured to detect a presence of one or more sequences of instructions amongst the plurality of instructions and determine a classification for the disassembled binary file based at least in part on the presence of the one or more sequences of instructions; and providing, as an output, the classification of the disassembled binary file to determine whether to execute, open, or access a binary file corresponding to the disassembled binary file; wherein the convolutional neural network is configured to: apply a first plurality of kernels to the disassembled binary file, and wherein each of the first plurality of kernels is adapted to detect a different sequence of two or more instructions; and subsequently apply a second plurality of kernels to the disassembled binary file, and wherein each of the second plurality of kernels is adapted to detect a different sequence of two or more sequences of instructions.

8. The method of claim 7, wherein the fixed length representations of the plurality of instructions includes a mnemonic associated with each instruction.

9. The method of claim 7, wherein the encoding is based on one-hot encoding or binary encoding.

10. The method of claim 7, wherein applying the first plurality of kernels includes applying a first weight matrix to a matrix representation of the disassembled binary file, and wherein the matrix representation of the disassembled binary file comprises encoded fixed length representations of the plurality of instructions included in the disassembled binary file.

11. The method of claim 10, further comprising training the convolutional neural network by at least: receiving a plurality of training files, wherein the plurality of training files comprises a plurality of disassembled binary files; determining a classification of a first training file by at least processing the first training file with the convolutional neural network; back propagating an error associated with the classification of the first training file; adjusting at least the first weight matrix to minimize the error associated with the classification of the first training file.

12. The method of claim 11, wherein training the convolutional neural network further comprises: determining a classification for a second training file by at least processing the second training file with the convolutional neural network; back propagating an error associated with the classification of the second training file; and readjusting at least the first weight matrix to minimize the error associated with the classification of the second training file.

13. A computer-implemented method, comprising: receiving a disassembled binary file that includes a plurality of instructions, at least a portion of the instructions being variable in length; generating fixed length representations of the plurality of instructions by at least one of truncating or padding each of the plurality of instructions to a same length; encoding the generated fixed length representations for more efficient processing by a convolutional neural network; processing the disassembled binary file with a trained convolutional neural network configured to apply two different pluralities of kernels in sequence to detect a presence of one or more sequences of instructions amongst the plurality of instructions and determine a classification for the disassembled binary file based at least in part on the presence of the one or more sequences of instructions, each kernel being configured to detect a specific, different sequence of instructions; and providing, as an output, the classification of the disassembled binary
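The preprocessing and two-stage kernel application recited in the independent claims can be sketched roughly as follows. This is an illustrative reading and not the patented implementation: the `PAD` token, the whitespace and comma tokenization in `fix_length`, the vocabulary passed to `encode_file`, the ReLU nonlinearity in `conv_layer`, and all kernel weights are assumptions introduced here. Binary encoding is used as one of the two encodings named in the dependent claims.

```python
from typing import Dict, List, Sequence

PAD = "<pad>"  # hypothetical padding token (an assumption for illustration)


def fix_length(instruction: str, length: int) -> List[str]:
    """Truncate or pad one instruction's tokens to exactly `length` tokens."""
    tokens = instruction.replace(",", " ").split()
    return (tokens + [PAD] * length)[:length]


def binary_encode(index: int, bits: int) -> List[float]:
    """Encode a vocabulary index as `bits` binary digits, most significant first."""
    return [float((index >> b) & 1) for b in reversed(range(bits))]


def encode_file(instructions: Sequence[str], vocab: Dict[str, int],
                length: int, bits: int) -> List[List[float]]:
    """Build the matrix representation: one row per instruction, holding the
    concatenated binary codes of its fixed-length tokens."""
    return [
        [bit
         for tok in fix_length(ins, length)
         for bit in binary_encode(vocab[tok], bits)]
        for ins in instructions
    ]


def conv_layer(rows: Sequence[Sequence[float]],
               kernels: Sequence[Sequence[float]],
               width: int) -> List[List[float]]:
    """Apply every kernel to each window of `width` consecutive rows, then ReLU.

    Each kernel is a flat weight list of size width * len(rows[0]).
    The output has one row per window and one column per kernel.
    """
    out = []
    for start in range(len(rows) - width + 1):
        window = [x for row in rows[start:start + width] for x in row]
        out.append([max(0.0, sum(w * x for w, x in zip(k, window)))
                    for k in kernels])
    return out
```

In this sketch, the first call to `conv_layer` takes the output of `encode_file`, so each of its kernels responds to a short run of instructions. A second call takes the first call's output, so each of its kernels responds to a run of first-stage detections, that is, a sequence of sequences.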

Assignees

Cylance Inc

Inventors

Classifications

  • G06F21/562 · Primary

    Static detection · CPC title

  • Test or assess a computer or a system · CPC title

  • G06F21/565 · Primary

    by checking file integrity · CPC title

  • Learning methods · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10635814B2 cover?
In one respect, there is provided a system for training a neural network adapted for classifying one or more scripts. The system may include at least one processor and at least one memory. The memory may include program code which when executed by the at least one memory provides operations including: receiving a disassembled binary file that includes a plurality of instructions; processing the…
Who is the assignee on this patent?
Cylance Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/562. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 28, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).