Machine learning-based malware detection system and method

US11822657B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11822657-B2
Application number: US-202217724744-A
Country: US
Kind code: B2
Filing date: Apr 20, 2022
Priority date: Apr 7, 2017
Publication date: Nov 21, 2023
Grant date: Nov 21, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed is a computer-implemented method for malware detection that analyses a file on a per-packet basis. The method receives a packet of one or more packets associated with a file, converts binary content associated with the packet into a digital representation, and tokenizes plain text content associated with the packet. The method extracts one or more n-gram features, an entropy feature, and a domain feature from the converted content of the packet and applies a trained machine learning model to the one or more features extracted from the packet. The output of the machine learning model is a probability of maliciousness associated with the received packet. If the probability of maliciousness is above a threshold value, the method determines that the file associated with the received packet is malicious.
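The feature-extraction step in the abstract can be illustrated with a minimal Python sketch. The patent text on this page does not define how the features are computed, so everything here is an assumption made for illustration: the function names, the choice of byte-level bigrams, the token and domain regular expressions, and the 0.5 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def to_decimal(packet: bytes) -> list[int]:
    """Decimal representation: one integer in 0..255 per byte."""
    return list(packet)


def ngram_counts(values, n: int = 2) -> Counter:
    """Count every run of n consecutive items (byte values or tokens)."""
    return Counter(tuple(values[i:i + n]) for i in range(len(values) - n + 1))


def shannon_entropy(packet: bytes) -> float:
    """Shannon entropy in bits per byte; 0.0 for an empty packet."""
    if not packet:
        return 0.0
    total = len(packet)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(packet).values())


def tokenize(text: str) -> list[str]:
    """Split plain-text content into word-like tokens (assumed tokenizer)."""
    return re.findall(r"[A-Za-z0-9_]+", text)


# Assumed pattern for domain-like strings; the patent's domain feature
# may be defined differently.
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def domain_count(text: str) -> int:
    """Number of domain-like strings found in the text."""
    return len(DOMAIN_RE.findall(text))


def is_malicious(probability: float, threshold: float = 0.5) -> bool:
    """Apply the threshold rule; 0.5 is a placeholder value."""
    return probability > threshold
```

High entropy is commonly associated with packed or encrypted content, which is one plausible reason to include it as a feature alongside n-gram counts; the page itself does not state the rationale.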

First claim

Opening claim text (preview).

What is claimed is:
1. A computer program product embodied in a non-transitory computer readable storage medium and comprising computer instructions that when executed by a processor cause the processor to perform steps of: receiving a first packet of one or more packets associated with a file; determining a file type of the file from the first packet; converting contents of the first packet into a corresponding digital representation for feature extraction; extracting one or more features from the corresponding digital representation; and applying a trained machine learning model to the one or more features to determine probability of maliciousness.
2. The computer program product embodied in a non-transitory computer readable storage medium of claim 1, wherein the steps further include labeling the file based on the probability of maliciousness.
3. The computer program product embodied in a non-transitory computer readable storage medium of claim 1, wherein the steps further include responsive to the first packet having the probability of maliciousness as benign, receiving a next packet of the one or more packets; and performing the converting, extracting, and applying on the next packet.
4. The computer program product embodied in a non-transitory computer readable storage medium of claim 3, wherein the steps further include continuing with additional packets of the one or more packets until a determination of whether the file is malicious or benign.
5. The computer program product embodied in a non-transitory computer readable storage medium of claim 1, wherein the file type is one of a portable executable (PE) file, a portable document format (PDF) file, a Dynamic Loaded Library (DLL), a JavaScript (JS) file, a Hypertext Markup Language (HTML) file, and a Microsoft Office File.
6. The computer program product embodied in a non-transitory computer readable storage medium of claim 1, wherein the digital representation is any of a decimal representation, a binary representation, a hexadecimal representation, a tokenized script, and a tokenized domain.
7. The computer program product embodied in a non-transitory computer readable storage medium of claim 1, wherein the trained machine learning model comprises one or more decision trees.
8. The computer program product embodied in a non-transitory computer readable storage medium of claim 1, wherein the one or more features include n-gram features.
9. The computer program product embodied in a non-transitory computer readable storage medium of claim 1, wherein the one or more features include an entropy feature.
10. The computer program product embodied in a non-transitory computer readable storage medium of claim 1, wherein the one or more features include a domain feature.
11. A method comprising steps of: receiving a first packet of one or more packets associated with a file; determining a file type of the file from the first packet; converting contents of the first packet into a corresponding digital representation for feature extraction; extracting one or more features from the corresponding digital representation; and applying a trained machine learning model to the one or more features to determine probability of maliciousness.
12. The method of claim 11, wherein the steps further include labeling the file based on the probability of maliciousness.
13. The method of claim 11, wherein the steps further include responsive to the first packet having the probability of maliciousness as benign, receiving a next packet of the one or more packets; and performing the converting, extracting, and applying on the next packet.
14. The method of claim 13, wherein the steps further include continuing with additional packets of the one or more packets until a determination of whether the file is malicious or benign.
15. The method of claim 11, wherein the file type is one of a portable executable (PE) file, a portable document format (PDF) file, a Dynamic Loaded Library (DLL), a JavaScript (JS) file, a Hypertext Markup Language (HTML) file, and a Microsoft Office File.
16. The method of claim 11, wherein the digital representation is any of a decimal representation, a binary representation, a hexadecimal representation, a tokenized script, and a tokenized domain.
17. The method of claim 11, wherein the trained machine learning model comprises one or more decision trees.
18. The method of claim 11, wherein the one or more features include n-gram features.
19. The method of claim 11, wherein the one or more features include an entropy feature.
20. The method of claim 11, wherein the one or more features include a domain feature.
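The per-packet flow in claims 1, 3, and 4 (determine the file type from the first packet, score each packet, and move to the next packet only while the verdict is benign) might be sketched as follows. This is an illustrative assumption, not the patented implementation: `detect_file_type`, `classify_file`, the `score` callable that stands in for the trained model, and the 0.5 threshold are hypothetical names and values. The magic-byte prefixes are the well-known file signatures; note that PE executables and DLLs share the `MZ` prefix, and that `PK\x03\x04` marks any ZIP container, of which modern Office files are one kind.

```python
from typing import Callable, Iterable, Optional, Tuple

# Well-known leading signatures; labels are illustrative.
MAGIC = [
    (b"MZ", "PE"),  # PE executables and DLLs share this prefix
    (b"%PDF", "PDF"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "MS Office (OLE)"),
    (b"PK\x03\x04", "ZIP container (possibly MS Office OOXML)"),
]


def detect_file_type(first_packet: bytes) -> str:
    """Guess the file type from the leading bytes of the first packet."""
    for magic, name in MAGIC:
        if first_packet.startswith(magic):
            return name
    head = first_packet[:64].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "HTML"
    return "unknown"


def classify_file(
    packets: Iterable[bytes],
    score: Callable[[bytes, Optional[str]], float],
    threshold: float = 0.5,
) -> Tuple[str, int]:
    """Score packets in order and stop at the first malicious verdict.

    Returns the label and the number of packets inspected.
    `score` is a hypothetical stand-in for the trained model.
    """
    file_type: Optional[str] = None
    inspected = 0
    for packet in packets:
        if inspected == 0:
            file_type = detect_file_type(packet)
        inspected += 1
        if score(packet, file_type) > threshold:
            return "malicious", inspected
    return "benign", inspected


# Usage with a stub scorer in place of a real model.
def stub_score(packet: bytes, file_type: Optional[str]) -> float:
    return 0.9 if b"EVIL" in packet else 0.1


print(classify_file([b"MZ\x90\x00", b"..EVIL..", b"tail"], stub_score))
```

Stopping at the first packet that crosses the threshold means a verdict can be reached before the whole file has arrived, which is the practical appeal of scoring on a per-packet basis.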

Assignees

Zscaler Inc

Inventors

Classifications

  • G06F21/566 (Primary)

    Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

  • Machine learning

  • Ensemble learning

  • Test or assess a computer or a system

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Zscaler Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/566. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 21, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).