Systems and methods for identifying potentially malicious singleton files

US9959407B1 · US · B1

Patent metadata
Publication number: US-9959407-B1
Application number: US-201615071049-A
Country: US
Kind code: B1
Filing date: Mar 15, 2016
Priority date: Mar 15, 2016
Publication date: May 1, 2018
Grant date: May 1, 2018

Abstract

A computer-implemented method for identifying potentially malicious singleton files may include (1) identifying a set of benign singleton files and a set of malicious singleton files, (2) obtaining, for each singleton file in the sets of benign and malicious singleton files, file identification information that identifies the singleton file, (3) using the file identification information of the singleton files from the sets of benign and malicious singleton files to train a classifier to classify unknown singleton files, (4) detecting an unclassified singleton file, (5) analyzing, with the trained classifier, information that identifies the unclassified singleton file, (6) determining, based on the analysis of the information that identifies the unclassified singleton file, that the unclassified singleton file is suspicious, and (7) triggering a security action in response to determining that the unclassified singleton file is suspicious. Various other methods, systems, and computer-readable media are also disclosed.
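The seven steps in the abstract map onto an ordinary train-then-score loop. The sketch below is a minimal illustration under stated assumptions, not the patented implementation. The patent does not name a model or a feature encoding, so several parts are hypothetical: the six features (filename length, digit ratio, a three-character-extension flag read off the reversed filename, path depth, log file size, and byte entropy H = -Σ p·log2 p), the logistic-regression classifier trained by stochastic gradient descent, the 0.5 threshold, and every name (`FileInfo`, `extract_features`, `train`, `scan`).

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical container for a singleton file's identification information."""
    filename: str
    path: str
    size: int
    content: bytes


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def extract_features(info: FileInfo) -> list[float]:
    """Turn identification info into a small numeric vector (assumed feature set)."""
    # Assumption: the "inverse of the filename" is the reversed string, so the
    # extension length equals the index of the first "." in it (-1 if none).
    reversed_name = info.filename[::-1]
    ext_len = reversed_name.find(".")
    digit_ratio = sum(ch.isdigit() for ch in info.filename) / max(len(info.filename), 1)
    path_depth = info.path.replace("\\", "/").count("/")
    return [
        len(info.filename) / 64.0,
        digit_ratio,
        1.0 if ext_len == 3 else 0.0,
        path_depth / 8.0,
        math.log10(info.size + 1) / 8.0,
        byte_entropy(info.content) / 8.0,
    ]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(benign: list[FileInfo], malicious: list[FileInfo],
          epochs: int = 500, lr: float = 0.5) -> tuple[list[float], float]:
    """Fit a logistic-regression classifier with plain stochastic gradient descent."""
    data = [(extract_features(f), 0.0) for f in benign]
    data += [(extract_features(f), 1.0) for f in malicious]
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def scan(model: tuple[list[float], float], info: FileInfo,
         threshold: float = 0.5) -> str | None:
    """Score an unclassified singleton file; return an alert if it looks suspicious."""
    w, b = model
    score = sigmoid(sum(wi * xi for wi, xi in zip(w, extract_features(info))) + b)
    if score >= threshold:
        # Placeholder security action: a real product might alert, quarantine, or remove.
        return f"ALERT: {info.path} is suspicious (score={score:.2f})"
    return None
```

Usage: `model = train(benign, malicious)` on labeled singletons, then `scan(model, info)` on a newly seen file returns an alert string for a suspicious file and `None` otherwise. The returned string stands in for whichever security action claim 9 lists (alert, confirmation, or removal).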

Claims

What is claimed is:

1. A computer-implemented method for identifying potentially malicious singleton files, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising: identifying a set of benign singleton files and a set of malicious singleton files; obtaining, for each singleton file in the sets of benign and malicious singleton files, file identification information that identifies the singleton file; using the file identification information of the singleton files from the sets of benign and malicious singleton files to train a classifier to classify unknown singleton files; detecting an unclassified singleton file; analyzing, with the trained classifier, information that identifies the unclassified singleton file; determining, based on the analysis of the information that identifies the unclassified singleton file, that the unclassified singleton file is suspicious; triggering a security action in response to determining that the unclassified singleton file is suspicious.

2. The method of claim 1, wherein identifying the sets of benign and malicious singleton files comprises at least one of: filtering representative samples of benign singleton files to obtain a comparable set size of benign singleton files to malicious singleton files; filtering the representative samples of benign singleton files to obtain a smaller set size of benign singleton files than malicious singleton files.

3. The method of claim 2, wherein the representative samples of benign singleton files comprise calculated centroid values of clusters of benign singleton files.

4. The method of claim 1, wherein the singleton file within the sets of benign and malicious singleton files comprises a file different from any other file stored within a plurality of computing devices.

5. The method of claim 1, wherein the file identification information that identifies the singleton file comprises at least one of: a filename; an inverse of the filename; a file path; a size of the file; a file header; a file entropy; an external library that the singleton file uses; a function imported by the singleton file; a computing device on which the singleton file resides.

6. The method of claim 1, wherein using the file identification information to train the classifier comprises: deriving features from the file identification information; using a machine learning model to classify the features derived from the file identification information.

7. The method of claim 1, wherein analyzing the information that identifies the unclassified singleton file comprises: converting the information that identifies the unclassified singleton file into at least one feature; using the trained classifier to classify the unclassified singleton file based on the feature.

8. The method of claim 1, wherein determining that the unclassified singleton file is suspicious comprises at least one of: determining that the unclassified singleton file is classified as malicious using the trained classifier; determining that the information that identifies the unclassified singleton file is similar to a malicious singleton file in the set of malicious singleton files.

9. The method of claim 1, wherein the security action comprises at least one of: triggering an alert that the unclassified singleton file is suspicious; confirming that the unclassified singleton file is malicious; removing the unclassified singleton file from a computing device on which the unclassified singleton file resides.

10. The method of claim 9, further comprising adding the unclassified singleton file to the set of malicious singleton files in response to confirming that the unclassified singleton file is malicious.

11. A system for identifying potentially malicious singleton files, the system comprising: at least one physical processor; and a system memory having stored therein one or more computer-executable instructions that, when executed by the at least one physical processor, cause the system to perform the following: identify a set of benign singleton files and a set of malicious singleton files; obtain, for each singleton file in the sets of benign and malicious singleton files, file identification information that identifies the singleton file; use the file identification information of the singleton files from the sets of benign and malicious singleton files to train a classifier to classify unknown singleton files; detect an unclassified singleton file; analyze, with the trained classifier, information that identifies the unclassified singleton file; determine, based on the analysis of the information that identifies the unclassified singleton file, that the unclassified singleton file is suspicious; and trigger a security action in response to determining that the unclassified singleton file is suspicious.

12. The system of claim 11, wherein the system identifies the sets of benign and malicious singleton files by at least one of: filtering representative samples of benign singleton files to obtain a comparable set size of benign singleton files to malicious singleton files; filtering the representative samples of benign singleton files to obtain a smaller set size of benign singleton files than malicious singleton files.

13. The system of claim 12, wherein the representative samples of benign singleton files comprise calculated centroid values of clusters of benign singleton files.

14. The system of claim 11, wherein the singleton file within the sets of benign and malicious singleton files comprises a file different from any other file stored within a plurality of computing devices.

15. The system of claim 11, wherein the file identification information that identifies the singleton file comprises at least one of: a filename; an inverse of the filename; a file path; a size of the file; a file header; a file entropy; an external library that the singleton file uses; a function imported by the singleton file; a computing device on which the singleton file resides.

16. The system of claim 11, wherein the system uses the file identification information to train the classifier by: deriving features from the file identification information; using a machine learning model to classify the features derived from the file identification information.

17. The system of claim 11, wherein the system analyzes the information that identifies the unclassified singleton file by: converting the information that identifies the unclassified singleton file into at least one feature; using the trained classifier to classify the unclassified singleton file based on the feature.

18. The system of claim 11, wherein the system determines that the unclassified singleton file is suspicious by at least one of: determining that the unclassified singleton file is classified as malicious using the trained classifier; determining that the information that identifies the unclassified singleton file is similar to a malicious singleton file in the set of malicious singleton files.

19. The system of claim 11, wherein the security action comprises at least one of: triggering an alert that the unclassified singleton file is suspicious; confirming that the unclassified singleton file is malicious; removing the unclassified singleton file from a computing device on which the unclassified singleton file resides.

20. A non-transitory computer-readable medium comprising o
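Claims 2-3 shrink the benign training set to representative samples (centroids of clusters of benign singleton files) so that it is comparable to, or smaller than, the malicious set, and claim 8 adds a similarity test against known malicious singletons. The sketch below illustrates both under stated assumptions: files are already converted to numeric feature vectors, clustering uses plain Lloyd's k-means (the claims say only "calculated centroid values of clusters" and name no algorithm), "similar" means Euclidean distance within a fixed radius, and the names `kmeans`, `representative_benign`, and `similar_to_known_malicious` are hypothetical.

```python
from __future__ import annotations

import math
import random


def kmeans(points: list[tuple[float, ...]], k: int, iters: int = 20,
           seed: int = 0) -> list[list[float]]:
    """Plain Lloyd's k-means; returns k centroids (assumed clustering method)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: list[list[tuple[float, ...]]] = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (ties go to the lowest index).
            dists = [math.dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def representative_benign(benign: list[tuple[float, ...]], n_malicious: int,
                          ratio: float = 1.0) -> list[list[float]]:
    """Claims 2-3 (sketch): replace benign vectors with about ratio * n_malicious centroids."""
    k = max(1, min(len(benign), round(n_malicious * ratio)))
    return kmeans(benign, k)


def similar_to_known_malicious(x: tuple[float, ...],
                               malicious: list[tuple[float, ...]],
                               radius: float) -> bool:
    """Claim 8, second branch (sketch): is x within `radius` of any known malicious vector?"""
    return any(math.dist(x, m) <= radius for m in malicious)
```

A `ratio` below 1.0 corresponds to the second option in claim 2 (fewer benign samples than malicious ones). The feedback step of claim 10 would amount to appending a confirmed file's vector to the malicious list before the next round of training.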

Assignees

Symantec Corp

Classifications

  • G06F21/56 (primary): Computer malware detection or handling, e.g. anti-virus arrangements (CPC title)

  • Physics (mapped topic)

  • Machine learning (CPC title)

Frequently asked questions

Who is the assignee on this patent?
Symantec Corp
What technology area does this patent fall under?
Primary CPC classification G06F21/56. Mapped technology areas include Physics.
When was this patent published?
Publication date May 1, 2018 (kind code B1). Legal status and post-grant events are not shown on this page.