Automated detection of malware using trained neural network-based file classifiers and machine learning
US-2021234880-A1 · Jul 29, 2021 · US
US12061697B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12061697-B2 |
| Application number | US-202217673142-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 16, 2022 |
| Priority date | Feb 16, 2022 |
| Publication date | Aug 13, 2024 |
| Grant date | Aug 13, 2024 |
Abstract
Detecting a malicious package associated with a software repository. A method identifies a subject package in a software repository and extracts a feature set from the subject package. The feature set includes single-version features, including whether the subject package accesses personally identifying information, accesses specified system resource(s), uses specified application programming interface(s), includes installation script(s), and/or includes a binary, minified, or obfuscated file. The feature set also includes change features, including an amount of time since publication of a prior version of the subject package, a semantic update type, and/or how single-version feature(s) have changed since the prior version. The method provides the feature set as input to a set of classifiers, each configured to use the feature set to generate a prediction of whether the subject package is malicious or benign. Based at least on the prediction, the method classifies the subject package as being malicious or benign.
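The abstract names the two feature families but not how each one is computed. The Python sketch below shows one illustrative way to build such a feature set. It assumes a package is available as a map of file paths to text and uses crude substring markers. The `Package` type, the marker lists, and the function names are hypothetical and do not come from the patent. A missing prior version (`prev=None`) is treated as an empty package, following claims 1 and 16: zero elapsed time, a pseudo update type for a first version, and every present feature reported as newly added.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional

# Hypothetical substring markers; the patent does not say how features are detected.
PII_MARKERS = ("os.environ", "getpass", "gethostname")
RESOURCE_MARKERS = {
    "file_system": ("open(",),
    "process": ("subprocess", "os.system"),
    "network": ("urllib", "requests", "socket"),
}
API_MARKERS = {
    "crypto": ("hashlib", "cryptography"),
    "encoding": ("base64",),
    "dynamic_code": ("eval(", "exec("),
}
INSTALL_SCRIPT_NAMES = ("setup.py", "postinstall.sh")
BINARY_OR_MINIFIED_SUFFIXES = (".so", ".dll", ".exe", ".min.js")


@dataclass
class Package:
    name: str
    version: str  # assumed "MAJOR.MINOR.PATCH"
    published: datetime
    files: Dict[str, str] = field(default_factory=dict)  # path -> text


def single_version_features(pkg: Optional[Package]) -> Dict[str, bool]:
    """Boolean single-version features; None is treated as an empty package."""
    files = pkg.files if pkg is not None else {}
    text = "\n".join(files.values())
    feats = {"accesses_pii": any(m in text for m in PII_MARKERS)}
    for kind, markers in RESOURCE_MARKERS.items():
        feats["resource_" + kind] = any(m in text for m in markers)
    for kind, markers in API_MARKERS.items():
        feats["api_" + kind] = any(m in text for m in markers)
    feats["has_install_script"] = any(
        path.rsplit("/", 1)[-1] in INSTALL_SCRIPT_NAMES for path in files
    )
    # Obfuscation detection is omitted; only file suffixes are checked here.
    feats["has_binary_or_minified"] = any(
        path.endswith(BINARY_OR_MINIFIED_SUFFIXES) for path in files
    )
    return feats


def semantic_update_type(prev: Optional[Package], cur: Package) -> str:
    if prev is None:
        return "initial"  # pseudo-update type for a first version
    old = [int(x) for x in prev.version.split(".")]
    new = [int(x) for x in cur.version.split(".")]
    if new[0] != old[0]:
        return "major"
    if new[1] != old[1]:
        return "minor"
    return "patch"


def extract_feature_set(prev: Optional[Package], cur: Package) -> Dict[str, object]:
    """Single-version plus change features for `cur` relative to `prev`."""
    before = single_version_features(prev)
    after = single_version_features(cur)
    elapsed = 0.0 if prev is None else (cur.published - prev.published).total_seconds() / 86400
    return {
        "single_version": after,
        "days_since_prior": elapsed,
        "update_type": semantic_update_type(prev, cur),
        # Features present now but absent before; against an empty prior
        # version this is simply every feature the package has.
        "newly_added": {k: after[k] and not before[k] for k in after},
    }
```

The "newly added" view captures only one way a feature can change. A fuller implementation might also record removed features. The resulting dictionary would then be flattened into a numeric vector before it is passed to the classifiers.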
Claims (preview)
What is claimed:
1. A method, implemented at a computer system that includes a processor, for detecting a malicious package associated with a software repository, the method comprising: identifying a subject package associated with a software repository, wherein the subject package is an initial version of the subject package; extracting a feature set from the subject package, the feature set including: (a) one or more single-version features, including one or more of (i) whether the subject package accesses personally identifying information (PII), (ii) whether the subject package accesses a specified system resource, (iii) whether the subject package uses a specified application programming interface (API), (iv) whether the subject package includes an installation script, or (v) whether the subject package includes at least one of a binary file, a minified file, or an obfuscated file, and (b) one or more change features, including one or more of (i) an amount of time since publication of a prior version of the subject package, (ii) a semantic update type associated with the subject package, or (iii) an identification of how one or more single-version features have changed since the prior version of the subject package, wherein the prior version of the subject package is treated as an empty package; providing the feature set as input to a set of classifiers, each classifier in the set of classifiers being configured to use the feature set to generate a prediction of whether the subject package is malicious or benign; and based at least on the prediction, classifying the subject package as being malicious or benign.
2. The method of claim 1, wherein the one or more single-version features include whether the subject package accesses PII.
3. The method of claim 1, wherein the one or more single-version features include whether the subject package accesses a specified system resource, and wherein the specified system resource includes at least one of: (a) file system access, (b) process creation, or (c) network access.
4. The method of claim 1, wherein the one or more single-version features include whether the subject package uses a specified API, and wherein the specified API includes at least one of: (a) a cryptographic API, (b) a data encoding API, or (c) a dynamic code generation API.
5. The method of claim 1, wherein the one or more single-version features include whether the subject package includes an installation script.
6. The method of claim 1, wherein the one or more single-version features include whether the subject package includes at least one of a binary file, a minified file, or an obfuscated file.
7. The method of claim 1, wherein the one or more change features include the amount of time since publication of the prior version of the subject package.
8. The method of claim 1, wherein the one or more change features include the semantic update type associated with the subject package.
9. The method of claim 1, further comprising, based at least on classifying the subject package as being malicious, automatically removing the subject package from the software repository.
10. The method of claim 1, further comprising, based at least on classifying the subject package as being malicious, attaching a priority to the subject package.
11. The method of claim 1, further comprising: updating a training data set based on the feature set and the prediction of whether the subject package is malicious or benign; and re-training the set of classifiers using the updated training data set.
12. The method of claim 1, further comprising, based at least on a prediction that subject package is malicious: creating a reproduced build of the subject package from source; determining if the reproduced build of the subject package is equivalent to the subject package; and classifying the subject package as being benign when the reproduced build of the subject package is equivalent to the subject package.
13. The method of claim 1, further comprising, based at least on a prediction that subject package is benign: determining whether the subject package is a clone of another package in the software repository; and classifying the subject package as being malicious when the subject package is determined to be a clone of another package in the software repository.
14. The method of claim 1, wherein the set of classifiers include a decision tree classifier, a Naive Bayesian classifier, and a one-class support vector machine classifier.
15. The method of claim 1, wherein the software repository is one of a code library repository, an application distribution repository, a source code repository, or a container repository.
16. The method of claim 1, wherein treating the prior version of the subject package as an empty package includes at least one of, determining that the amount of time since publication of the prior version of the subject package is zero, determining that the semantic update type associated with the subject package is a pseudo-update type representing a first version of the subject package, or determining that the identification of how one or more single-version features have changed since the prior version of the subject package includes identifying a single-version feature, itself.
17. A computer system for detecting a malicious package associated with a software repository, comprising: a processor; and a computer storage media that stores computer-executable instructions that are executable by the processor to cause the computer system to at least: identify a subject package associated with a software repository, wherein the subject package is an initial version of the subject package; extract a feature set from the subject package, the feature set including: (a) one or more single-version features, including one or more of (i) whether the subject package accesses personally identifying information (PII), (ii) whether the subject package accesses a specified system resource, (iii) whether the subject package uses a specified application programming interface (API), (iv) whether the subject package includes an installation script, or (v) whether the subject package includes at least one of a binary file, a minified file, or an obfuscated file, and (b) one or more change features, including one or more of (i) an amount of time since publication of a prior version of the subject package, or (ii) a semantic update type associated with the subject package, wherein the prior version of the subject package is treated as an empty package; provide the feature set as input to a set of classifiers, each classifier in the set of classifiers being configured to use the feature set to generate a prediction of whether the subject package is malicious or benign; and based at least on the prediction, classify the subject package as being malicious or benign.
18. The computer system of claim 17, the computer-executable instructions also including instructions that are executable by the processor to cause the computer system to, based at least on a prediction the subject package is malicious: create a reproduced build of the subject package from source; determine if the reproduced build of the subject package is equivalent to the subject package; and classify the subject package as being benign when the reproduced build of the subject package is equivalent to the subject package.
19. The computer system of claim 17, the computer-executable instructions also including instructions that are executa
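Claims 12 and 13 describe two follow-up checks that can override the classifier ensemble's prediction. A reproducible build matching the published artifact can clear a "malicious" prediction, and a clone of another package can flag a "benign" one. The claims do not say how per-classifier predictions are combined or how equivalence and cloning are judged. This Python sketch makes three assumptions: a simple majority vote, builds counted as equivalent when a SHA-256 digest over their (path, content) pairs matches, and clone detection left to a caller-supplied function. `majority_vote`, `build_digest`, and `classify` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def majority_vote(predictions: Iterable[bool]) -> bool:
    """True means 'malicious'. Majority voting is an assumed combining rule."""
    preds = list(predictions)
    return sum(preds) * 2 > len(preds)


def build_digest(files: Dict[str, bytes]) -> str:
    """Order-independent SHA-256 over (path, content) pairs of a build."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode("utf-8") + b"\0")  # separator avoids path/content ambiguity
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


def classify(
    predictions: List[bool],
    published: Dict[str, bytes],
    rebuild: Callable[[], Dict[str, bytes]],  # hypothetical: builds the package from source
    is_clone: Callable[[], bool],  # hypothetical: compares against other repository packages
) -> str:
    if majority_vote(predictions):
        # Claim 12: an equivalent reproduced build overrides a malicious prediction.
        if build_digest(rebuild()) == build_digest(published):
            return "benign"
        return "malicious"
    # Claim 13: a predicted-benign package that clones another package is flagged.
    return "malicious" if is_clone() else "benign"
```

A byte-for-byte digest is the strictest reading of "equivalent." Practical reproducible-build comparisons often normalize timestamps and archive ordering before hashing.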
Test or assess software · CPC title
for detecting or protecting against malicious traffic · CPC title
Machine learning · CPC title
Computer malware detection or handling, e.g. anti-virus arrangements · CPC title
by source code analysis · CPC title