Automated detection of malicious packages in a software repository

US12061697B2 · US · B2

Patent metadata
Field                Value
Publication number   US-12061697-B2
Application number   US-202217673142-A
Country              US
Kind code            B2
Filing date          Feb 16, 2022
Priority date        Feb 16, 2022
Publication date     Aug 13, 2024
Grant date           Aug 13, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Detecting a malicious package associated with a software repository. A method identifies a subject package in a software repository, and extracts a feature set from the subject package. The feature set includes single-version features, including whether the subject package accesses personally identifying information, accesses specified system resource(s), uses specified application programming interface(s), includes installation script(s), and/or includes a binary, minified, or obfuscated file. The feature set also includes change features, including an amount of time since publication of a prior version of the subject package, a semantic update type, and/or how single-version feature(s) have changed since the prior version. The method provides the feature set as input to a set of classifiers, each being configured to use the feature set to generate a prediction of whether the subject package is malicious or benign. Based at least on the prediction, the method classifies the subject package as being malicious or benign.
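The flow described in the abstract (extract a feature set, pass it to a set of classifiers, then classify) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `FeatureSet` field names, the three hand-written rules standing in for trained classifiers, and the strict-majority vote used to combine predictions are all assumptions, since the abstract does not say how the predictions are combined.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class FeatureSet:
    # Single-version features named in the abstract (field names are hypothetical).
    accesses_pii: bool
    accesses_system_resource: bool
    uses_specified_api: bool
    has_install_script: bool
    has_binary_minified_or_obfuscated_file: bool
    # Change features named in the abstract.
    seconds_since_prior_version: float
    semantic_update_type: str  # e.g. "major", "minor", "patch"


# A classifier is modeled as any callable that returns True for "malicious".
Classifier = Callable[[FeatureSet], bool]


def classify(features: FeatureSet, classifiers: List[Classifier]) -> str:
    """Combine per-classifier predictions by strict majority vote (an assumption)."""
    votes = sum(1 for clf in classifiers if clf(features))
    return "malicious" if votes * 2 > len(classifiers) else "benign"


# Hypothetical rules standing in for trained models.
def rule_install_script_with_resource_access(f: FeatureSet) -> bool:
    return f.has_install_script and f.accesses_system_resource


def rule_pii(f: FeatureSet) -> bool:
    return f.accesses_pii


def rule_obfuscated_quick_release(f: FeatureSet) -> bool:
    return (f.has_binary_minified_or_obfuscated_file
            and f.seconds_since_prior_version < 3600)


RULES: List[Classifier] = [
    rule_install_script_with_resource_access,
    rule_pii,
    rule_obfuscated_quick_release,
]

suspicious = FeatureSet(True, True, False, True, False, 120.0, "patch")
quiet = FeatureSet(False, False, True, False, False, 30 * 86400.0, "minor")

print(classify(suspicious, RULES))
print(classify(quiet, RULES))
```

In a real system each rule would be replaced by a trained model, and the vote could be weighted or replaced by any other combination scheme.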

First claim

Opening claim text (preview).

What is claimed:

1. A method, implemented at a computer system that includes a processor, for detecting a malicious package associated with a software repository, the method comprising: identifying a subject package associated with a software repository, wherein the subject package is an initial version of the subject package; extracting a feature set from the subject package, the feature set including: (a) one or more single-version features, including one or more of (i) whether the subject package accesses personally identifying information (PII), (ii) whether the subject package accesses a specified system resource, (iii) whether the subject package uses a specified application programming interface (API), (iv) whether the subject package includes an installation script, or (v) whether the subject package includes at least one of a binary file, a minified file, or an obfuscated file, and (b) one or more change features, including one or more of (i) an amount of time since publication of a prior version of the subject package, (ii) a semantic update type associated with the subject package, or (iii) an identification of how one or more single-version features have changed since the prior version of the subject package, wherein the prior version of the subject package is treated as an empty package; providing the feature set as input to a set of classifiers, each classifier in the set of classifiers being configured to use the feature set to generate a prediction of whether the subject package is malicious or benign; and based at least on the prediction, classifying the subject package as being malicious or benign.

2. The method of claim 1, wherein the one or more single-version features include whether the subject package accesses PII.

3. The method of claim 1, wherein the one or more single-version features include whether the subject package accesses a specified system resource, and wherein the specified system resource includes at least one of: (a) file system access, (b) process creation, or (c) network access.

4. The method of claim 1, wherein the one or more single-version features include whether the subject package uses a specified API, and wherein the specified API includes at least one of: (a) a cryptographic API, (b) a data encoding API, or (c) a dynamic code generation API.

5. The method of claim 1, wherein the one or more single-version features include whether the subject package includes an installation script.

6. The method of claim 1, wherein the one or more single-version features include whether the subject package includes at least one of a binary file, a minified file, or an obfuscated file.

7. The method of claim 1, wherein the one or more change features include the amount of time since publication of the prior version of the subject package.

8. The method of claim 1, wherein the one or more change features include the semantic update type associated with the subject package.

9. The method of claim 1, further comprising, based at least on classifying the subject package as being malicious, automatically removing the subject package from the software repository.

10. The method of claim 1, further comprising, based at least on classifying the subject package as being malicious, attaching a priority to the subject package.

11. The method of claim 1, further comprising: updating a training data set based on the feature set and the prediction of whether the subject package is malicious or benign; and re-training the set of classifiers using the updated training data set.

12. The method of claim 1, further comprising, based at least on a prediction that subject package is malicious: creating a reproduced build of the subject package from source; determining if the reproduced build of the subject package is equivalent to the subject package; and classifying the subject package as being benign when the reproduced build of the subject package is equivalent to the subject package.

13. The method of claim 1, further comprising, based at least on a prediction that subject package is benign: determining whether the subject package is a clone of another package in the software repository; and classifying the subject package as being malicious when the subject package is determined to be a clone of another package in the software repository.

14. The method of claim 1, wherein the set of classifiers include a decision tree classifier, a Naive Bayesian classifier, and a one-class support vector machine classifier.

15. The method of claim 1, wherein the software repository is one of a code library repository, an application distribution repository, a source code repository, or a container repository.

16. The method of claim 1, wherein treating the prior version of the subject package as an empty package includes at least one of, determining that the amount of time since publication of the prior version of the subject package is zero, determining that the semantic update type associated with the subject package is a pseudo-update type representing a first version of the subject package, or determining that the identification of how one or more single-version features have changed since the prior version of the subject package includes identifying a single-version feature, itself.

17. A computer system for detecting a malicious package associated with a software repository, comprising: a processor; and a computer storage media that stores computer-executable instructions that are executable by the processor to cause the computer system to at least: identify a subject package associated with a software repository, wherein the subject package is an initial version of the subject package; extract a feature set from the subject package, the feature set including: (a) one or more single-version features, including one or more of (i) whether the subject package accesses personally identifying information (PII), (ii) whether the subject package accesses a specified system resource, (iii) whether the subject package uses a specified application programming interface (API), (iv) whether the subject package includes an installation script, or (v) whether the subject package includes at least one of a binary file, a minified file, or an obfuscated file, and (b) one or more change features, including one or more of (i) an amount of time since publication of a prior version of the subject package, or (ii) a semantic update type associated with the subject package, wherein the prior version of the subject package is treated as an empty package; provide the feature set as input to a set of classifiers, each classifier in the set of classifiers being configured to use the feature set to generate a prediction of whether the subject package is malicious or benign; and based at least on the prediction, classify the subject package as being malicious or benign.

18. The computer system of claim 17, the computer-executable instructions also including instructions that are executable by the processor to cause the computer system to, based at least on a prediction the subject package is malicious: create a reproduced build of the subject package from source; determine if the reproduced build of the subject package is equivalent to the subject package; and classify the subject package as being benign when the reproduced build of the subject package is equivalent to the subject package.

19. The computer system of claim 17, the computer-executable instructions also including instructions that are executa
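Claim 16 describes how change features are derived when the subject package is an initial version and its prior version is treated as an empty package. The sketch below illustrates one way to read that claim. It rests on several assumptions: the `Release` type, the `"initial"` and `"unchanged"` labels, the simplified three-part version comparison, and the +1/0/-1 delta encoding are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Version = Tuple[int, int, int]  # (major, minor, patch); a simplification


@dataclass(frozen=True)
class Release:
    version: Version
    published_at: float              # seconds since the epoch
    single_version: Dict[str, bool]  # e.g. {"has_install_script": True}


def semantic_update_type(prior: Optional[Version], current: Version) -> str:
    """Label the update by the first version component that differs."""
    if prior is None:
        # Pseudo-update type for a first version; the label is hypothetical.
        return "initial"
    for label, old, new in zip(("major", "minor", "patch"), prior, current):
        if old != new:
            return label
    return "unchanged"


def change_features(prior: Optional[Release], current: Release) -> dict:
    """Compute change features, treating a missing prior release as empty."""
    if prior is None:
        elapsed = 0.0   # time since the prior version is taken to be zero
        before: Dict[str, bool] = {}
        prior_version = None
    else:
        elapsed = current.published_at - prior.published_at
        before = prior.single_version
        prior_version = prior.version
    # +1: feature newly present, 0: unchanged, -1: feature removed.
    # Against an empty prior, every delta equals the feature value itself.
    deltas = {
        name: int(value) - int(before.get(name, False))
        for name, value in current.single_version.items()
    }
    return {
        "seconds_since_prior_version": elapsed,
        "semantic_update_type": semantic_update_type(prior_version, current.version),
        "feature_deltas": deltas,
    }


first = Release((1, 0, 0), 1000.0,
                {"has_install_script": True, "accesses_pii": False})
second = Release((1, 0, 1), 4600.0,
                 {"has_install_script": True, "accesses_pii": True})

print(change_features(None, first))
print(change_features(first, second))
```

Handling the initial version through the same code path as later versions means the classifiers always receive a feature set of the same shape.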

Assignees

  • Microsoft Technology Licensing LLC

Inventors

Classifications

  • Test or assess software · CPC title

  • for detecting or protecting against malicious traffic · CPC title

  • Machine learning · CPC title

  • Computer malware detection or handling, e.g. anti-virus arrangements · CPC title

  • G06F21/563 · Primary

    by source code analysis · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12061697B2 cover?
Detecting a malicious package associated with a software repository. A method identifies a subject package in a software repository, and extracts a feature set from the subject package. The feature set includes single-version features, including whether the subject package accesses personally identifying information, accesses specified system resource(s), uses specified application programming …
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F21/563. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 13, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).