Systems and methods for predicting which software vulnerabilities will be exploited by malicious hackers to prioritize for patching

US11892897B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11892897-B2
Application number: US-201816640878-A
Country: US
Kind code: B2
Filing date: Oct 26, 2018
Priority date: Nov 3, 2017
Publication date: Feb 6, 2024
Grant date: Feb 6, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various embodiments for predicting which software vulnerabilities will be exploited by malicious hackers and hence prioritized by patching are disclosed.

First claim

Opening claim text (preview).

What is claimed is:

1. A method for assessing a likelihood of exploitation of software vulnerabilities, comprising: utilizing a processor in operable communication with at least one memory for storing instructions that are executed by the processor to perform operations, including: accessing a plurality of datasets associated with a predetermined set of data sources, the plurality of datasets including training data comprising hacker communications; accessing features from the plurality of datasets that include measures computed from social connections of users posting hacking-related content; applying learning algorithms to the training data to generate classification models that are configured to predict class labels defining a likelihood of exploitation of respective software vulnerabilities; accessing one or more features associated with a software vulnerability; and computing, by applying the one or more features to the classification model, a class label defining one or more values defining a likelihood of exploitation associated with the software vulnerability, wherein the likelihood of exploitation predicts an actual exploitation of the respective software vulnerabilities before disclosure based on the hacker communications from the training data.

2. The method of claim 1, further comprising generating a plurality of estimation outputs based on the one or more values to derive an overall quantitative score.

3. The method of claim 1, wherein the plurality of datasets include vulnerability data for vulnerabilities that are publicly disclosed and obtaining exploits data for exploits that were used in real world attacks.

4. The method of claim 3, further comprising: aligning the exploits data with the vulnerability data; and cleaning the exploits data of noise and predetermined portions of the exploits data that is irrelevant to associated software vulnerabilities.

5. The method of claim 1, wherein certain features correspond to a known vulnerability obtained from the plurality of datasets.

6. The method of claim 1, further comprising testing the classification models by applying additional training data and one or more algorithms and evaluation metrics to optimize the classification models until the classification models compute the likelihood of exploitation according to a predefined error rate.

7. The method of claim 1, further comprising vectorizing text features derived from the plurality of datasets using term frequency-inverse document frequency to create a vocabulary of associated words.

8. The method of claim 1, further comprising: sorting vulnerabilities associated with the plurality of datasets according to time; training the classification model using the training data, the training data defining a first subset of the plurality of datasets associated with a predetermined period of time; and testing the classification model using a second subset of the plurality of datasets associated with the predetermined period of time.

9. The method of claim 1, further comprising computing mutual information from the plurality of datasets informative as to what information a given feature provides about another feature.

10. The method of claim 1, further comprising: detecting, from the plurality of datasets, vulnerabilities that appear before an associated exploitation date.

11. The method of claim 1, further comprising: accessing features from the plurality of datasets that measure a centrality of the users in a social graph.

12. The method of claim 1, further comprising: accessing one or more features indicative of temporal connections between at least two of: a time associated with discussion of a vulnerability by users posting hacking-related content at a web forum prior to disclosure to a public vulnerability database; a time associated with disclosure of the vulnerability to the public vulnerability database; and a time associated with exploitation of the vulnerability as obtained through exploits data associated with real-world exploitation of the vulnerability.

13. The method of claim 1, further comprising vectorizing text features derived from textual content of the plurality of datasets using a predetermined natural language process (NPL).

14. A computing device, configured via machine learning to apply a learned function to data associated with a software vulnerability to estimate a likelihood of exploitation of the software vulnerability, the learned function associated with a prediction model derived from at least one machine learning algorithm and a plurality of datasets associated with software vulnerabilities, the plurality of datasets including information associated with discussion of vulnerabilities by users posting hacking-related content, wherein the likelihood of exploitation predicts an actual exploitation of the software vulnerability before disclosure based on hacker communications from the plurality of datasets.

15. The computing device of claim 14, wherein the prediction model is at least one classification model that outputs from features of the software vulnerability a score indicative of the likelihood of exploitation.

16. The computing device of claim 14, wherein the predictive model includes a Random Forest (RF) method including multiple decision tree predictors applied in combination to classify the software vulnerability which is used to estimate the likelihood of exploitation.

17. The computing device of claim 14, wherein the computing device is further configured to apply the learned function to data associated with each of a plurality of new software vulnerabilities and compute a respective likelihood of exploitation for each of the plurality of new software vulnerabilities.

18. A method of prioritizing vulnerabilities using cyber threat intelligence, comprising: utilizing a processor in operable communication with at least one memory for storing instructions that are executed by the processor to perform operations including: accessing a plurality of datasets associated with a predetermined set of data sources, at least a portion of the plurality of datasets defining training data including hacker communications; applying learning algorithms to the training data to generate a predictive model configured to predict a likelihood of exploitation of respective software vulnerabilities; accessing one or more features associated with a software vulnerability; and computing, by applying the one or more features to the predictive model, one or more values defining a likelihood of exploitation associated with the software vulnerability, wherein the likelihood of exploitation predicts an actual exploitation of the respective software vulnerabilities before disclosure based on the hacker communications from the training data.

19. The method of claim 18, further comprising generating a plurality of estimation outputs based on the one or more values to derive an overall quantitative score.

20. The method of claim 18, wherein the plurality of datasets include vulnerability data for vulnerabilities that are publicly disclosed and obtaining exploits data for exploits that were used in real world attacks.
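
Illustrative sketch (not part of the patent text). Claims 7 and 8 mention term frequency-inverse document frequency vectorization and a time-sorted split between training and testing data. The Python below is one possible reading of those two steps, not the patented implementation. The sample records, the function names (temporal_split, fit_idf, tfidf_vector), the 50/50 split, whitespace tokenization, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter
from datetime import date

# Hypothetical toy records: (date, description text, exploited label).
# These values are invented for illustration and do not come from the patent.
RECORDS = [
    (date(2016, 3, 1), "remote code execution in web server", 1),
    (date(2016, 1, 5), "buffer overflow in image parser", 0),
    (date(2016, 5, 9), "remote buffer overflow in mail server", 1),
    (date(2016, 7, 2), "cross site scripting in web forum", 0),
]


def temporal_split(records, train_fraction=0.5):
    """Sort records by date so every training item precedes every test item."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def fit_idf(texts):
    """Build a vocabulary with smoothed inverse document frequencies."""
    n = len(texts)
    df = Counter()
    for text in texts:
        df.update(set(text.split()))  # count each term once per document
    # Assumed smoothing: ln((1 + n) / (1 + df)) + 1, so no weight is zero.
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Map text to {term: tf * idf}, skipping terms outside the vocabulary."""
    tokens = text.split()
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items() if t in idf}


train, test = temporal_split(RECORDS)
idf = fit_idf([text for _, text, _ in train])  # vocabulary from training data only
test_vectors = [tfidf_vector(text, idf) for _, text, _ in test]
```

Fitting the vocabulary on the earlier subset only is a design choice in this sketch: words that first appear in later records (such as "mail" here) are dropped rather than leaking into training. The resulting vectors could then be passed to a classifier of the kind named in claim 16.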

Assignees

Univ Arizona State

Inventors

Classifications

  • G06F11/008 · Primary

    Reliability or availability analysis · CPC title

  • characterised by the process organisation or structure, e.g. boosting cascade · CPC title

  • Classification techniques · CPC title

  • by adding security routines or objects to programs · CPC title

  • involving long-term monitoring or reporting · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11892897B2 cover?
Various embodiments for predicting which software vulnerabilities will be exploited by malicious hackers and hence prioritized by patching are disclosed.
Who is the assignee on this patent?
Univ Arizona State
What technology area does this patent fall under?
Primary CPC classification G06F11/008. Mapped technology areas include Physics.
When was this patent published?
Published Feb 6, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).