What technology area does this patent fall under?

Primary CPC classification G06F21/56. Mapped technology areas include Physics.

When was this patent published?

Publication date: May 6, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Cybersecurity system evaluation and configuration

US12292971B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12292971-B2
Application number	US-202117524930-A
Country	US
Kind code	B2
Filing date	Nov 12, 2021
Priority date	Nov 13, 2020
Publication date	May 6, 2025
Grant date	May 6, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Statistical properties of known malware distributions may be used to improve estimates of malware detection metrics such as a base rate of malicious events in a target environment or missed detections (also referred to as false negatives). In particular, numerous synthetic sample distributions may be generated based on the statistical properties of a base data set and/or additional observed data, and used to identify malware distributions that produce overall detection statistics corresponding to model output for live target data. The malware detection metrics for the live target data can then be characterized using the observed distributions of malware (and malware detections) for the synthetic sample distributions.
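The estimation idea in the abstract can be illustrated with a small simulation. The sketch below is not the patented method: it is a simplified rejection-sampling stand-in that assumes a uniform prior over the base rate on [0, 0.2], a detector described only by a measured true positive rate and false positive rate, and an absolute tolerance on the detection count. All function names, parameters, and numbers are hypothetical.

```python
import random


def simulate_detections(n, base_rate, tpr, fpr, rng):
    """Draw one synthetic data set of n samples and count detector alerts."""
    malicious = sum(rng.random() < base_rate for _ in range(n))
    benign = n - malicious
    true_pos = sum(rng.random() < tpr for _ in range(malicious))
    false_pos = sum(rng.random() < fpr for _ in range(benign))
    return malicious, true_pos, false_pos


def estimate_base_rate(observed, n, tpr, fpr, trials=2000, tol=5, seed=0):
    """Keep synthetic sets whose alert count is within tol of the observed count."""
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        base_rate = rng.random() * 0.2  # assumed prior: uniform on [0, 0.2]
        malicious, tp, fp = simulate_detections(n, base_rate, tpr, fpr, rng)
        if abs(tp + fp - observed) <= tol:
            # Record the realized malware fraction and the missed detections.
            kept.append((malicious / n, malicious - tp))
    return kept


def summarize(kept):
    """Mean base rate, a rough 95% interval, and mean missed detections."""
    rates = sorted(rate for rate, _ in kept)
    lo = rates[int(0.025 * (len(rates) - 1))]
    hi = rates[int(0.975 * (len(rates) - 1))]
    mean_rate = sum(rates) / len(rates)
    mean_missed = sum(missed for _, missed in kept) / len(kept)
    return mean_rate, (lo, hi), mean_missed
```

For example, `summarize(estimate_base_rate(observed=60, n=1000, tpr=0.9, fpr=0.01))` characterizes the base rate and missed detections using only the synthetic sets whose alert counts resemble the observed one.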

First claim

Opening claim text (preview).

What is claimed is:
1. A computer program product comprising computer executable code embodied in a non-transitory computer readable medium that, when executing on one or more computing devices, performs the steps of: evaluating a true positive rate and a false positive rate for a malware detection system, the true positive rate corresponding to an accurate detection of malware by the malware detection system in a base data set and the false positive rate corresponding to an erroneous detection of malware in the base data set by the malware detection system, the base data set labeled with a known composition of malicious code instances and the base data set having a base rate of malware instances; applying the malware detection system to a new data set to determine a first number of detections within the new data set; generating a number of synthetic data sets with an estimation engine based on a distribution of malware instances within the base data set; selecting a representative group from the number of synthetic data sets that produce a corresponding set of numbers of detection when analyzed with the malware detection system similar to the first number of detections produced within the new data set when analyzed with the malware detection system, wherein the corresponding set of numbers of detection are each within a predetermined threshold of the first number of detections, and wherein the predetermined threshold is a relative threshold scaled according to a ratio of a size of the new data set to the size of each of the synthetic data sets; and determining a malware detection metric for the new data set based on a statistical composition of the representative group.
2. The computer program product of claim 1, wherein the predetermined threshold is an absolute numerical threshold.
3. The computer program product of claim 1, wherein the new data set includes live samples analyzed for an enterprise by the malware detection system.
4. The computer program product of claim 1, wherein evaluating the true positive rate and the false positive rate for the malware detection system includes measuring the true positive rate and the false positive rate for the malware detection system when applied to a base data set having a known composition of malware instances and benign instances.
5. The computer program product of claim 1, wherein the malware detection system is a machine learning model trained to detect malware based on a training data set, further wherein each software instance in the training data set is labeled to indicate a malware status.
6. The computer program product of claim 1, further comprising computer executable code that, when executed, performs the step of updating the true positive rate and the false positive rate based on additional software instances received by the malware detection system and automatically labeled by the malware detection system as safe or malicious.
7. A method comprising: evaluating a true detection rate and a false detection rate for a malware detection system when applied to a base data set having a known composition of malicious code instances; applying the malware detection system to a new data set to determine a first detection rate for the new data set; generating a number of synthetic data sets based on one or more properties of the base data set; selecting a representative group from the number of synthetic data sets that produce a corresponding detection rate when analyzed with the malware detection system similar to the first detection rate within the new data set when analyzed with the malware detection system; and determining a malware detection metric for the new data set based on a statistical composition of the representative group selected from the number of synthetic data sets, wherein the malware detection metric includes at least one of a probability distribution for an estimated base rate of malware instances for the new data set and a confidence interval for the estimated base rate of malware instances.
8. The method of claim 7, further comprising adjusting a security parameter used by a threat management facility to manage security of an enterprise network based on the malware detection metric.
9. The method of claim 7, wherein the malware detection metric includes an estimated base rate of malware instances for the new data set.
10. The method of claim 7, wherein the malware detection metric includes at least one of an estimated true positive rate for the new data set and an estimated false positive for the new data set.
11. The method of claim 7, wherein the malware detection metric includes an estimated number of missed detections for the new data set.
12. The method of claim 7, wherein the malware detection metric includes a ratio of true positives to false positives for the new data set.
13. The method of claim 7, wherein the malware detection system includes a machine learning model trained to detect malware based on a training data set.
14. The method of claim 13, wherein each software instance in the training data set is labeled to indicate a malware status.
15. A system comprising: a memory storing a detection model having a true detection rate and a false detection rate for identifying malware when applied to a base data set having a known malware composition; a malware detection system configured to apply the detection model to a new data set to determine a rate of malware first number of detections occurring within the new data set; an estimation engine configured to synthesize a number of synthetic data sets based on properties of the base data set, and to select a representative group from the number of synthetic data sets that produces a similar rate of malware when analyzed with the malware detection system to the new data set when analyzed with the malware detection system, wherein the estimation engine synthesizes the number of synthetic data sets using a Sequential Monte Carlo simulation to randomly draw samples from the base data set and beta-weighting a result with an increasing beta until an explained sum of squares is within a predetermined threshold of a target; and a scoring engine to calculate one or more malware metrics for the new data set based on the representative group.
16. The system of claim 15, wherein the estimation engine synthesizes the number of synthetic data sets using a Metropolis-Hastings algorithm to randomly draw candidates from a proposal distribution and conditionally include each randomly drawn candidate using a probability function.
17. The system of claim 15, wherein the detection model includes a machine learning model trained to detect malware using malware labels for a training data set.
18. The system of claim 15, further comprising a threat management facility configured to adjust a tuning parameter to control a sensitivity for detection of or response to threats based on the one or more malware metrics.
19. The system of claim 15, wherein the one or more malware metrics for the new data set includes an estimated true positive rate for the new data set.
20. The system of claim 15, wherein the one or more malware metrics for the new data set includes an estimated false positive rate for the new data set.
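Claim 16 refers to a Metropolis-Hastings algorithm that draws candidates from a proposal distribution and conditionally includes each one. As a generic illustration only, the sketch below samples a base rate rather than whole synthetic data sets. It assumes a binomial likelihood for the observed detection count, a flat prior on (0, 1), a symmetric Gaussian proposal, and rates satisfying 0 < fpr < tpr < 1. None of these choices, names, or numbers come from the patent.

```python
import math
import random


def log_likelihood(base_rate, detections, n, tpr, fpr):
    """Binomial log-likelihood (up to a constant) of the observed alert count."""
    # Probability that any single sample raises an alert.
    p = base_rate * tpr + (1.0 - base_rate) * fpr
    return detections * math.log(p) + (n - detections) * math.log(1.0 - p)


def metropolis_hastings(detections, n, tpr, fpr, steps=5000,
                        step_size=0.01, start=0.1, seed=0):
    """Sample base rates consistent with the observed number of detections."""
    rng = random.Random(seed)
    current = start
    current_ll = log_likelihood(current, detections, n, tpr, fpr)
    samples = []
    for _ in range(steps):
        # Symmetric Gaussian proposal centered on the current state.
        candidate = current + rng.gauss(0.0, step_size)
        if 0.0 < candidate < 1.0:  # flat prior: reject anything outside (0, 1)
            cand_ll = log_likelihood(candidate, detections, n, tpr, fpr)
            # Accept with probability min(1, likelihood ratio).
            if math.log(1.0 - rng.random()) < cand_ll - current_ll:
                current, current_ll = candidate, cand_ll
        samples.append(current)
    return samples[steps // 5:]  # discard the first 20% as burn-in
```

Calling `metropolis_hastings(60, 1000, 0.9, 0.01)` returns a list of base-rate samples whose mean and quantiles can serve as a point estimate and an interval, in the spirit of the probability distribution and confidence interval named in claim 7.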

Assignees

Sophos Ltd

Inventors

Harang Richard Edward

Classifications

H04L63/20 · for managing network security; network security policies in general (filtering policies H04L63/0227)
G06N20/00 · Machine learning
G06F2221/034 · Test or assess a computer or a system
G06N7/01 · Probabilistic graphical models, e.g. probabilistic networks
G06F21/56 (Primary) · Computer malware detection or handling, e.g. anti-virus arrangements

Patent family

Related publications grouped by family.

View patent family 81587139

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12292971B2 cover? Statistical properties of known malware distributions may be used to improve estimates of malware detection metrics such as a base rate of malicious events in a target environment or missed detections (also referred to as false negatives). In particular, numerous synthetic sample distributions may be generated based on the statistical properties of a base data set and/or additional observed dat…
Who is the assignee on this patent? Sophos Ltd

Related patents

Automatic threat detection of executable files based on static data analysis

Multi-representational learning models for static analysis of source code

Automatic threat detection of executable files based on static data analysis

Automatically grouping malware based on artifacts

System and method of machine learning of malware detection model
