Malware detection using federated learning

US11593485B1 · US · B1

Patent metadata
Field               Value
Publication number  US-11593485-B1
Application number  US-202217843062-A
Country             US
Kind code           B1
Filing date         Jun 17, 2022
Priority date       Jun 17, 2022
Publication date    Feb 28, 2023
Grant date          Feb 28, 2023

Abstract

Official abstract text for this publication.

A method of generating a predictive model for malware detection using federated learning includes transmitting, to each of a plurality of remote devices, a copy of the predictive model, where the predictive model is configured to predict whether a file is malicious; receiving, from each of the plurality of remote devices, model parameters determined by independently training the copy of the predictive model on each of the plurality of remote devices using local files stored on respective ones of the plurality of remote devices; generating a federated model by training the predictive model based on the model parameters received from each of the plurality of remote devices; and transmitting the federated model to each of the plurality of remote devices.
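The aggregation step in the abstract (combining the parameters returned by the remote devices into a federated model) can be illustrated with federated averaging, one of the techniques named in claim 10. The following is only a minimal sketch under stated assumptions, not the patented implementation: the `ClientUpdate` structure, the flat list of float weights, and the per-device `num_samples` count are all hypothetical and do not appear in the patent text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClientUpdate:
    """Parameters reported by one remote device (hypothetical structure)."""
    weights: List[float]  # flattened model parameters after local training
    num_samples: int      # local files used for training (assumed to be reported)


def federated_average(updates: List[ClientUpdate]) -> List[float]:
    """Combine client parameters, weighting each device by its sample count."""
    if not updates:
        raise ValueError("need at least one client update")
    dim = len(updates[0].weights)
    if any(len(u.weights) != dim for u in updates):
        raise ValueError("all updates must have the same number of parameters")
    total = sum(u.num_samples for u in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged = [0.0] * dim
    for u in updates:
        share = u.num_samples / total  # this device's weight in the average
        for i, w in enumerate(u.weights):
            averaged[i] += share * w
    return averaged


# Two devices: the second trained on three times as many files.
updates = [
    ClientUpdate(weights=[1.0, 0.0], num_samples=1),
    ClientUpdate(weights=[3.0, 4.0], num_samples=3),
]
federated_weights = federated_average(updates)
print(federated_weights)  # -> [2.5, 3.0]
```

Only parameters leave each device in this sketch; the local files themselves are never sent, which matches the abstract's description of independent on-device training.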

First claim

Opening claim text (preview).

What is claimed is:

1. A method of generating a predictive model for malware detection using federated learning, the method comprising: transmitting, to each of a plurality of remote devices, a copy of the predictive model, wherein the predictive model is configured to predict whether a file is malicious; transmitting, to the plurality of remote devices and concurrently with transmitting the copy of the predictive model, a malware properties database, wherein each of the plurality of remote devices uses the malware properties database to detect malicious files prior to independently training the copy of predictive model; receiving, from each of a first subset of the plurality of remote devices, model parameters determined by independently training the copy of the predictive model on each of the first subset of the plurality of remote devices using local files stored on respective ones of the first subset of the plurality of remote devices; generating a federated model by training the predictive model based on the model parameters received from each of the first subset of the plurality of remote devices; and transmitting the federated model to each of the plurality of remote devices.

2. The method of claim 1, wherein the copy of the predictive model is transmitted to each of the plurality of remote devices during installation of a software application.

3. The method of claim 1, further comprising: receiving, from each of the first subset of the plurality of remote devices and concurrently with receiving the model parameters, metadata indicating a version of the copy of the predictive model trained on the corresponding one of the first subset of the plurality of remote devices.

4. The method of claim 3, further comprising: comparing the version of each of copies of the predictive models trained by the first subset of the plurality of remote devices with a current version of the predictive model to determine whether the copy of the predictive model trained by any of the first subset of the plurality of remote devices is out of date.

5. The method of claim 4, further comprising: comparing a feature set of the at least one out-of-date model with a feature set of the current version of the predictive model responsive to identifying at least one out-of-date model from the copies of the predictive model trained by the plurality of remote devices, wherein the model parameters associated with the at least one out-of-date model are not used to generate the plurality of trained predictive models if the feature set of the at least one out-of-date model does not match the feature set of the current version of the predictive model.

6. The method of claim 1, further comprising: receiving, from each of a second subset of the plurality of remote devices, second model parameters determined by independently training a previously-stored predictive model on each of the second subset of the plurality of remote devices; receiving, from each of the second subset of the plurality of remote devices, metadata for the previously-stored predictive model associated with each of the second subset of the plurality of remote devices, the metadata indicating at least a version number of the previously-stored predictive model associated with each of the second subset of the plurality of remote devices; and comparing the version number received from each of the second subset of the plurality of remote devices to a version number of a current version of the predictive model to determine whether the previously-stored predictive model trained by any of the second subset of the plurality of remote devices utilizes a different feature set from the current version of the predictive model, wherein: if it determined that the previously-stored predictive model trained by at least one device of the second subset of the plurality of remote devices utilizes a different feature set from the current version of the predictive model, then the second model parameters received from the at least one device are not used to generate the federated model; otherwise, generating the federated model further includes training the predictive model based on the model parameters received from each of the first subset of the plurality of remote devices and the second model parameters not associated with the at least one device.

7. The method of claim 1, further comprising: receiving, from each of the plurality of remote devices, features extracted from one or more local files that are predicted to be malicious, wherein the features are extracted by each of the plurality of remote devices using either: i) the copy of the predictive model, or ii) the federated model; and updating the malware properties database based on the received features.

8. The method of claim 1, wherein the malware properties database comprises file characterization information for a plurality of known malicious files.

9. The method of claim 1, wherein generating the federated model further comprises: generating multiple instances of the federated model based on the model parameters received from each of the plurality of remote devices; and testing each instance of the multiple instances of the federated model to identify a best-performing instance of the federated model, wherein the best-performing instance of the federated model is the transmitted to each of the plurality of remote devices.

10. The method of claim 1, wherein the federated learning model is generated by training the predictive model using one or more federated learning techniques, including at least one of federated stochastic gradient descent, federated averaging, or dynamic regularization.

11. A malware detection system comprising: one or more processors; and memory having instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: generating an initial instantiation of a model for predicting whether a file is malicious; transmitting, to a plurality of remote devices, the initial instantiation of the model and a malware properties database, wherein the malware properties database comprises a set of hashes of known malicious files; receiving, from each of the plurality of remote devices, parameters for the model, wherein the parameters for the model are determined by independently training the initial instantiation of the model on each of the plurality of remote devices using the malware properties database and local files stored on respective ones of the plurality of remote devices; generating a federated model by training the model using the parameters for the model received from each of the plurality of remote devices; and transmitting the federated model to each of the plurality of remote devices.

12. The system of claim 11, wherein each of the plurality of remote devices uses the malware properties database to detect malicious files prior to independently training the initial instantiation of model.

13. The system of claim 11, further comprising: receiving, from each of the plurality of remote devices and concurrently with receiving the parameters for the model, metadata indicating a version of the model trained on the corresponding one of the plurality of remote devices.

14. The system of claim 13, further comprising: comparing the version of the model trained by each of the plurality of remote devices with a current version of the model to determine whether the model trained by any of the plurality of remote devices is out of date.

15. The system of claim 14, further comprising: comparing a feature set of t
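The version and feature-set checks described in claims 3 to 6 can be read as a gate applied to incoming parameters before aggregation. The sketch below is one possible interpretation, not the patent's own code: the `ModelInfo` and `Submission` types, the integer version numbers, the rule that a lower version means "out of date", and the feature names are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ModelInfo:
    """Metadata describing a model version (hypothetical structure)."""
    version: int
    feature_set: FrozenSet[str]


@dataclass
class Submission:
    """Parameters plus metadata sent by one remote device (hypothetical)."""
    device_id: str
    model: ModelInfo
    weights: List[float]


def usable_submissions(current: ModelInfo,
                       submissions: List[Submission]) -> List[Submission]:
    """Drop parameters from out-of-date models whose feature set differs."""
    kept = []
    for s in submissions:
        # Assumption: a version lower than the current one is "out of date".
        out_of_date = s.model.version < current.version
        if out_of_date and s.model.feature_set != current.feature_set:
            continue  # incompatible features: exclude from aggregation
        kept.append(s)
    return kept


current = ModelInfo(3, frozenset({"entropy", "size", "imports"}))
submissions = [
    # Trained on the current version.
    Submission("a", current, [0.1, 0.2, 0.3]),
    # Out of date, but same feature set: still usable.
    Submission("b", ModelInfo(2, current.feature_set), [0.2, 0.1, 0.4]),
    # Out of date with a different feature set: excluded.
    Submission("c", ModelInfo(1, frozenset({"entropy", "size"})), [0.5, 0.5]),
]
kept_ids = [s.device_id for s in usable_submissions(current, submissions)]
print(kept_ids)  # -> ['a', 'b']
```

The design choice here is to keep out-of-date submissions whenever their feature set still matches, since the claims exclude parameters only on a feature-set mismatch rather than on version age alone.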

Assignees

UAB 360 IT

Inventors

Classifications

  • by checking file integrity

  • G06F21/564 (Primary): by virus signature recognition

  • G06F21/566 (Primary): Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities

  • Test or assess a computer or a system

  • Machine learning

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11593485B1 cover?
A method of generating a predictive model for malware detection using federated learning includes transmitting, to each of a plurality of remote devices, a copy of the predictive model, where the predictive model is configured to predict whether a file is malicious; receiving, from each of the plurality of remote devices, model parameters determined by independently training the copy of the pre…
Who is the assignee on this patent?
UAB 360 IT
What technology area does this patent fall under?
Primary CPC classification G06F21/564. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 28, 2023 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).