Malware detection by distributed telemetry data analysis
US-2022114260-A1 · Apr 14, 2022 · US
US11593485B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11593485-B1 |
| Application number | US-202217843062-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 17, 2022 |
| Priority date | Jun 17, 2022 |
| Publication date | Feb 28, 2023 |
| Grant date | Feb 28, 2023 |
Abstract
A method of generating a predictive model for malware detection using federated learning includes transmitting, to each of a plurality of remote devices, a copy of the predictive model, where the predictive model is configured to predict whether a file is malicious; receiving, from each of the plurality of remote devices, model parameters determined by independently training the copy of the predictive model on each of the plurality of remote devices using local files stored on respective ones of the plurality of remote devices; generating a federated model by training the predictive model based on the model parameters received from each of the plurality of remote devices; and transmitting the federated model to each of the plurality of remote devices.
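The aggregation step in the abstract ("generating a federated model ... based on the model parameters received from each of the plurality of remote devices") can be illustrated with federated averaging, one of the techniques claim 10 lists. This is a sketch under stated assumptions, not the patented implementation: it assumes each remote device reports a flat list of model parameters plus the number of local files it trained on, and the names `RemoteUpdate` and `federated_average` are hypothetical. Sending models to and from devices is out of scope.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RemoteUpdate:
    """Hypothetical payload a remote device returns after local training."""
    device_id: str
    params: List[float]  # flattened model parameters after local training
    num_samples: int     # number of local files the device trained on


def federated_average(updates: Sequence[RemoteUpdate]) -> List[float]:
    """Merge device parameters with a sample-weighted mean (FedAvg-style)."""
    if not updates:
        raise ValueError("no updates to aggregate")
    dim = len(updates[0].params)
    if any(len(u.params) != dim for u in updates):
        raise ValueError("all parameter vectors must have the same length")
    total = sum(u.num_samples for u in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    merged = [0.0] * dim
    for u in updates:
        weight = u.num_samples / total  # this device's share of all training files
        for i, p in enumerate(u.params):
            merged[i] += weight * p
    return merged


if __name__ == "__main__":
    updates = [
        RemoteUpdate("device-a", [0.0, 1.0], num_samples=100),
        RemoteUpdate("device-b", [1.0, 3.0], num_samples=300),
    ]
    print(federated_average(updates))  # [0.75, 2.5]
```

Weighting by sample count keeps a device with only a handful of local files from pulling the shared parameters as strongly as one that trained on many; an unweighted mean is the simpler alternative.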
Claims (preview)
What is claimed is:
1. A method of generating a predictive model for malware detection using federated learning, the method comprising: transmitting, to each of a plurality of remote devices, a copy of the predictive model, wherein the predictive model is configured to predict whether a file is malicious; transmitting, to the plurality of remote devices and concurrently with transmitting the copy of the predictive model, a malware properties database, wherein each of the plurality of remote devices uses the malware properties database to detect malicious files prior to independently training the copy of predictive model; receiving, from each of a first subset of the plurality of remote devices, model parameters determined by independently training the copy of the predictive model on each of the first subset of the plurality of remote devices using local files stored on respective ones of the first subset of the plurality of remote devices; generating a federated model by training the predictive model based on the model parameters received from each of the first subset of the plurality of remote devices; and transmitting the federated model to each of the plurality of remote devices.
2. The method of claim 1, wherein the copy of the predictive model is transmitted to each of the plurality of remote devices during installation of a software application.
3. The method of claim 1, further comprising: receiving, from each of the first subset of the plurality of remote devices and concurrently with receiving the model parameters, metadata indicating a version of the copy of the predictive model trained on the corresponding one of the first subset of the plurality of remote devices.
4. The method of claim 3, further comprising: comparing the version of each of copies of the predictive models trained by the first subset of the plurality of remote devices with a current version of the predictive model to determine whether the copy of the predictive model trained by any of the first subset of the plurality of remote devices is out of date.
5. The method of claim 4, further comprising: comparing a feature set of the at least one out-of-date model with a feature set of the current version of the predictive model responsive to identifying at least one out-of-date model from the copies of the predictive model trained by the plurality of remote devices, wherein the model parameters associated with the at least one out-of-date model are not used to generate the plurality of trained predictive models if the feature set of the at least one out-of-date model does not match the feature set of the current version of the predictive model.
6. The method of claim 1, further comprising: receiving, from each of a second subset of the plurality of remote devices, second model parameters determined by independently training a previously-stored predictive model on each of the second subset of the plurality of remote devices; receiving, from each of the second subset of the plurality of remote devices, metadata for the previously-stored predictive model associated with each of the second subset of the plurality of remote devices, the metadata indicating at least a version number of the previously-stored predictive model associated with each of the second subset of the plurality of remote devices; and comparing the version number received from each of the second subset of the plurality of remote devices to a version number of a current version of the predictive model to determine whether the previously-stored predictive model trained by any of the second subset of the plurality of remote devices utilizes a different feature set from the current version of the predictive model, wherein: if it determined that the previously-stored predictive model trained by at least one device of the second subset of the plurality of remote devices utilizes a different feature set from the current version of the predictive model, then the second model parameters received from the at least one device are not used to generate the federated model; otherwise, generating the federated model further includes training the predictive model based on the model parameters received from each of the first subset of the plurality of remote devices and the second model parameters not associated with the at least one device.
7. The method of claim 1, further comprising: receiving, from each of the plurality of remote devices, features extracted from one or more local files that are predicted to be malicious, wherein the features are extracted by each of the plurality of remote devices using either: i) the copy of the predictive model, or ii) the federated model; and updating the malware properties database based on the received features.
8. The method of claim 1, wherein the malware properties database comprises file characterization information for a plurality of known malicious files.
9. The method of claim 1, wherein generating the federated model further comprises: generating multiple instances of the federated model based on the model parameters received from each of the plurality of remote devices; and testing each instance of the multiple instances of the federated model to identify a best-performing instance of the federated model, wherein the best-performing instance of the federated model is the transmitted to each of the plurality of remote devices.
10. The method of claim 1, wherein the federated learning model is generated by training the predictive model using one or more federated learning techniques, including at least one of federated stochastic gradient descent, federated averaging, or dynamic regularization.
11. A malware detection system comprising: one or more processors; and memory having instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: generating an initial instantiation of a model for predicting whether a file is malicious; transmitting, to a plurality of remote devices, the initial instantiation of the model and a malware properties database, wherein the malware properties database comprises a set of hashes of known malicious files; receiving, from each of the plurality of remote devices, parameters for the model, wherein the parameters for the model are determined by independently training the initial instantiation of the model on each of the plurality of remote devices using the malware properties database and local files stored on respective ones of the plurality of remote devices; generating a federated model by training the model using the parameters for the model received from each of the plurality of remote devices; and transmitting the federated model to each of the plurality of remote devices.
12. The system of claim 11, wherein each of the plurality of remote devices uses the malware properties database to detect malicious files prior to independently training the initial instantiation of model.
13. The system of claim 11, further comprising: receiving, from each of the plurality of remote devices and concurrently with receiving the parameters for the model, metadata indicating a version of the model trained on the corresponding one of the plurality of remote devices.
14. The system of claim 13, further comprising: comparing the version of the model trained by each of the plurality of remote devices with a current version of the model to determine whether the model trained by any of the plurality of remote devices is out of date.
15. The system of claim 14, further comprising: comparing a feature set of t
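Claims 3 through 6 screen device updates first by model version and then, for stale models, by feature set: parameters from an out-of-date model are excluded only when its feature set differs from the current model's. A minimal sketch of that screening rule follows, assuming versions are integers and feature sets are sets of feature names; `UpdateMetadata`, `select_usable_updates`, and the feature names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class UpdateMetadata:
    """Hypothetical metadata sent alongside a device's model parameters."""
    device_id: str
    model_version: int
    feature_set: FrozenSet[str]


def select_usable_updates(
    metadata: Sequence[UpdateMetadata],
    current_version: int,
    current_features: FrozenSet[str],
) -> List[str]:
    """Return IDs of devices whose parameters may be aggregated."""
    usable: List[str] = []
    for m in metadata:
        out_of_date = m.model_version < current_version
        # A stale model is dropped only if its features differ from the
        # current model's; a stale model with the same features is kept.
        if out_of_date and m.feature_set != current_features:
            continue
        usable.append(m.device_id)
    return usable


if __name__ == "__main__":
    current = frozenset({"entropy", "imports", "size"})
    meta = [
        UpdateMetadata("a", 3, current),                         # up to date
        UpdateMetadata("b", 2, current),                         # stale, same features
        UpdateMetadata("c", 1, frozenset({"entropy", "size"})),  # stale, different features
    ]
    print(select_usable_updates(meta, 3, current))  # ['a', 'b']
```

Only the surviving device IDs would then be passed on to the aggregation step.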
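Claims 1, 11, and 12 have each remote device use the malware properties database (in claim 11, a set of hashes of known malicious files) to detect malicious files before local training. One plausible reading, assumed here rather than stated in the claims, is that hash matches supply training labels for local files. The sketch uses SHA-256 because the claims do not name a hash function, and the helpers `sha256_of` and `label_local_files` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Set


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def label_local_files(paths: Iterable[Path], known_bad_hashes: Set[str]) -> Dict[Path, int]:
    """Label a file 1 (malicious) if its hash is in the database, else 0.

    Treating every unmatched file as benign is a simplifying assumption.
    """
    return {p: int(sha256_of(p) in known_bad_hashes) for p in paths}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bad = Path(tmp, "sample.bin")
        bad.write_bytes(b"stand-in for a known malicious file")
        good = Path(tmp, "notes.txt")
        good.write_bytes(b"hello")
        database = {sha256_of(bad)}  # pretend this set came from the server
        labels = label_local_files([bad, good], database)
        print(labels[bad], labels[good])  # 1 0
```

Exact-hash matching only recognizes byte-identical files, which is one reason a learned model trained on those labels could add value beyond the database lookup itself.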
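Claim 9 generates multiple instances of the federated model and keeps the best-performing one after testing. The sketch assumes a linear classifier (weights plus bias), a small labeled validation set held by the server, and accuracy as the test metric; none of these are specified in the claims. The candidate names `fedavg` and `fedsgd` are only illustrative labels for instances that might be built with different aggregation techniques.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], int]         # (feature vector, label: 1 = malicious)
LinearModel = Tuple[List[float], float]  # (weights, bias)


def predict(model: LinearModel, x: Sequence[float]) -> int:
    """Return 1 (malicious) when the linear score is positive, else 0."""
    weights, bias = model
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return int(score > 0.0)


def accuracy(model: LinearModel, validation: Sequence[Sample]) -> float:
    correct = sum(predict(model, x) == y for x, y in validation)
    return correct / len(validation)


def best_instance(candidates: Dict[str, LinearModel], validation: Sequence[Sample]) -> str:
    """Name of the candidate with the highest validation accuracy (first wins ties)."""
    return max(candidates, key=lambda name: accuracy(candidates[name], validation))


if __name__ == "__main__":
    validation = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([2.0, 0.5], 1), ([0.2, 2.0], 0)]
    candidates = {
        "fedavg": ([1.0, -1.0], 0.0),
        "fedsgd": ([-1.0, 1.0], 0.0),
    }
    print(best_instance(candidates, validation))  # fedavg
```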
CPC titles:
- by checking file integrity
- by virus signature recognition
- Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
- Test or assess a computer or a system
- Machine learning