Representing and comparing files based on segmented similarity
US-2017193230-A1 · Jul 6, 2017 · US
US10169581B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10169581-B2 |
| Application number | US-201615249702-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 29, 2016 |
| Priority date | Aug 29, 2016 |
| Publication date | Jan 1, 2019 |
| Grant date | Jan 1, 2019 |
Abstract
A training data set for training a machine learning module is prepared by dividing normal files and malicious files into sections. Each section of a normal file is labeled as normal. Each section of a malicious file is labeled as malicious regardless of whether or not the section is malicious. The sections of the normal files and malicious files are used to train the machine learning module. The trained machine learning module is packaged as a machine learning model, which is provided to an endpoint computer. In the endpoint computer, an unknown file is divided into sections, which are input to the machine learning model to identify a malicious section of the unknown file, if any is present in the unknown file.
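The training-set preparation in the abstract can be sketched as follows. This is a minimal illustration that assumes sections are fixed-size byte chunks (the abstract does not say how a file is divided), and `split_into_sections`, `build_training_set`, `LabeledSection`, and the 4096-byte default are hypothetical names and values. The key behavior matches the abstract: every section inherits its parent file's label, so a benign-looking section of a malicious file is still labeled malicious.

```python
from dataclasses import dataclass
from typing import Dict, List

NORMAL, MALICIOUS = 0, 1


@dataclass
class LabeledSection:
    source: str   # name of the file the section came from
    index: int    # position of the section within that file
    data: bytes   # raw bytes of the section
    label: int    # NORMAL or MALICIOUS, inherited from the whole file


def split_into_sections(data: bytes, section_size: int = 4096) -> List[bytes]:
    """Divide a file into fixed-size sections; the last one may be shorter.

    Fixed-size chunking is an assumption made for illustration only.
    """
    if section_size <= 0:
        raise ValueError("section_size must be positive")
    return [data[i:i + section_size] for i in range(0, len(data), section_size)]


def build_training_set(normal_files: Dict[str, bytes],
                       malicious_files: Dict[str, bytes],
                       section_size: int = 4096) -> List[LabeledSection]:
    """Label each section with its parent file's label, whatever it contains."""
    dataset: List[LabeledSection] = []
    for files, label in ((normal_files, NORMAL), (malicious_files, MALICIOUS)):
        for name, data in files.items():
            for idx, chunk in enumerate(split_into_sections(data, section_size)):
                dataset.append(LabeledSection(name, idx, chunk, label))
    return dataset
```

For example, a 10,000-byte normal file and a 5,000-byte malicious file yield three normal sections and two malicious sections. Because labels are assigned per file rather than per section, the malicious labels are noisy by design.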
Claims
What is claimed is:

1. A computer-implemented method of evaluating a file for malicious code, the method comprising: receiving a plurality of normal files and a plurality of malicious files; dividing each of the normal files and each of the malicious files into a plurality of file sections; labeling each file section of the normal files as a normal file section; labeling each file section of the malicious files as a malicious file section; generating a machine learning model using a machine learning training data set comprising the labeled file sections of the normal files and the malicious files; and using the machine learning model to identify which particular section of a target file contains malicious code.
2. The computer-implemented method of claim 1, wherein using the machine learning model to identify which particular section of the target file contains malicious code comprises: dividing the target file into a plurality of sections; and using the machine learning model to classify each of the sections of the target file.
3. The computer-implemented method of claim 1, wherein the machine learning model is generated by training a Support Vector Machine using the training data set.
4. The computer-implemented method of claim 1, further comprising: providing the machine learning model to an endpoint computer system over a computer network, wherein the endpoint computer system receives the target file over the computer network and classifies individual sections of the target file using the machine learning model.
5. The computer-implemented method of claim 1, wherein the normal files, the malicious files, and the target file are executable files.
6. The computer-implemented method of claim 1, wherein the normal files, the malicious files, and the target file are in Portable Executable format.
7. A system for evaluating files for malicious code, the system comprising: a backend computer system that is configured to divide each of a plurality of normal files into file sections, divide each of a plurality of malicious files into file sections, label each file section of the normal files as a normal file section, label each file section of the malicious files as a malicious file section, and generate a machine learning model using a machine learning training data set comprising labeled file sections of the normal files and the malicious files; and an endpoint computer that is configured to receive the machine learning model over a computer network, receive a target file, and use the machine learning model to identify which particular section of the target file contains malicious code.
8. The system of claim 7, wherein the endpoint computer divides the target file into a plurality of sections and inputs the sections of the target file into the machine learning model.
9. The system of claim 7, wherein the backend computer system generates the machine learning model by training a Support Vector Machine using the training data set.
10. The system of claim 7, wherein the normal files, the malicious files, and the target file are executable files.
11. The system of claim 7, wherein the normal files, the malicious files, and the target file are in Portable Executable format.
12. The system of claim 7, wherein the endpoint computer divides the target file into a plurality of sections and inputs the sections of the target file into the machine learning model.
13. A non-transitory computer-readable medium comprising instructions stored thereon, that when executed by a processor, perform the steps of: dividing each of a plurality of normal files and each of a plurality of malicious files into a plurality of file sections; labeling each file section of the normal files as a normal file section; labeling each file section of the malicious files as a malicious file section; generating a machine learning model using a machine learning training data set comprising labeled file sections of the normal files and the malicious files; and providing the machine learning model to an endpoint computer system to detect malicious files in the endpoint computer system.
14. The non-transitory computer-readable medium of claim 13, wherein the machine learning model is generated by training a Support Vector Machine using the training data set.
15. The non-transitory computer-readable medium of claim 13, wherein the normal files and the malicious files are executable files.
16. The non-transitory computer-readable medium of claim 13, wherein the normal files and the malicious files are in Portable Executable format.
17. The non-transitory computer-readable medium of claim 13, wherein the machine learning model is provided to the endpoint computer system over the Internet.
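Claims 6, 11, and 16 restrict the files to Portable Executable format. One plausible way to obtain file sections from such a file is to follow its PE section table (typically entries such as `.text` and `.data`). Whether the patent divides files this way is an assumption, since the claims do not define a section. The hypothetical helper `pe_sections` below walks the MZ header, the PE signature, the COFF file header, and the 40-byte section-table entries using only the standard `struct` module, and returns each section's name and raw bytes.

```python
import struct
from typing import List, Tuple


def pe_sections(data: bytes) -> List[Tuple[str, bytes]]:
    """Return (name, raw_bytes) for each entry in a PE file's section table."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    # COFF file header (20 bytes): Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    _, num_sections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, coff)
    table = coff + 20 + opt_size  # section table follows the optional header
    sections: List[Tuple[str, bytes]] = []
    for i in range(num_sections):
        entry = table + 40 * i
        # First 24 bytes of an entry: Name, VirtualSize, VirtualAddress,
        # SizeOfRawData, PointerToRawData.
        name_raw, _vsize, _vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, entry)
        name = name_raw.rstrip(b"\0").decode("ascii", errors="replace")
        sections.append((name, data[raw_ptr:raw_ptr + raw_size]))
    return sections
```

The sketch checks only the two signatures; a production parser would also validate offsets and sizes against the file length before slicing.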
Physics · mapped topic
Static detection · CPC title
Test or assess a computer or a system · CPC title
Computer malware detection or handling, e.g. anti-virus arrangements · CPC title
using kernel methods, e.g. support vector machines [SVM] · CPC title
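Claims 3, 9, and 14 train a Support Vector Machine on the labeled sections, and claims 2 and 8 classify each section of a target file so that the malicious section can be identified. The sketch below is a minimal stand-in under stated assumptions: each section becomes a 256-bin byte-frequency vector plus a constant bias feature, and a linear SVM is trained with the Pegasos stochastic sub-gradient method on the regularized hinge loss. The features, the linear kernel, and the hyperparameters `lam` and `epochs` are not taken from the patent, and `byte_histogram`, `train_linear_svm`, and `classify_sections` are hypothetical names.

```python
import random
from typing import List, Sequence


def byte_histogram(section: bytes) -> List[float]:
    """Assumed feature vector: normalized byte counts plus a 1.0 bias term."""
    counts = [0.0] * 256
    for b in section:
        counts[b] += 1.0
    n = max(len(section), 1)
    return [c / n for c in counts] + [1.0]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 20,
                     seed: int = 0) -> List[float]:
    """Pegasos linear SVM; y[i] is +1 (malicious) or -1 (normal)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Sub-gradient step for the L2 regularizer: shrink all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge loss active: move w toward correctly classifying X[i].
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def classify_sections(w: Sequence[float], sections: Sequence[bytes]) -> List[int]:
    """Return indices of sections whose linear score is positive (malicious)."""
    flagged = []
    for idx, sec in enumerate(sections):
        score = sum(wj * xj for wj, xj in zip(w, byte_histogram(sec)))
        if score > 0:
            flagged.append(idx)
    return flagged


if __name__ == "__main__":
    normal = [bytes(64)] * 3          # toy "normal" sections: all zero bytes
    malicious = [b"\xcc" * 64] * 3    # toy "malicious" sections: all 0xCC
    X = [byte_histogram(s) for s in normal + malicious]
    y = [-1] * 3 + [1] * 3
    w = train_linear_svm(X, y)
    print(classify_sections(w, [bytes(32), b"\xcc" * 32]))  # [1]
```

Returning section indices rather than a single file verdict mirrors the claims' focus on which particular section contains malicious code. For simplicity the bias weight is regularized together with the other weights.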