Model development and application to identify and halt malware
US-2019332769-A1 · Oct 31, 2019 · US
US12348560B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12348560-B2 |
| Application number | US-202217734956-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 2, 2022 |
| Priority date | Apr 25, 2022 |
| Publication date | Jul 1, 2025 |
| Grant date | Jul 1, 2025 |
Official abstract text for this publication.
The detection of phishing Portable Document Format (PDF) files using an image-based deep learning approach is disclosed. A PDF document that includes a Uniform Resource Locator is received. A likelihood that the received PDF document represents a phishing threat is determined, at least in part, by using an image-based model. A verdict for the PDF document is provided as output based at least in part on the determined likelihood.
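The flow in the abstract (gate on a URL, score the document, emit a verdict) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the publication: `has_clickable_url`, `classify_pdf`, `Verdict`, the 0.5 threshold, and the raw-bytes `/URI` check, which is a naive stand-in for real PDF parsing. The image-based model is represented by a caller-supplied `score_fn`.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical threshold; the publication does not state a value.
PHISHING_THRESHOLD = 0.5


@dataclass
class Verdict:
    label: str         # "phishing" or "benign"
    likelihood: float  # score returned by the image-based model


def has_clickable_url(pdf_bytes: bytes) -> bool:
    """Naive gate: look for a URI action key in the raw PDF bytes.

    Assumption: links stored inside compressed object streams are missed,
    so a production gate would need a proper PDF parser.
    """
    return b"/URI" in pdf_bytes


def classify_pdf(pdf_bytes: bytes,
                 score_fn: Callable[[bytes], float],
                 threshold: float = PHISHING_THRESHOLD) -> Optional[Verdict]:
    """Return a verdict, or None when the PDF has no clickable URL."""
    if not has_clickable_url(pdf_bytes):
        return None  # the document never reaches the image-based model
    likelihood = score_fn(pdf_bytes)
    label = "phishing" if likelihood >= threshold else "benign"
    return Verdict(label=label, likelihood=likelihood)
```

In this sketch a security appliance would consume the returned `Verdict` and decide on a remedial action; a `None` result means the document was never scored.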
Opening claim text (preview).
What is claimed is:

1. A system, comprising: a processor configured to: receive a Portable Document Format (PDF) document in response to a determination having been made that the PDF document includes at least one clickable link to a Uniform Resource Locator (URL); determine a likelihood that the received PDF document represents a phishing threat, at least in part, using an image-based model that was previously trained, at least in part, using a plurality of images that were generated using one or more tools that collectively convert a set of PDF given document files to the plurality of images, wherein at least one given document file has a ground truth label of being a phishing PDF; and provide as output a verdict for the PDF document based at least in part on the determined likelihood, wherein the verdict is usable by a security appliance to take a remedial action associated with the received PDF document; and a memory coupled to the processor and configured to provide the processor with instructions.
2. The system of claim 1, wherein the processor is further configured to determine whether the PDF document includes at least one the clickable link.
3. The system of claim 1, wherein the verdict is that the received PDF document is benign.
4. The system of claim 1, wherein the verdict is that the received PDF document does not represent a phishing threat.
5. The system of claim 1, wherein determining the likelihood includes converting at least one page of the received PDF document into an image.
6. The system of claim 1, wherein at least some of the images labeled as phishing PDFs belong, collectively, to a multi-page PDF document.
7. The system of claim 1, wherein, prior to training the image-based model, an image hash-based filtering operation is performed on at least some of the images labeled as phishing PDFs.
8. The system of claim 7, wherein filtered images are stored using a TFRecord data format.
9. The system of claim 1, wherein the processor is further configured to generate the image-based model.
10. The system of claim 1, wherein the image-based model is a convolutional neural network model.
11. The system of claim 1, wherein, at least in part in response to receiving an indication of a false positive result, the image-based model is retrained using a benign data set that includes the false positive result.
12. A method, comprising: receiving a Portable Document Format (PDF) document in response to a determination having been made that the PDF document includes at least one clickable link to a Uniform Resource Locator (URL); determining a likelihood that the received PDF document represents a phishing threat, at least in part, using an image-based model that was previously trained, at least in part, using a plurality of images that were generated using one or more tools that collectively convert a set of PDF given document files to the plurality of images, wherein at least one given document file has a ground truth label of being a phishing PDF; and providing as output a verdict for the PDF document based at least in part on the determined likelihood, wherein the verdict is usable by a security appliance to take a remedial action associated with the received PDF document.
13. The method of claim 12, further comprising determining whether the PDF document includes the at least one clickable link.
14. The method of claim 12, wherein the verdict is that the received PDF document is benign.
15. The method of claim 12, wherein the verdict is that the received PDF document does not represent a phishing threat.
16. The method of claim 12, wherein determining the likelihood includes converting at least one page of the received PDF document into an image.
17. The method of claim 12, wherein at least some of the images labeled as phishing PDFs belong, collectively, to a multi-page PDF document.
18. The method of claim 12, wherein, prior to training the image-based model, an image hash-based filtering operation is performed on at least some of the images labeled as phishing PDFs.
19. The method of claim 18, wherein filtered images are stored using a TFRecord data format.
20. The method of claim 12, further comprising generating the image-based model.
21. The method of claim 12, wherein the image-based model is a convolutional neural network model.
22. The method of claim 12, wherein, at least in part in response to receiving an indication of a false positive result, the image-based model is retrained using a benign data set that includes the false positive result.
23. A computer program product embodied in a non-transitory computer readable medium and comprising computer instructions for: receiving a Portable Document Format (PDF) document in response to a determination having been made that the PDF document includes at least one clickable link to a Uniform Resource Locator (URL); determining a likelihood that the received PDF document represents a phishing threat, at least in part, using an image-based model that was previously trained, at least in part, using a plurality of images that were generated using one or more tools that collectively convert a set of PDF given document files to the plurality of images, wherein at least one given document file has a ground truth label of being a phishing PDF; and providing as output a verdict for the PDF document based at least in part on the determined likelihood, wherein the verdict is usable by a security appliance to take a remedial action associated with the received PDF document.
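Claims 7 and 18 recite an image hash-based filtering operation on the training images but do not name a particular hash. As one possible reading, the sketch below removes near-duplicate page images with an average hash and a Hamming-distance cutoff. The choice of average hash, the function names, the `max_distance` parameter, and the representation of an image as a small grayscale grid (already downscaled to a common size) are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed representation: rows of grayscale pixel values in 0-255,
# with every image already downscaled to the same small grid.
Image = Sequence[Sequence[int]]


def average_hash(image: Image) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def filter_near_duplicates(images: List[Image],
                           max_distance: int = 0) -> List[Image]:
    """Keep an image only if its hash is farther than max_distance
    from the hash of every image kept so far."""
    kept: List[Image] = []
    kept_hashes: List[int] = []
    for img in images:
        h = average_hash(img)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept.append(img)
            kept_hashes.append(h)
    return kept
```

With `max_distance=0` only images whose hashes match exactly are dropped; raising the cutoff filters more aggressively at the risk of discarding distinct pages. Storing the surviving images (for example in the TFRecord format recited in claims 8 and 19) is outside this sketch.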
Assessing vulnerabilities and evaluating computer system security · CPC title
service impersonation, e.g. phishing, pharming or web spoofing (detection of rogue wireless access points H04W12/12) · CPC title