Identify malicious software

US11880462B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11880462-B2
Application number: US-201817057639-A
Country: US
Kind code: B2
Filing date: May 21, 2018
Priority date: May 21, 2018
Publication date: Jan 23, 2024
Grant date: Jan 23, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method (600) for identifying malicious software includes receiving and executing a software application (210), identifying a plurality of uniform resource identifiers (220) the software application interacts with during execution of the software application, and generating a vector representation (260) for the software application using a feed-forward neural network (170) configured to receive the plurality of uniform resource identifiers as feature inputs. The method also includes determining similarity scores (262) for a pool of training applications, each similarity score associated with a corresponding training application and indicating a level of similarity between the vector representation for the software application and a respective vector representation for the corresponding training application. The method also includes flagging the software application as belonging to a potentially harmful application category (240b) when one or more of the training applications have similarity scores that satisfy a similarity threshold and include a potentially harmful application label.
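
The flagging step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `TrainingApp` type, the `"pha"` and `"benign"` label strings, the default threshold of 0.9, and the reading of "satisfy" as "greater than or equal to" are all assumptions made for illustration. Cosine similarity is used because claim 4 names it; the abstract itself does not fix a similarity measure.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TrainingApp:
    """Hypothetical record for one application in the training pool."""
    name: str
    vector: List[float]  # precomputed vector representation
    label: str           # assumed labels: "pha" or "benign"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def is_potentially_harmful(app_vector: Sequence[float],
                           pool: Sequence[TrainingApp],
                           threshold: float = 0.9) -> bool:
    """Flag the app if any PHA-labeled training app is similar enough."""
    for training_app in pool:
        score = cosine_similarity(app_vector, training_app.vector)
        if score >= threshold and training_app.label == "pha":
            return True
    return False
```

With a pool holding one `"pha"` app at `[1.0, 0.0]` and one `"benign"` app at `[0.0, 1.0]`, an application vector close to the first is flagged, while one close to the second is not, because a high score alone is not enough without the harmful label.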

First claim

Opening claim text (preview).

What is claimed is:

1. A method for identifying malicious software, the method comprising: receiving, at data processing hardware, a software application; executing, by the data processing hardware, the software application; identifying, by the data processing hardware, a plurality of uniform resource identifiers the software application interacts with during execution of the software application; generating, by the data processing hardware, a vector representation for the software application using a feed-forward neural network configured to receive the plurality of uniform resource identifiers as feature inputs; determining, by the data processing hardware, similarity scores for a pool of training applications stored in memory hardware in communication with the data processing hardware, each similarity score associated with a corresponding training application and indicating a level of similarity between the vector representation for the software application and a respective vector representation for the corresponding training application; and flagging, by the data processing hardware, the software application as belonging to a potentially harmful application category when one or more of the training applications have similarity scores that satisfy a similarity threshold and comprise a potentially harmful application label.

2. The method of claim 1, wherein identifying the plurality of uniform resource identifiers comprises identifying a plurality of domain names the software application visits during the execution of the software application.

3. The method of claim 1, wherein the feed-forward neural network comprises a vector space model configured to: determine an n-dimensional numerical vector representation for each of the identified uniform resource identifiers; and calculate the vector representation for the software application by averaging the n-dimensional numerical vector representations for the identified uniform resource identifiers.

4. The method of claim 1, wherein determining the similarity scores for the pool of training applications comprises calculating a respective cosine similarity between the vector representation for the software application and the respective vector representation for each corresponding training application.

5. The method of claim 1, wherein the vector representation for the software application comprises an n-dimensional vector of numerical values.

6. The method of claim 1, further comprising retrieving, by the data processing hardware, the training applications associated with the top-k highest similarity scores in the pool of training applications from the memory hardware.

7. The method of claim 1, further comprising: identifying, by the data processing hardware, a potentially harmful application category associated with a majority of the training applications in the pool of training applications each having the corresponding similarity score that satisfies the similarity threshold and comprising the potentially harmful application label; and assigning, by the data processing hardware, the software application to the identified potentially harmful application category.

8. The method of claim 7, wherein the potentially harmful application category assigned to the software application comprises one of a hostile downloader application, a phishing application, a rooting Trojan application, a spyware application, a ransomware application, a malware application, or an escalating privileges application.

9. The method of claim 1, further comprising, after flagging the software application as belonging to the potentially harmful application category: receiving, at the data processing hardware, a download request to download the software application from a user device in communication with the data processing hardware; and in response to receiving the download request, transmitting a warning notification to the user device, the warning notification indicating that the software application is flagged as belonging to the potentially harmful application category.

10. A system for identifying malware, the system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed by the data processing hardware cause the data processing hardware to perform operations comprising: receiving a software application; executing the software application; identifying a plurality of uniform resource identifiers the software application interacts with during execution of the software application; generating a vector representation for the software application using a feed-forward neural network configured to receive the plurality of uniform resource identifiers as feature inputs; determining similarity scores for a pool of training applications stored in the memory hardware, each similarity score associated with a corresponding training application and indicating a level of similarity between the vector representation for the software application and a respective vector representation for the corresponding training application; and flagging the software application as belonging to a potentially harmful application category when one or more of the training applications have similarity scores that satisfy a similarity threshold and comprise a potentially harmful application label.

11. The system of claim 10, wherein identifying the plurality of resource identifiers comprises identifying a plurality of domain names the software application visits during the execution of the software application.

12. The system of claim 10, wherein the feed-forward neural network comprises a vector space model configured to: determine an n-dimensional numerical vector representation for each of the identified uniform resource identifiers; and calculate the vector representation for the software application by averaging the n-dimensional numerical vector representations for the identified uniform resource identifiers.

13. The system of claim 10, wherein determining the similarity scores for the pool of training applications comprises calculating a respective cosine similarity between the vector representation for the software application and the respective vector representation for each corresponding training application.

14. The system of claim 10, wherein the vector representation for the software application comprises an n-dimensional vector of numerical values.

15. The system of claim 10, wherein the operations further comprise retrieving the training applications associated with the top-k highest similarity scores in the pool of training applications from the memory hardware.

16. The system of claim 10, wherein the operations further comprise: identifying a potentially harmful application category associated with a majority of the training applications in the pool of training applications each having the corresponding similarity score that satisfies the similarity threshold and comprising the potentially harmful application label; and assigning the software application to the identified potentially harmful application category.

17. The system of claim 16, wherein the potentially harmful application category assigned to the software application comprises one of a hostile downloader, a phishing application, a rooting Trojan application, a spyware application, a ransomware application, a malware application, or an escalating privileges application.

18. The system of claim 10, wherein the operations further comprise
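
The averaging, top-k retrieval, and category assignment steps recited in claims 3, 6, and 7 (and their system counterparts, claims 12, 15, and 16) might be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, per-URI vectors are taken as already computed, a neighbor is modeled as a `(category, score)` pair with `None` standing for a training application that has no potentially harmful label, and "majority" is read as "most common category among the qualifying neighbors", which is one possible interpretation of the claim language.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# (PHA category or None for a non-harmful training app, similarity score)
Neighbor = Tuple[Optional[str], float]


def average_vectors(uri_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-URI vectors into one application vector (claim 3)."""
    if not uri_vectors:
        raise ValueError("at least one URI vector is required")
    dims = len(uri_vectors[0])
    count = len(uri_vectors)
    return [sum(vec[i] for vec in uri_vectors) / count for i in range(dims)]


def top_k(neighbors: Sequence[Neighbor], k: int) -> List[Neighbor]:
    """Keep the k neighbors with the highest similarity scores (claim 6)."""
    return sorted(neighbors, key=lambda item: item[1], reverse=True)[:k]


def majority_category(neighbors: Sequence[Neighbor],
                      threshold: float) -> Optional[str]:
    """Most common PHA category among neighbors meeting the threshold (claim 7)."""
    qualifying = [category for category, score in neighbors
                  if category is not None and score >= threshold]
    if not qualifying:
        return None  # nothing to assign
    return Counter(qualifying).most_common(1)[0][0]
```

For example, given neighbors `[("spyware", 0.95), ("spyware", 0.92), ("phishing", 0.91), (None, 0.99)]` and a threshold of 0.9, `majority_category` returns `"spyware"`: the unlabeled neighbor is ignored despite its high score, and spyware outnumbers phishing among the rest.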

Assignees

Inventors

Classifications

  • Supervised learning · CPC title

  • Feedforward networks · CPC title

  • G06F21/566 (Primary)

    Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities · CPC title

  • during program execution, e.g. stack integrity {; Preventing unwanted data erasure; Buffer overflow} · CPC title

  • Architecture, e.g. interconnection topology · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11880462B2 cover?
A method (600) for identifying malicious software includes receiving and executing a software application (210), identifying a plurality of uniform resource identifiers (220) the software application interacts with during execution of the software application, and generating a vector representation (260) for the software application using a feed-forward neural network (170) configured…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F21/566. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 23, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).