Learning indicators of compromise with hierarchical models
US-2018063163-A1 · Mar 1, 2018 · US
US11880462B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11880462-B2 |
| Application number | US-201817057639-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 21, 2018 |
| Priority date | May 21, 2018 |
| Publication date | Jan 23, 2024 |
| Grant date | Jan 23, 2024 |
Official abstract text for this publication.
A method (600) for identifying malicious software includes receiving and executing a software application (210), identifying a plurality of uniform resource identifiers (220) the software application interacts with during execution of the software application, and generating a vector representation (260) for the software application using a feed-forward neural network (170) configured to receive the plurality of uniform resource identifiers as feature inputs. The method also includes determining similarity scores (262) for a pool of training applications, each similarity score associated with a corresponding training application and indicating a level of similarity between the vector representation for the software application and a respective vector representation for the corresponding training application. The method also includes flagging the software application as belonging to a potentially harmful application category (240b) when one or more of the training applications have similarity scores that satisfy a similarity threshold and include a potentially harmful application label.
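The flagging step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the embedding table `URI_EMBEDDINGS`, the domain names, the 3-dimensional vectors, and the helper names are all hypothetical, and a hand-written lookup table stands in for the trained feed-forward network. Averaging per-URI vectors and using cosine similarity follow dependent claims 3 and 4; treating "satisfy a similarity threshold" as `score >= threshold` is an assumption.

```python
import math

# Hypothetical 3-dimensional embeddings keyed by domain name. In the patent
# these would come from a trained feed-forward network, not a fixed table.
URI_EMBEDDINGS = {
    "ads.example.com": [0.9, 0.1, 0.0],
    "cdn.example.org": [0.1, 0.8, 0.1],
    "c2.badhost.test": [0.0, 0.1, 0.9],
}


def app_vector(uris):
    """Average the embeddings of the URIs an app contacted (as in claim 3)."""
    known = [URI_EMBEDDINGS[u] for u in uris if u in URI_EMBEDDINGS]
    if not known:
        raise ValueError("no known URIs to embed")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (as in claim 4)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_flagged(candidate_uris, training_apps, threshold=0.9):
    """Flag when any harmful-labeled training app meets the threshold.

    Assumption: "satisfies the threshold" means score >= threshold.
    """
    candidate = app_vector(candidate_uris)
    for app in training_apps:
        score = cosine_similarity(candidate, app["vector"])
        if app["harmful"] and score >= threshold:
            return True
    return False


# Hypothetical pool of labeled training applications.
training_apps = [
    {"name": "known_spyware",
     "vector": app_vector(["c2.badhost.test"]),
     "harmful": True},
    {"name": "benign_game",
     "vector": app_vector(["ads.example.com", "cdn.example.org"]),
     "harmful": False},
]
```

Under these assumptions, calling `is_flagged(["c2.badhost.test"], training_apps)` flags the candidate, because its vector matches a harmful-labeled training application. A candidate that contacts only the two benign domains is not flagged: its nearest neighbour is similar but carries no harmful label.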
Opening claim text (preview).
What is claimed is:

1. A method for identifying malicious software, the method comprising: receiving, at data processing hardware, a software application; executing, by the data processing hardware, the software application; identifying, by the data processing hardware, a plurality of uniform resource identifiers the software application interacts with during execution of the software application; generating, by the data processing hardware, a vector representation for the software application using a feed-forward neural network configured to receive the plurality of uniform resource identifiers as feature inputs; determining, by the data processing hardware, similarity scores for a pool of training applications stored in memory hardware in communication with the data processing hardware, each similarity score associated with a corresponding training application and indicating a level of similarity between the vector representation for the software application and a respective vector representation for the corresponding training application; and flagging, by the data processing hardware, the software application as belonging to a potentially harmful application category when one or more of the training applications have similarity scores that satisfy a similarity threshold and comprise a potentially harmful application label.
2. The method of claim 1, wherein identifying the plurality of uniform resource identifiers comprises identifying a plurality of domain names the software application visits during the execution of the software application.
3. The method of claim 1, wherein the feed-forward neural network comprises a vector space model configured to: determine an n-dimensional numerical vector representation for each of the identified uniform resource identifiers; and calculate the vector representation for the software application by averaging the n-dimensional numerical vector representations for the identified uniform resource identifiers.
4. The method of claim 1, wherein determining the similarity scores for the pool of training applications comprises calculating a respective cosine similarity between the vector representation for the software application and the respective vector representation for each corresponding training application.
5. The method of claim 1, wherein the vector representation for the software application comprises an n-dimensional vector of numerical values.
6. The method of claim 1, further comprising retrieving, by the data processing hardware, the training applications associated with the top-k highest similarity scores in the pool of training applications from the memory hardware.
7. The method of claim 1, further comprising: identifying, by the data processing hardware, a potentially harmful application category associated with a majority of the training applications in the pool of training applications each having the corresponding similarity score that satisfies the similarity threshold and comprising the potentially harmful application label; and assigning, by the data processing hardware, the software application to the identified potentially harmful application category.
8. The method of claim 7, wherein the potentially harmful application category assigned to the software application comprises one of a hostile downloader application, a phishing application, a rooting Trojan application, a spyware application, a ransomware application, a malware application, or an escalating privileges application.
9. The method of claim 1, further comprising, after flagging the software application as belonging to the potentially harmful application category: receiving, at the data processing hardware, a download request to download the software application from a user device in communication with the data processing hardware; and in response to receiving the download request, transmitting a warning notification to the user device, the warning notification indicating that the software application is flagged as belonging to the potentially harmful application category.
10. A system for identifying malware, the system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed by the data processing hardware cause the data processing hardware to perform operations comprising: receiving a software application; executing the software application; identifying a plurality of uniform resource identifiers the software application interacts with during execution of the software application; generating a vector representation for the software application using a feed-forward neural network configured to receive the plurality of uniform resource identifiers as feature inputs; determining similarity scores for a pool of training applications stored in the memory hardware, each similarity score associated with a corresponding training application and indicating a level of similarity between the vector representation for the software application and a respective vector representation for the corresponding training application; and flagging the software application as belonging to a potentially harmful application category when one or more of the training applications have similarity scores that satisfy a similarity threshold and comprise a potentially harmful application label.
11. The system of claim 10, wherein identifying the plurality of resource identifiers comprises identifying a plurality of domain names the software application visits during the execution of the software application.
12. The system of claim 10, wherein the feed-forward neural network comprises a vector space model configured to: determine an n-dimensional numerical vector representation for each of the identified uniform resource identifiers; and calculate the vector representation for the software application by averaging the n-dimensional numerical vector representations for the identified uniform resource identifiers.
13. The system of claim 10, wherein determining the similarity scores for the pool of training applications comprises calculating a respective cosine similarity between the vector representation for the software application and the respective vector representation for each corresponding training application.
14. The system of claim 10, wherein the vector representation for the software application comprises an n-dimensional vector of numerical values.
15. The system of claim 10, wherein the operations further comprise retrieving the training applications associated with the top-k highest similarity scores in the pool of training applications from the memory hardware.
16. The system of claim 10, wherein the operations further comprise: identifying a potentially harmful application category associated with a majority of the training applications in the pool of training applications each having the corresponding similarity score that satisfies the similarity threshold and comprising the potentially harmful application label; and assigning the software application to the identified potentially harmful application category.
17. The system of claim 16, wherein the potentially harmful application category assigned to the software application comprises one of a hostile downloader, a phishing application, a rooting Trojan application, a spyware application, a ransomware application, a malware application, or an escalating privileges application.
18. The system of claim 10, wherein the operations further comprise
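
The top-k retrieval and category assignment recited in claims 6, 7, 15, and 16 might look like the following sketch. The record layout (`score`, `harmful`, `category`), the sample values, and the function names are hypothetical. Note one simplification: the claims recite the category associated with a majority of the qualifying training applications, while this sketch picks the most common category among them, which coincides with a majority only when one category covers more than half of the matches.

```python
from collections import Counter


def top_k(scored_apps, k):
    """Return the k training apps with the highest similarity scores."""
    return sorted(scored_apps, key=lambda a: a["score"], reverse=True)[:k]


def assign_category(scored_apps, threshold=0.8):
    """Pick the most common category among harmful-labeled apps whose
    score meets the threshold; return None when nothing qualifies.

    Assumption: "satisfies the threshold" means score >= threshold.
    """
    matches = [
        a["category"]
        for a in scored_apps
        if a["harmful"] and a["score"] >= threshold
    ]
    if not matches:
        return None
    return Counter(matches).most_common(1)[0][0]


# Hypothetical similarity results for one candidate application.
scored = [
    {"name": "a", "score": 0.95, "harmful": True, "category": "spyware"},
    {"name": "b", "score": 0.91, "harmful": True, "category": "spyware"},
    {"name": "c", "score": 0.88, "harmful": True, "category": "phishing"},
    {"name": "d", "score": 0.97, "harmful": False, "category": None},
    {"name": "e", "score": 0.40, "harmful": True, "category": "ransomware"},
]
```

With this sample data, `top_k(scored, 2)` returns apps `d` and `a`, and `assign_category(scored)` returns `"spyware"`, since two of the three qualifying harmful-labeled apps carry that category. App `d` scores highest but is ignored for category assignment because it has no harmful label.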
Supervised learning · CPC title
Feedforward networks · CPC title
Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities · CPC title
during program execution, e.g. stack integrity {; Preventing unwanted data erasure; Buffer overflow} · CPC title
Architecture, e.g. interconnection topology · CPC title