Determining malware infection risk

US10164995B1 · US · B1

Patent metadata
Field: Value
Publication number: US-10164995-B1
Application number: US-201514826867-A
Country: US
Kind code: B1
Filing date: Aug 14, 2015
Priority date: Aug 14, 2014
Publication date: Dec 25, 2018
Grant date: Dec 25, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing semi-supervised learning on partially labeled nodes on a bipartite graph. One described method can determine a useful score of malware infection risk from partial known facts for entities modeled as nodes on a bipartite graph, where network traffic is measured between inside-the-enterprise entities and outside-the-enterprise entities. This and other methods can be implemented in a large-scale massively parallel processing database. Methods of scaling the partial label input and of presenting the results are also described.
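
As a rough illustration of the graph model the abstract describes, the sketch below aggregates traffic records into weighted edges between inside-the-enterprise and outside-the-enterprise entities. The record layout, the use of byte counts as the traffic measure, and every name here (`aggregate_traffic`, `host-a`, and so on) are assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict


def aggregate_traffic(records):
    """Sum traffic per (inside entity, outside entity) pair.

    Each record is a hypothetical (inside, outside, byte_count) tuple.
    Returns a dict mapping (inside, outside) -> total bytes, which can
    serve as the edge weights of a bipartite graph.
    """
    weights = defaultdict(float)
    for inside, outside, byte_count in records:
        weights[(inside, outside)] += byte_count
    return dict(weights)


# Hypothetical log records: (inside host, outside domain, bytes transferred).
records = [
    ("host-a", "example.com", 1200),
    ("host-a", "example.com", 800),
    ("host-b", "example.com", 500),
    ("host-b", "example.org", 300),
]
edges = aggregate_traffic(records)
```

Because every edge joins one inside entity to one outside entity, the resulting dictionary is bipartite by construction.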

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:

    receiving data representing aggregate network traffic data as a bipartite graph between nodes representing first entities and nodes representing second entities, wherein each edge of the bipartite graph connects a node representing a respective first entity with a node representing a respective second entity, each edge of the bipartite graph has an edge weight representing a measure of the aggregate network traffic between the entities represented by nodes of the graph connected by the edge, the aggregate network traffic data represents network traffic between the first entities and the second entities, and each of the entities is an entity communicating with one or more other entities on a data communication network;

    receiving an initial collection of ground truth label values for some of the first entities, some of the second entities, or both, wherein each ground truth label value for an entity indicates that the entity is known to be safe or unsafe, and wherein each ground truth label value is either −r or +r, wherein r is a positive real number;

    computing a respective initial score for each of the first entities and each of the second entities, each initial score being a non-zero value for a respective entity that has a known ground truth label value indicating that the entity is known to be safe or unsafe or a zero value for entities that do not have a known ground truth label value, each entity with a ground truth value of −r being assigned an initial score of −r/B, and each entity with a ground truth value of +r being assigned an initial score of +r/A, wherein B is a count of how many values of −r were present in the initial collection, and wherein A is a count of how many values of +r were present in the initial collection;

    iteratively computing a respective final score for each of the first entities and the second entities from the initial scores and the edge weights, the final score indicating malware infection risk of a corresponding entity;

    identifying, based on the final scores, one or more first entities, one or more second entities, or both, that are likely infected with malware; and

    reporting the identified one or more first entities or one or more second entities to a user.

2. The system of claim 1, wherein iteratively computing a respective final score for each entity comprises: for each entity, computing at each iteration a new score for the entity from previously-determined scores for other entities having a connection to the entity in the bipartite graph.

3. The system of claim 1, wherein iteratively computing a respective final score for each entity comprises: for each entity, computing at each iteration a new score for the entity as a propagation score for the entity from previously-determined scores for other entities having a connection to the entity in the bipartite graph plus the initial score for the entity.

4. The system of claim 3, the operations further comprising: for each entity, computing the propagation score as a weighted sum of the previously-determined scores, each weight in the weighted sum being a corresponding edge weight from the bipartite graph.

5. The system of claim 1, the operations further comprising: obtaining network traffic data from transaction logs; and aggregating the network traffic data to generate the aggregate network traffic data.

6. The system of claim 1, wherein: the first entities are entities within a perimeter of perimeter entities of the data communication network; and the second entities are entities outside the perimeter of perimeter entities of the data communication network.

7. The system of claim 1, wherein reporting the identified one or more first entities or one or more second entities comprises: displaying multiple final scores in a sorted order, including displaying each of the multiple final score with an identifier of the first entity or the second entity of the final score.

8. The system of claim 7, the operations further comprising: displaying each of the multiple final scores with an indication of whether the first entity or the second entity of the final score had a known ground truth label value.

9. The system of claim 1, wherein iteratively computing a respective final score for each entity comprises: iteratively computing new score values $x_{t+1}$ and $y_{t+1}$ according to:

    $x_{t+1}(i) = \text{alpha} \times \Bigl( \sum_{j \in N(i)} \Bigl( \frac{W(i,j)}{\sum_{k \in N(j)} W(k,j)} \times y_t(j)$ …
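
The scoring steps in claims 1, 3, 4, 7 and 8 can be sketched roughly as follows. This is a minimal illustration, not the patented method: the formula in claim 9 is truncated in this preview, so the edge-weight normalization and the way `alpha` blends the propagated score with the initial score are assumptions. The choice of +r to mean "unsafe", the default parameter values, and all function and entity names are likewise hypothetical.

```python
from collections import defaultdict


def initial_scores(labels, entities, r=1.0):
    """Claim 1: +r labels become +r/A, -r labels become -r/B, others 0."""
    a = sum(1 for v in labels.values() if v > 0)  # A: count of +r labels
    b = sum(1 for v in labels.values() if v < 0)  # B: count of -r labels
    scores = {}
    for e in entities:
        v = labels.get(e, 0.0)
        scores[e] = r / a if v > 0 else (-r / b if v < 0 else 0.0)
    return scores


def propagate(edges, labels, r=1.0, alpha=0.5, iterations=20):
    """Iteratively mix neighbor scores with initial scores.

    edges maps (first_entity, second_entity) -> weight. Entity names on
    the two sides are assumed to be disjoint. The update rule below is an
    assumed form, since the source formula is truncated.
    """
    first = sorted({i for i, _ in edges})
    second = sorted({j for _, j in edges})
    init = initial_scores(labels, first + second, r)
    row_sum, col_sum = defaultdict(float), defaultdict(float)
    for (i, j), w in edges.items():
        row_sum[i] += w
        col_sum[j] += w
    x = {i: init[i] for i in first}
    y = {j: init[j] for j in second}
    for _ in range(iterations):
        # Start each new score from the (assumed) initial-score term.
        new_x = {i: (1 - alpha) * init[i] for i in first}
        new_y = {j: (1 - alpha) * init[j] for j in second}
        # Add the weighted sum of previously determined neighbor scores.
        for (i, j), w in edges.items():
            new_x[i] += alpha * w / col_sum[j] * y[j]
            new_y[j] += alpha * w / row_sum[i] * x[i]
        x, y = new_x, new_y
    return x, y


# Hypothetical data: two inside hosts, two outside domains, two labels.
edges = {
    ("host-a", "bad.example"): 5.0,
    ("host-a", "good.example"): 1.0,
    ("host-b", "good.example"): 4.0,
}
labels = {"bad.example": +1.0, "good.example": -1.0}  # +r assumed unsafe
x, y = propagate(edges, labels)

# Claims 7 and 8: sorted scores with identifier and a "was labeled" flag.
ranked = sorted(
    ((name, score, name in labels) for name, score in {**x, **y}.items()),
    key=lambda row: row[1],
    reverse=True,
)
```

In this toy example `host-a`, which mostly talks to the domain labeled unsafe, ends with a positive score, while `host-b` ends with a negative one. Dividing each label by its class count, as claim 1 specifies, keeps a large class of labels from dominating the propagated scores.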

Assignees

Pivotal Software Inc

Inventors

Classifications

  • Physics · mapped topic

  • Assessing vulnerabilities and evaluating computer system security · CPC title

  • Vulnerability analysis · CPC title

  • Computer malware detection or handling, e.g. anti-virus arrangements · CPC title

  • Detecting local intrusion or implementing counter-measures · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10164995B1 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing semi-supervised learning on partially labeled nodes on a bipartite graph. One described method can determine a useful score of malware infection risk from partial known facts for entities modeled as nodes on a bipartite graph, where network traffic is measured between inside-the-enterprise entities and outside-the-enterprise entities.
Who is the assignee on this patent?
Pivotal Software Inc
What technology area does this patent fall under?
Primary CPC classification H04L63/1433. Mapped technology areas include Electricity.
When was this patent published?
Publication date Dec 25, 2018 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).