Performance and usability enhancements for continuous subgraph matching queries on graph-structured data

US2018329958A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2018329958-A1
Application numberUS-201715594376-A
CountryUS
Kind codeA1
Filing dateMay 12, 2017
Priority dateMay 12, 2017
Publication dateNov 15, 2018
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A query graph, which includes vertices and edges, represents a query on graph-structured data. The query graph is decomposed into query subgraphs. A network analysis tool performs continuous subgraph matching queries to facilitate analysis of computer network traffic, social media events, or other streams of data represented as a dynamic data graph (graph-structured data). This can help identify emerging trends in the data. Some features of the network analysis tool enhance performance by effectively utilizing distributed computing resources (including processing cores and memory at different nodes of a cluster) to speed up the process of updating the dynamic data graph and detecting matches of query subgraphs. Features of a query graph building tool enhance usability by providing intuitive ways to specify query graphs and their subgraphs. Features of a results visualization tool enhance usability by providing an intuitive way to present the results of continuous subgraph matching queries.

First claim

Opening claim text (preview).

We claim: 1 . A computer system comprising multiple nodes, each of the multiple nodes including multiple processing cores and memory addressed according to a global address space, wherein the multiple nodes, collectively, are configured to perform operations comprising: receiving, as part of one or more streams of updates, information that indicates additions to a data graph, the data graph including vertices and edges between the vertices; performing a continuous query process to identify complete matches, if any, of a query graph within the data graph, the continuous query process using multithreading with tasks executable in parallel on at least some of the multiple processing cores of the multiple nodes, wherein the continuous query process includes: updating the data graph based on the additions to the data graph, thereby adding new edges to the edges of the data graph; searching for new sets of partial matches of query subgraphs of the query graph at the new edges of the data graph; and conditionally performing, based at least in part on evaluation of a condition, join operations between partial matches from the new sets of partial matches and partial matches from cumulative sets of partial matches, wherein the condition is whether affected vertices of the data graph for the partial matches from the cumulative sets of partial matches have been updated within a time window defined by a join threshold; and outputting results of the continuous query process. 2 . The computer system of claim 1 , wherein the continuous query process further includes pruning the data graph to remove one or more of the edges of the data graph and/or one or more of the vertices of the data graph. 3 . The computer system of claim 2 , wherein the continuous query process further includes: determining which of the edges of the data graph are outside a time window defined by a pruning threshold, wherein the pruning removes any of the edges of the data graph that are outside the time window defined by the pruning threshold; and determining which of the vertices of the data graph are attached to none of the edges of the data graph, wherein the pruning removes any of the vertices of the data graph that are attached to none of the edges of the data graph. 4 . The computer system of claim 1 , wherein, for the updating the data graph, different instances of the tasks are executable in parallel for different batches of the additions to the data graph. 5 . The computer system of claim 1 , wherein the updating the data graph includes, at one of the at least some of the multiple processing cores: for a given new edge, among the new edges of the data graph, to be added to a given vertex of the vertices of the data graph, using an atomic transaction to increment a counter that indicates a size of an adjacency list for the given vertex; checking whether the counter has reached a size threshold; obtaining a lock to gain exclusive access to the adjacency list for the given vertex; if the counter has reached the size threshold, increasing size of the adjacency list for the given vertex; adding the given new edge to the adjacency list for the given vertex; and releasing the lock. 6 . The computer system of claim 1 , wherein, for the searching for the new sets of partial matches, different instances of the tasks are executable in parallel for different edges of the new edges of the data graph. 7 . The computer system of claim 1 , wherein the searching for the new sets of partial matches includes searching, at the new edges of the data graph, for partial matches of each unique query subgraph among the query subgraphs of the query graph. 8 . The computer system of claim 1 , wherein, as part of a lazy searching process: the searching for the new sets of partial matches includes searching, at the new edges of the data graph, for partial matches of only a top-selectivity query subgraph and any other query subgraph, among the query subgraphs of the query graph, that furthers progress of a previous partial match towards completion; and if a partial match is found, the continuous query process further includes searching, at neighboring edges of the data graph, for any of the query subgraphs that furthers progress of the partial match of the top-selectivity query subgraph towards completion. 9 . The computer system of claim 1 , wherein the conditionally performing the join operations includes, for a given partial match from one of the cumulative sets of partial matches: for the evaluation of the condition, determining whether affected vertices for the given partial match have been updated within the time window defined by the join threshold; if so, performing one of the join operations between the given partial match and a partial match from one of the new sets of partial matches; and otherwise, skipping the performing one of the join operations between the given partial match and the partial match from one of the new sets of partial matches. 10 . The computer system of claim 9 , wherein the affected vertices for the given partial match are all vertices of the given partial match or a subset of vertices of the given partial match. 11 . The computer system of claim 1 , wherein, for at least one of the query subgraphs, an associated set among the new sets of partial matches is empty. 12 . The computer system of claim 1 , wherein a subgraph join tree includes one or more internal vertices and multiple leaf vertices, each of the multiple leaf vertices having an old set of partial matches for one of the query subgraphs, and wherein the continuous query process further includes: for each given leaf vertex of the multiple leaf vertices of the subgraph join tree, adding an associated set among the new sets of partial matches to the old set of partial matches for the leaf vertex to produce an associated set among the cumulative sets of partial matches; wherein, for a given internal vertex of the one or more internal vertices of the subgraph join tree, join operations are conditionally performed between partial matches from one of the new sets of partial matches for a first child vertex and partial matches from one of the cumulative sets of partial matches for a second child vertex. 13 . The computer system of claim 1 , wherein the conditionally performing the join operations includes, for a given one of the new sets of partial matches and a corresponding one of the cumulative sets of partial matches: mapping at least some of the given new set of partial matches and at least some of the corresponding cumulative set of partial matches to a sequence of key-value pairs; aggregating the key-value pairs to produce groups of the key-value pairs organized by key; and reducing the respective groups of key-value pairs. 14 . The computer system of 13 , wherein: for the mapping, different instances of the tasks are executable in parallel for different partial matches of the given new set of partial matches and the corresponding cumulative set of partial matches; and/or for the reducing, different instances of the tasks are executable in parallel for different groups among the groups of the key-value pairs. 15 . The computer system of claim 13 , wherein the condition is evaluated before the mapping, during the mapping, or during the reducing. 16 . The computer system of claim 1 , wherein the information that indicates the additions to the data graph is received, as part of the one or more streams of updates, from: one or more network traffic monitors, wherein the query graph represent a target pattern of

Assignees

Inventors

Classifications

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • Updates performed during online database operations; commit processing · CPC title

  • Data stream processing; Continuous queries · CPC title

  • Join operations · CPC title

  • involving long-term monitoring or reporting · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018329958A1 cover?
A query graph, which includes vertices and edges, represents a query on graph-structured data. The query graph is decomposed into query subgraphs. A network analysis tool performs continuous subgraph matching queries to facilitate analysis of computer network traffic, social media events, or other streams of data represented as a dynamic data graph (graph-structured data). This can help identif…
Who is the assignee on this patent?
Battelle Memorial Institute
What technology area does this patent fall under?
Primary CPC classification G06F16/24568. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Nov 15 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).