Data loss prevention framework using cloud infrastructure
US-2024176905-A1 · May 30, 2024 · US
US2018329958A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2018329958-A1 |
| Application number | US-201715594376-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 12, 2017 |
| Priority date | May 12, 2017 |
| Publication date | Nov 15, 2018 |
| Grant date | — |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A query graph, which includes vertices and edges, represents a query on graph-structured data. The query graph is decomposed into query subgraphs. A network analysis tool performs continuous subgraph matching queries to facilitate analysis of computer network traffic, social media events, or other streams of data represented as a dynamic data graph (graph-structured data). This can help identify emerging trends in the data. Some features of the network analysis tool enhance performance by effectively utilizing distributed computing resources (including processing cores and memory at different nodes of a cluster) to speed up the process of updating the dynamic data graph and detecting matches of query subgraphs. Features of a query graph building tool enhance usability by providing intuitive ways to specify query graphs and their subgraphs. Features of a results visualization tool enhance usability by providing an intuitive way to present the results of continuous subgraph matching queries.
Opening claim text (preview).
We claim: 1 . A computer system comprising multiple nodes, each of the multiple nodes including multiple processing cores and memory addressed according to a global address space, wherein the multiple nodes, collectively, are configured to perform operations comprising: receiving, as part of one or more streams of updates, information that indicates additions to a data graph, the data graph including vertices and edges between the vertices; performing a continuous query process to identify complete matches, if any, of a query graph within the data graph, the continuous query process using multithreading with tasks executable in parallel on at least some of the multiple processing cores of the multiple nodes, wherein the continuous query process includes: updating the data graph based on the additions to the data graph, thereby adding new edges to the edges of the data graph; searching for new sets of partial matches of query subgraphs of the query graph at the new edges of the data graph; and conditionally performing, based at least in part on evaluation of a condition, join operations between partial matches from the new sets of partial matches and partial matches from cumulative sets of partial matches, wherein the condition is whether affected vertices of the data graph for the partial matches from the cumulative sets of partial matches have been updated within a time window defined by a join threshold; and outputting results of the continuous query process. 2 . The computer system of claim 1 , wherein the continuous query process further includes pruning the data graph to remove one or more of the edges of the data graph and/or one or more of the vertices of the data graph. 3 . The computer system of claim 2 , wherein the continuous query process further includes: determining which of the edges of the data graph are outside a time window defined by a pruning threshold, wherein the pruning removes any of the edges of the data graph that are outside the time window defined by the pruning threshold; and determining which of the vertices of the data graph are attached to none of the edges of the data graph, wherein the pruning removes any of the vertices of the data graph that are attached to none of the edges of the data graph. 4 . The computer system of claim 1 , wherein, for the updating the data graph, different instances of the tasks are executable in parallel for different batches of the additions to the data graph. 5 . The computer system of claim 1 , wherein the updating the data graph includes, at one of the at least some of the multiple processing cores: for a given new edge, among the new edges of the data graph, to be added to a given vertex of the vertices of the data graph, using an atomic transaction to increment a counter that indicates a size of an adjacency list for the given vertex; checking whether the counter has reached a size threshold; obtaining a lock to gain exclusive access to the adjacency list for the given vertex; if the counter has reached the size threshold, increasing size of the adjacency list for the given vertex; adding the given new edge to the adjacency list for the given vertex; and releasing the lock. 6 . The computer system of claim 1 , wherein, for the searching for the new sets of partial matches, different instances of the tasks are executable in parallel for different edges of the new edges of the data graph. 7 . The computer system of claim 1 , wherein the searching for the new sets of partial matches includes searching, at the new edges of the data graph, for partial matches of each unique query subgraph among the query subgraphs of the query graph. 8 . The computer system of claim 1 , wherein, as part of a lazy searching process: the searching for the new sets of partial matches includes searching, at the new edges of the data graph, for partial matches of only a top-selectivity query subgraph and any other query subgraph, among the query subgraphs of the query graph, that furthers progress of a previous partial match towards completion; and if a partial match is found, the continuous query process further includes searching, at neighboring edges of the data graph, for any of the query subgraphs that furthers progress of the partial match of the top-selectivity query subgraph towards completion. 9 . The computer system of claim 1 , wherein the conditionally performing the join operations includes, for a given partial match from one of the cumulative sets of partial matches: for the evaluation of the condition, determining whether affected vertices for the given partial match have been updated within the time window defined by the join threshold; if so, performing one of the join operations between the given partial match and a partial match from one of the new sets of partial matches; and otherwise, skipping the performing one of the join operations between the given partial match and the partial match from one of the new sets of partial matches. 10 . The computer system of claim 9 , wherein the affected vertices for the given partial match are all vertices of the given partial match or a subset of vertices of the given partial match. 11 . The computer system of claim 1 , wherein, for at least one of the query subgraphs, an associated set among the new sets of partial matches is empty. 12 . The computer system of claim 1 , wherein a subgraph join tree includes one or more internal vertices and multiple leaf vertices, each of the multiple leaf vertices having an old set of partial matches for one of the query subgraphs, and wherein the continuous query process further includes: for each given leaf vertex of the multiple leaf vertices of the subgraph join tree, adding an associated set among the new sets of partial matches to the old set of partial matches for the leaf vertex to produce an associated set among the cumulative sets of partial matches; wherein, for a given internal vertex of the one or more internal vertices of the subgraph join tree, join operations are conditionally performed between partial matches from one of the new sets of partial matches for a first child vertex and partial matches from one of the cumulative sets of partial matches for a second child vertex. 13 . The computer system of claim 1 , wherein the conditionally performing the join operations includes, for a given one of the new sets of partial matches and a corresponding one of the cumulative sets of partial matches: mapping at least some of the given new set of partial matches and at least some of the corresponding cumulative set of partial matches to a sequence of key-value pairs; aggregating the key-value pairs to produce groups of the key-value pairs organized by key; and reducing the respective groups of key-value pairs. 14 . The computer system of 13 , wherein: for the mapping, different instances of the tasks are executable in parallel for different partial matches of the given new set of partial matches and the corresponding cumulative set of partial matches; and/or for the reducing, different instances of the tasks are executable in parallel for different groups among the groups of the key-value pairs. 15 . The computer system of claim 13 , wherein the condition is evaluated before the mapping, during the mapping, or during the reducing. 16 . The computer system of claim 1 , wherein the information that indicates the additions to the data graph is received, as part of the one or more streams of updates, from: one or more network traffic monitors, wherein the query graph represent a target pattern of
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title
Updates performed during online database operations; commit processing · CPC title
Data stream processing; Continuous queries · CPC title
Join operations · CPC title
involving long-term monitoring or reporting · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.