Distributed graph processing system that support remote data read with proactive bulk data transfer
US-2016292303-A1 · Oct 6, 2016 · US
US10055509B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10055509-B2 |
| Application number | US-201514680150-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 7, 2015 |
| Priority date | Jul 15, 2014 |
| Publication date | Aug 21, 2018 |
| Grant date | Aug 21, 2018 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Techniques for efficiently loading graph data into memory are provided. A plurality of node ID lists are retrieved from storage. Each node ID list is ordered based on one or more order criteria, such as node ID, and is read into memory. A new list of node IDs is created in memory and is initially empty. From among the plurality of node ID lists, a particular node ID is selected based on the one or more order criteria, removed from the node ID list where the particular node ID originates, and added to the new list. This process of selecting, removing, and adding continues until no more than one node ID list exists, other than the new list. In this way, the retrieval of the plurality of node ID lists from storage may be performed in parallel while the selecting and adding are performed sequentially.
Opening claim text (preview).
What is claimed is: 1. A method for generating an in-memory representation of a graph that is represented in graph data that is in a graph data format and that is stored in persistent storage, the method comprising: identifying, in memory, a plurality of node identifier lists of nodes indicated in the graph data, wherein each node identifier list in the plurality of node identifier lists is ordered based on one or more order criteria and identifies multiple nodes; for each of multiple node identifiers in the plurality of node identifier lists: based on the one or more order criteria, determining a particular node identifier, from among the plurality of node identifier lists, to remove; removing the particular node identifier from one of the plurality of node identifier lists; determining a particular node index value for the particular node identifier; based on determining the particular node index value, updating a mapping to include an association between the particular node index value and the particular node identifier; wherein the mapping maps, for each node identifier indicated in the plurality of node identifier lists, said each node identifier to a node index value; wherein the method is performed by one or more computing devices. 2. The method of claim 1 , further comprising, prior to determining the particular node identifier: for each node partition of a plurality of node partitions that are stored in the persistent storage: storing an index for said each node partition; using the index for said each node partition to generate a different node identifier list of the plurality of node identifier lists; sending the different node identifier list to a graph analytic engine that creates the mapping. 3. The method of claim 2 , further comprising: for each edge partition of a plurality of edge partitions that are stored in the persistent storage: storing a second index for said each edge partition; using the second index for said each edge partition to generate a different ordered edge list of a plurality of ordered edge lists; sending the different ordered edge list to the graph analytic engine. 4. The method of claim 3 , wherein: each edge partition of the plurality of edge partitions stores, for each edge indicated in said each edge partition, a source identifier that identifies a first node and a destination identifier that identifies a second node; each ordered edge list in the plurality of ordered edge lists is ordered, first, based on source identifier and, second, based on destination identifier. 5. The method of claim 1 , further comprising: creating a first intermediate list that comprises a plurality of entries; wherein each entry in the first intermediate list includes: (1) node identifier data that identifies a node; (2) partition data that indicates a node identifier list from which the node identifier originates; and (3) index data that indicates a position within the node identifier list. 6. The method of claim 5 , further comprising: storing, in the persistent storage, in association with each node identifier indicated in each node identifier list of the plurality of node identifier lists, one or more properties of a node identified by said each node identifier; using the first intermediate list to determine where to store, in memory, the one or more properties of each node identified in each node identifier list. 7. The method of claim 6 , wherein using the first intermediate list comprises: assigning a different set of node properties to a different thread of a plurality of threads that execute concurrently; wherein each thread of the plurality of threads uses, concurrently with respect to each other thread of the plurality of threads, the first intermediate list to determine where to store the set of node properties that are assigned to said each thread. 8. The method of claim 1 , further comprising: identifying a plurality of edge lists of edges indicated in the graph data, wherein each edge list in the plurality of edge lists is ordered based on the one or more order criteria and indicates multiple edges; using the mapping to replace a node identifier of each edge indicated in the plurality of edge lists with a corresponding node index value. 9. The method of claim 8 , further comprising: creating a neighbor list of node identifiers that is initially empty; for each of multiple edges indicated in the plurality of edge lists: based on the one or more order criteria, selecting a particular edge from among the plurality of edge lists; removing the particular edge from one of the plurality of edge lists; adding the particular edge to the neighbor list of node identifiers. 10. The method of claim 9 , wherein adding the particular edge to the neighbor list comprises: creating a new edge entry in the neighbor list, wherein the new edge entry includes a first node index value that is associated with a destination node of the particular edge; determining whether a source node index value associated with the particular edge is new relative to all other source node index values currently indicated in a node list of the in-memory representation; if the source node index value is new, then creating, in the node list, a new node entry that corresponds to a node in the graph and causing the new node entry to point to the new edge entry. 11. The method of claim 9 , further comprising: creating an intermediate list that is associated with the neighbor list of node identifiers; wherein each position in the intermediate list corresponds to a position in the neighbor list and includes (a) edge list data that indicates an edge list from which an edge, corresponding to the position in the neighbor list, originates and (b) index data that indicates a position within the edge list where one or more edge properties of the edge are located. 12. The method of claim 11 , further comprising: storing, in the persistent storage, in association with each edge indicated in each edge list of the plurality of edge lists, one or more properties of said each edge; assigning a different set of edge properties to a different thread of a plurality of threads that execute concurrently; wherein each thread of the plurality of threads uses the intermediate list to determine where to store the set of edge properties that are assigned to said each thread. 13. A method for generating an in-memory representation of a graph, the method comprising: loading, into memory from persistent storage, node data that identifies a plurality of nodes of the graph and is ordered based on one or more criteria; for each node identifier indicated in the node data: determining a node index value in a node array of the in-memory representation; storing, in a mapping, an association between the node index value and said each node identifier; loading, into the memory from the persistent storage, edge data that identifies a plurality of edges of the graph and is ordered based on the one or more criteria; for each edge indicated in the edge data: using the mapping to identify (1) a source node identifier of said each edge with a first node index value and (2) a destination node identifier of said each edge with a second node index value; identifying an entry in the node array based on the first node index value; identifying an entry in a neighbor array of the in-memory representation based on the second node index value; storing information about said each edge in the entry in the neighbor array; wherein the method is performed by one or more computing devices. 14. One or more storage media sto
Database-specific techniques · CPC title
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title
Trees, e.g. B+trees · CPC title
Physics · mapped topic
Physics · mapped topic
Related publications grouped by family.
Answers are generated from the same data shown on this page.