Who is the assignee on this patent?

Leida Marcello, Afzal Ali, Taylor Paul, and 4 more

What technology area does this patent fall under?

Primary CPC classification G06F16/2452. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue May 02 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and system for processing data queries

US9639575B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9639575-B2
Application number	US-201213435747-A
Country	US
Kind code	B2
Filing date	Mar 30, 2012
Priority date	Mar 30, 2012
Publication date	May 2, 2017
Grant date	May 2, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The invention relates to a method and system that provide a high performance and extremely scalable triple store within the Resource Description Framework (or alternative data models), with optimized query execution. An embodiment of the invention provides a data storage and analysis system to support scalable monitoring and analysis of business processes along multiple configurable perspectives and levels of granularity. This embodiment analyses data from processes that have been already executed and from ongoing processes, as a continuous flow of information. This embodiment provides defining and monitoring processes based on no initial domain knowledge about the process and such that the process will be built only from the incoming flow of information. Another embodiment of the invention provides a grid infrastructure that allows storage of data across many grid nodes and distribution of the workload, avoiding the bottleneck represented by constantly querying a database.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method of processing a query on a graph data set stored on a plurality of nodes, the method including the steps of: dividing the query into one or more atoms which define individual queries on the graph data set; calculating the execution cost of each atom in the query; determining one or more query paths which set out an order in which one or more atoms are to be executed using said calculated execution costs and interdependence between said atoms; determining a query execution plan which is a set of said query paths which can be executed in parallel; executing said atoms on each of said nodes in accordance with said query execution plan; and combining the results of each query path to produce a result set that is the answer to said query; wherein the calculation of th execution cost is carried out by determining for each atom, a weight which for an atom i(j,k,l)is W(j j ,k i ,l i )=min{wgt(j s ),wgt(k p ),wgt(l o )} where wgt(x pos ) is the number of matching triples with value x in the subject, predicate or object position pos if x is a constant, or the total size of the graph data set if x is a variable, and s, p and o are the subject, predicate and object positions respectively. 2. A method according to claim 1 , further including the step of storing data into the graph data set. 3. A method according to claim 2 , wherein the step of storing data includes the step of converting the data in each triple from character strings to numeric representations. 4. A method according to claim 3 wherein the numeric representations are sequence numbers. 5. A method according to claim 4 wherein the step of storing data includes the sub-steps of: parsing data to be stored to determine a triple containing character strings; comparing each of said character strings against a plurality of stored character strings and their associated sequence numbers; if a character string matches a stored character string, using the sequence number associated with said stored character string to replace the character string in the triple; if a character string does not match any stored character string, obtaining a new sequence number, replacing the character string in the triple with said new sequence number and storing said character string associated with said new sequence number; and storing said triple. 6. A method according to claim 1 wherein, where a query path contains a plurality of atoms, when said step of executing is performed for an atom which is not a first atom in a query path, said atom is only executed on the results of previously executed atoms in the query path. 7. A method according to claim 6 wherein, where the size of results from an atom exceeds a predetermined threshold, a subset of the results from said atom are passed to the subsequent atom or atoms in the query path before the execution of said atom is completed. 8. A method according to claim 1 wherein the calculation of the execution cost for each atom is carried out by assessing the number of instances in said graph data set of one or more constants contained within said atom. 9. A method according to claim 1 wherein a query path may contain atoms which can be executed in parallel to each other as well as atoms which can be executed sequentially. 10. A method according to claim 1 wherein the step of executing includes executing the first atom in a query path on each node only in respect of data stored on said node. 11. A method according to claim 1 wherein the query includes a filter which is a limit on the value of one or more variables in the graph data set, and the step of dividing further includes assigning said filter to each of the atoms which relate to said variables. 12. A method according to claim 11 wherein if said filter is disjunctive, the method includes the further steps of: determining whether it is possible to assign said filter as a whole to an atom which features all of the variables in the filter and if so, assigning it to all such atoms; if it is not possible to so assign said filter, dividing said filter into a plurality of conjunctive filters and assigning a query path to each of said conjunctive filters. 13. A method according to claim 12 further including the step of, when said filter cannot be assigned to any atom, applying said filter to the results of said step of combining. 14. A method of processing a query on a graph data set stored on a plurality of nodes, the method including the steps of: dividing the query into one or more atoms which define individual queries on the graph data set; calculating the execution cost of each atom in the query; determining one or more query paths which set out an order in which one or more atoms are to be executed using said calculated execution costs and interdependence between said atoms; determining a query execution plan which is a set of said query paths which can be executed in parallel; executing said atoms on each of said nodes in accordance with said query execution plan; and combining the results of each query path t . duce a result set that is the answer to said query; and wherein the graph data set stores the data as triples comprising a subject, a predicate and an object, wherein the subject and object are variables and the predicate describes the relationship between the subject and the object; and wherein the calculation of the execution cost is carried out by determining, for each atom, a weight, which for an atom i(j,k,l) is W(j j ,k i ,l i )=min{wgt(j s ),wgt(k p ),wgt(l o )} where wgt(x pos ) is the number of matching triples with value x in the subject, predicate or object position pos if x is a constant, or the total size of the data set if x is a variable, and s, p and o are the subject, predicate and object positions respectively. 15. A method according to claim 1 wherein a query path is determined by: setting the atom with the lowest execution cost as the first atom in the path; setting as a subsequent atom in the path an atom which is dependent on said firs in that it shares one or more variables with said first atom; and repeating until all atoms which are dependent on the first atom or any subsequent atoms have been set in the path. 16. A method according to claim 15 wherein, where there are a plurality of atoms which are dependent, the subsequent atoms are set in order of execution cost of those atoms. 17. A system for storing business process data, the system having a plurality of nodes which are connected to each other via a network, each node having a data storage and a data processing device, wherein: the system stores business process data as a graph data set in a distributed fashion on the data storages of the plurality of nodes; and the data processing device of each node is arranged to processing a query on the graph data set by: dividing the query into one or more atoms which define individual queries on the graph data set; calculating the execution cost of each atom in the query; determining one or more query paths which set out an order in which one or more atoms are to be executed using said calculated execution costs and interdependence between said atoms; determining a query execution plan which is a set of said query paths which can be executed in parallel; executing said atoms on each of said nodes in accordance with said query execution plan; combining the results of each query path to produce a result set that is the answer to said query; wherein the calculation of the execution cost is carried out by determini

Assignees

Inventors

Classifications

G06F16/24534
Query rewriting; Transformation · CPC title
G06F16/2452Primary
Query translation · CPC title
G06F17/30427Primary
Physics · mapped topic
G06F17/30448
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 47471856

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9639575B2 cover?: The invention relates to a method and system that provide a high performance and extremely scalable triple store within the Resource Description Framework (or alternative data models), with optimized query execution. An embodiment of the invention provides a data storage and analysis system to support scalable monitoring and analysis of business processes along multiple configurable perspective…
Who is the assignee on this patent?: Leida Marcello, Afzal Ali, Taylor Paul, and 4 more
What technology area does this patent fall under?: Primary CPC classification G06F16/2452. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue May 02 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).