Federation optimization using ordered queues

US2016147888A1 · US · A1

Patent metadata
Publication number: US-2016147888-A1
Application number: US-201414550084-A
Country: US
Kind code: A1
Filing date: Nov 21, 2014
Priority date: Nov 21, 2014
Publication date: May 26, 2016
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and computer program products for optimization of query processing in a data federation system using priority queuing techniques are provided. Priority queuing techniques may include generating a query vector corresponding to a query, comparing the query vector to historical query vectors to determine similarity, determining an expected processing time for the query based on the determined similarity, and inserting the query into a priority ordered queue at a particular position based on the expected processing time.
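The four steps named in the abstract can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not the patent's implementation: the vocabulary-based vector encoding, the 0.9 similarity threshold, the averaging of matching times, the "lower priority value is served first" ordering, and every function and class name here are hypothetical choices made for illustration.

```python
import bisect
import math
import re
from dataclasses import dataclass, field


def query_vector(sql, vocabulary):
    """Assumed encoding: one component per known table or column name,
    1.0 if the query text mentions that name and 0.0 otherwise."""
    tokens = set(re.findall(r"[a-z_][a-z_0-9]*", sql.lower()))
    return [1.0 if name in tokens else 0.0 for name in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def estimate_processing_time(vector, history, threshold=0.9):
    """Average the recorded times of historical vectors similar to `vector`.

    `history` is a list of (vector, seconds) pairs. Returns None when no
    historical vector reaches the assumed similarity threshold.
    """
    times = [seconds for past, seconds in history
             if cosine_similarity(vector, past) >= threshold]
    if not times:
        return None
    return sum(times) / len(times)


@dataclass
class PriorityOrderedQueue:
    """List kept sorted by priority; lower values are served first (assumption)."""
    _entries: list = field(default_factory=list)
    _counter: int = 0

    def insert(self, priority, query):
        # The counter breaks ties, so equal priorities stay first-in, first-out.
        entry = (priority, self._counter, query)
        self._counter += 1
        position = bisect.bisect_right(self._entries, entry)
        self._entries.insert(position, entry)
        return position

    def pop_next(self):
        return self._entries.pop(0)[2]


# Usage with made-up table names, column names, and timings.
vocabulary = ["orders", "customers", "id", "total", "name"]
history = [
    ([1.0, 0.0, 1.0, 1.0, 0.0], 4.0),
    ([1.0, 0.0, 1.0, 1.0, 0.0], 6.0),
    ([0.0, 1.0, 0.0, 0.0, 1.0], 30.0),
]
queue = PriorityOrderedQueue()
queue.insert(30.0, "SELECT name FROM customers")
sql = "SELECT id, total FROM orders"
estimate = estimate_processing_time(query_vector(sql, vocabulary), history)
position = queue.insert(estimate, sql)
```

Keeping the queue as a sorted list makes the "particular position" explicit, since `insert` returns the index at which the query lands. A heap would give cheaper removal but would hide that position.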

First claim

Opening claim text (preview).

What is claimed is:

1. A federation engine server comprising: one or more processors; a federated query queue comprising one or more federated queries; a first source query queue corresponding to the first data source, the first source query queue comprising one or more source queries; a data store comprising one or more historical query vectors; a data quality coordinator executable by the one or more processors to: generate a federated query vector based on a federated query received from a first client of one or more clients; perform a first similarity measure between the federated query vector and at least one of the one or more historical query vectors to determine an estimated processing time of the federated query; set a priority of the federated query based on the estimated processing time of the federated query; based on the priority of the federated query, determine a position of the federated query in the federated query queue relative to at least one of the one or more federated queries; insert the federated query into the federated query queue at the position; generate a plurality of source queries corresponding to the federated query; generate a first source query vector based on a first source query of the plurality of source queries; perform a second similarity measure between the first source query vector and at least one of the one or more historical query vectors to determine an estimated processing time of the first source query; set a priority of the first source query based on the estimated processing time of the first source query; based on the priority of the first source query, determine a position of the first source query in the first source query queue relative to at least one of the one or more source queries; insert the first source query into the first source query queue at the position; and retrieve a data result responsive to the first source query from a first data source in an order that is based upon the position of the first source query in the first source query queue.

2. The federated system of claim 1 further comprising: a second source query queue corresponding to a second data source; the data quality coordinator further to: generate a second source query vector based on a second source query of the plurality of source queries; perform a third similarity measure between the second source query vector and at least one of the one or more historical source query vectors to determine an estimated processing time of the second source query; set a priority of the second source query based on the estimated processing time of the second source query; based on the priority of the second source query, determine a position of the second source query in the second source query queue relative to a position of at least one of the one or more source queries; and insert the second source query into the second source query queue at the position;

3. The federated system of claim 2, the data quality coordinator further to: determine that the estimated processing time of the first source query is greater than the estimated processing time of the second source query; and set the priority of the second source query to the estimated processing time of the first source query minus the estimated processing time of the second source query.

4. The federated system of claim 1, wherein at least one of (i) the first similarity measure and (ii) the second similarity measure is a cosine similarity measure.

5. The federated system of claim 1, wherein at least one of the federated query and the first source query is an SQL query.

6. The federated system of claim 1, wherein (i) the federated query vector comprises components representing tables and columns identified in the federated query and (ii) the first source query vector comprises components representing tables and columns identified the first source query.

7. The federated system of claim 1, the data quality coordinator further to: adjust the priority of the federated query in the federated query queue by: subtracting a pre-configured number from the priority of the federated query based on a pre-configured time duration elapsing.

8. A computer-implemented method for query processing, comprising: receiving a federated query from a client; generating a first source query and a second source query from the federated query; generating a first source query vector corresponding to the first source query, wherein the first source query vector comprises components representing identified tables and columns in the first source query; generating a second source query vector corresponding to the second source query, wherein the second source query vector comprises components representing identified tables and columns in the second source query; measuring similarity between the first source query vector and at least one of one or more historical query vectors to determine an estimated processing time of the first source query; measuring similarity between the second source query vector and at least one of the one or more historical query vectors to determine an estimated processing time of the second source query; setting a priority of the first source query based on the estimated processing time of the first source query; setting a priority of the second source query based on the estimated processing time of the second source query; inserting the first source query at a position in a first source query queue based on the priority of the first source query; inserting the second source query at a position in a second source query queue based on the priority of the second source query; removing the first source query from the first source query queue based on the position of the first source query in the first source query queue; and retrieving data results corresponding to the first source query.

9. The method of claim 8, wherein setting the priority of the first source query and the second source query comprises: determining that the estimated processing time of the first source query is greater than the estimated processing time of the second source query; and setting the priority of the second source query to the estimated processing time of the first source query minus the estimated processing time of the second source query.

10. The method of claim 8, further comprising: determining a priority adjustment, the priority adjustment equal to an estimated processing time of the federated query minus the estimated processing time of the first source query; adjusting the priority of the first source query by subtracting the priority adjustment from the priority of the first source query; and adjusting the priority of the second source query by subtracting the priority adjustment from the priority of the second source query.

11. The method of claim 8, wherein measuring similarity comprises measuring cosine similarity.

12. The method of claim 8 further comprising, if a plurality of the one or more historical query vectors are determined to be similar to the first source query vector, averaging estimated processing times corresponding to the plurality of the one or more historical queries to determine the estimated processing time of the first source query.

13. The method of claim 8, wherein a first component of the first source query vector is a count of the number of columns identified in the first source query.

14. The method of claim 8, further comprising: determining an actual processing time for at least one of: the federated query; the first source query; and the
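The priority arithmetic recited in claims 3 and 9, claim 7, and claim 10 can be worked through in a few lines of Python. This is an illustrative reading of the claim language, not the patented implementation: the function names are hypothetical, the first source query's priority is assumed to stay unchanged in the claim 3 and 9 case, and applying the claim 7 decrement once per fully elapsed interval is an assumption.

```python
def align_source_priorities(first_estimate, second_estimate,
                            first_priority, second_priority):
    """Claims 3 and 9: when the first source query is expected to take longer,
    the second query's priority becomes the difference of the two estimates.
    The first query's priority is left as it was (assumption)."""
    if first_estimate > second_estimate:
        second_priority = first_estimate - second_estimate
    return first_priority, second_priority


def age_priority(priority, waited_seconds, interval_seconds, decrement):
    """Claim 7: subtract a pre-configured number from the priority as a
    pre-configured duration elapses. Applying it once per fully elapsed
    interval is an assumption made for this sketch."""
    elapsed_intervals = int(waited_seconds // interval_seconds)
    return priority - decrement * elapsed_intervals


def apply_federated_adjustment(federated_estimate, first_estimate,
                               first_priority, second_priority):
    """Claim 10: the adjustment equals the federated estimate minus the first
    source estimate, and it is subtracted from both source priorities."""
    adjustment = federated_estimate - first_estimate
    return first_priority - adjustment, second_priority - adjustment


# Worked example with made-up estimates, in seconds.
first, second = align_source_priorities(10.0, 4.0, 10.0, 4.0)    # (10.0, 6.0)
aged = age_priority(second, waited_seconds=25, interval_seconds=10,
                    decrement=1.0)                               # 4.0
adjusted = apply_federated_adjustment(12.0, 10.0, first, second)  # (8.0, 4.0)
```

In the worked example the second source query is expected to finish 6 seconds sooner than the first, so under the claim 3 and 9 rule its priority value becomes 6.0 rather than its own 4.0-second estimate.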

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016147888A1 cover?
Methods, systems, and computer program products for optimization of query processing in a data federation system using priority queuing techniques are provided. Priority queuing techniques may include generating a query vector corresponding to a query, comparing the query vector to historical query vectors to determine similarity, determining an expected processing time for the query based on t…
Who is the assignee on this patent?
Red Hat Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/30867. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 26, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).