Database search

US10762087B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10762087-B2
Application numberUS-201414211574-A
CountryUS
Kind codeB2
Filing dateMar 14, 2014
Priority dateMar 15, 2013
Publication dateSep 1, 2020
Grant dateSep 1, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a method and apparatus for database search, by grouping data entries based on join conditions between the data entries as set in search conditions; and executing the search based on the grouping of the data entries, and can efficiently and effectively resolve the issues that are common to the existing MapReduce query processing systems, thereby being particularly suitable for big dataset analytics in a large cluster system.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method for database search, comprising: grouping data entries based on join conditions between the data entries as set in search conditions; implementing the search conditions using an extended-SQL query, wherein the extended SQL query is enabled to include nested queries; creating a query graph based on the grouping of data entries and the extended-SQL query, wherein a node in the query graph represents a query variable in the extended-SQL query and an edge between nodes in the query graph represents a joint condition between query variables; creating, based on the query graph, a query header derived from the extended-SQL query; analyzing the data entries to determine whether multiple MapReduces are enabled to be used to search; executing the search based on the grouping of the data entries and the query graph; and transforming the search conditions by removing patterns. 2. The method according to claim 1 , further comprises transforming the search conditions by at least one of: rewriting rules and eliminating nests. 3. The method according to claim 1 , wherein the step of grouping is implemented further based on dependences between at least one of values of the data entries as set in the search conditions or types of the data entries as set in the search conditions. 4. The method according to claim 3 , wherein the search conditions comprise at least an SQL query statement. 5. The method according to claim 3 , wherein the at least one of the values and types of the data entries in a database comprise at least one of: record type <A1:t1, . . . , An:tn>, sequence type [t], set type {t}, and map type {(t1, t2)}; where n is a natural number, t, t1, . . . , tn are recursively defined types, A1, . . . , An are component names, and map type {(t1, t2)} binds keys of type t1 to values of type t2. 6. The method according to claim 4 , wherein the SQL query statement comprises the following forms: SELECT [DISTINCT] e s FROM p f1 in e f1 , ..., p fn in e fn [WHERE e w ] [GROUP BY p g : e g [HAVING e h ]] [ORDER BY e o ] where statements in [ ] are optional statements; and where e f1 , . . . , e fn , e s , e w , e g and e 0 are arbitrary expressions; and in the FROM clause each of p f1 . . . p fn is a pattern, each of e f1 . . . e fn is an expression that returns a sequence, a set or a map of key-value pair; the form “p in e” indicating that the pattern p matches every element in e with its pattern variables being bound to respective values in the elements; the form p g :e g in a GROUP BY clause indicating that query results are partitioned into groups so that members of each group have the same value e g , and the pattern p g is bound to the GROUP BY value that is common across each group. 7. The method according to claim 6 , wherein executing the search based on the grouping of the data entries further comprises at least one of: setting a new query variable for associating to each SELECT expression in the query statement; removing all the join conditions and filter predicates from the SQL query statement; and replacing the FROM part of each SELECT expression with a single binding that uses a tuple pattern to bind the query variables of directly nested SELECT expressions and variables of FROM part. 8. The method according to claim 1 , wherein the search is implemented in parallel based on MapReduce. 9. The method according to claim 8 , wherein the cost of every MapReduce is evaluated and a MapReduce with the minimum cost is selected for performing the search. 10. The method according to claim 8 , wherein the MapReduce comprises at least one of: reducing I/O costs of reading input data and writing output data by MapReduce jobs; combining a plurality of MapReduce jobs that are independent of each other into a single MapReduce job; and combining the pairs of product-consumer MapReduce jobs if at least one job is Map-only or both jobs share a same Map key. 11. The method according to claim 8 , wherein the MapReduce comprises an execution approach in the following form: REPEAT v=e STEP body [limit n] where v is a repetition variable, the type of e is a set {T} for a certain type T, the type of body is {(T, bool)}, and n is a natural number; and where the REPEAT statement indicates that first v is bound to the value of e, then calculate body repeatedly, and a new value is assigned to v from the previous repetition; the execution stops if either the number of repetitions becomes n or when all the Booleans returned by body are false. 12. An apparatus for database search, comprising: a module configured to group data entries based on join conditions between the data entries as set in search conditions, wherein the module implements the search conditions using an extended-SQL query, wherein the extended SQL query is enabled to include nested queries, wherein the module is enabled to analyze the data entries to determine whether multiple MapReduces are enabled to be used to search; wherein the module is enabled to create a query graph based on the grouping of data entries and the extended-SQL query, wherein a node in the query graph represents a query variable in the extended-SQL query and an edge between nodes in the query graph represents a join condition between query variables and create, based on the query graph, a query header derived from the extended-SQL query; execute the search based on the grouping of the data entries and the query graph; and a pattern removing module configured to remove patterns from the search conditions. 13. The apparatus as claimed in claim 12 , wherein the module comprises a grouping module to group data entries and an executing module to execute the search. 14. The apparatus according to claim 12 , wherein at least one of the module and the grouping module further comprises at least one of the followings to transform the search conditions: a query simplifying module configured to rewrite rules; and a nest eliminating module configured to eliminate nests. 15. The apparatus according to claim 12 , wherein the grouping module implements the grouping further based on dependences between at least one value of the data entries as set in the search conditions of types of the data entries as set in the search conditions, wherein types of the data entries in the database comprise one of: record type <A1:t1, . . . , An:tn>, sequence type [t], set type {t}, and map type {(t1, t2)}; where n is a natural number, t, t1, . . . , tn are recursively defined types, A1, . . . , An are component names, and map type {(t1, t2)} binds keys of type t1 to values of type t2. 16. The apparatus according to claim 12 , wherein the search conditions comprise at least an SQL query statement wherein the SQL query statement comprises the following forms:

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10762087B2 cover?
The present disclosure provides a method and apparatus for database search, by grouping data entries based on join conditions between the data entries as set in search conditions; and executing the search based on the grouping of the data entries, and can efficiently and effectively resolve the issues that are common to the existing MapReduce query processing systems, thereby being particularly…
Who is the assignee on this patent?
Emc Corp, Emc Ip Holding Co Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/2456. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Sep 01 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).