System and method for optimizing large database management systems using bloom filter
US-2018349364-A1 · Dec 6, 2018 · US
US11977545B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11977545-B2 |
| Application number | US-201916267608-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 5, 2019 |
| Priority date | Oct 15, 2018 |
| Publication date | May 7, 2024 |
| Grant date | May 7, 2024 |
A method includes receiving, by a first computing entity of a database system, a query request that is formatted in accordance with a generic query format. The method further includes generating, by the first computing entity, an initial query plan based on the query request and a query instruction set. The method further includes determining, by the first computing entity, storage parameters. The method further includes determining, by the first computing entity, processing resources for processing the query request based on the storage parameters. The method further includes generating, by the first computing entity, an optimized query plan from the initial query plan based on the storage parameters, the processing resources, and optimization tools. The method further includes sending, by the first computing entity, the optimized query plan to a second computing entity for distribution and execution of the optimized query plan.
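The abstract lists six steps. The minimal Python sketch below shows one way data could flow between them. Every name in it (`StorageParameters`, `QueryPlan`, `handle_query`, and so on) is hypothetical. The storage parameters are passed in rather than looked up, and the "optimization" step is reduced to laying the plan out over three parallelism levels. It is an illustration under those assumptions, not the patented implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class StorageParameters:
    """Hypothetical subset of the storage parameters listed in claim 6."""
    storage_clusters: int
    devices_per_cluster: int
    nodes_per_device: int
    cores_per_node: int


@dataclass
class QueryPlan:
    operations: List[str]
    # (level name, degree of parallelism); empty for the single-level initial plan.
    levels: List[Tuple[str, int]] = field(default_factory=list)


def generate_initial_plan(query: str, instruction_set: Set[str]) -> QueryPlan:
    # Keep, in order, the query keywords that the instruction set supports.
    ops = [tok for tok in query.upper().split() if tok in instruction_set]
    return QueryPlan(operations=ops)


def determine_processing_resources(p: StorageParameters) -> int:
    # Total cores across the clusters, devices, and nodes storing the data set.
    return p.storage_clusters * p.devices_per_cluster * p.nodes_per_device * p.cores_per_node


def build_multilevel_plan(plan: QueryPlan, p: StorageParameters, cores: int) -> QueryPlan:
    # Illustrative policy (an assumption): one level per cluster, one per
    # device, and the cores available per device as the innermost level.
    per_device = cores // (p.storage_clusters * p.devices_per_cluster)
    levels = [("cluster", p.storage_clusters),
              ("device", p.devices_per_cluster),
              ("core", per_device)]
    return QueryPlan(operations=list(plan.operations), levels=levels)


def handle_query(query: str, instruction_set: Set[str],
                 params: StorageParameters) -> Dict[str, object]:
    initial = generate_initial_plan(query, instruction_set)
    cores = determine_processing_resources(params)
    multilevel = build_multilevel_plan(initial, params, cores)
    # Stand-in for sending the plan to the second computing entity.
    return {"plan": multilevel, "cores": cores}
```

For example, with 2 storage clusters of 3 devices, 2 nodes per device, and 4 cores per node, `handle_query("SELECT x FROM t WHERE x > 1", {"SELECT", "FROM", "WHERE"}, params)` reports 48 cores, keeps the operations SELECT, FROM, and WHERE, and lays them out over levels of width 2, 3, and 8.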
What is claimed is:

1. A method comprises: receiving, by a first computing entity of a database system, a query request that is formatted in accordance with a generic query format, wherein the query request identifies at least one database table that includes a data set; generating, by the first computing entity, an initial query plan based on the query request, wherein the initial query plan is generated to include a subset of a plurality of operations based on selecting the subset of a plurality of operations indicated in a query instruction set of the database system; determining, by the first computing entity, storage parameters regarding how the at least one database table is stored within the database system; expanding a level of a computation of the initial query plan from a single level to at least three levels to produce a multiple level query plan for performance via multiple levels of parallelization; determining, by the first computing entity, a plurality of processing resources of the database system for processing the query request based on the storage parameters indicating the plurality of processing resources are associated with storing the data set; generating, by the first computing entity, an optimized query plan from the multiple level query plan based on the storage parameters, the processing resources, and optimization tools, wherein the optimized query plan indicates: a first set of operations be performed via one level of the at least three levels via one corresponding set of parallelized resources in one level of parallelism of the multiple levels of parallelism; and a second set of operations be performed via another level of the at least three levels via another corresponding set of parallelized resources in another level of parallelism of the multiple levels of parallelism; generating, by the first computing entity, a distribution plan to distribute portions of the optimized query plan among the plurality of processing resources; and sending, by the first computing entity, the optimized query plan to a second computing entity of the database system that includes at least some of the plurality of processing resources for distribution and execution of the optimized query plan in the multiple levels of parallelism by the plurality of processing resources in accordance with the at least three levels based on the distribution plan, wherein the second computing entity includes a plurality of computing devices of a plurality of storage clusters, wherein each storage cluster of the plurality of storage clusters includes a corresponding set of multiple computing devices of the plurality of computing devices; wherein sending the optimized query plan to the second computing entity includes selecting, by the first computing entity, a selected plurality of computing devices for parallelized execution of the first set of operations via the one level of the at least three levels based on selecting, for each storage cluster of the plurality of storage clusters, one computing device of corresponding set of multiple computing devices for inclusion in the selected plurality of computing devices, wherein the selected plurality of computing devices is a proper subset of the plurality of computing devices; wherein distribution and execution of the optimized query plan includes communicating, by each computing device of the selected plurality of computing devices to other computing devices of the corresponding set of multiple computing devices of a corresponding storage cluster that includes the each computing device, the second set of operations of the another level of the at least three levels for parallelized execution of the second set of operations via the other computing devices of the corresponding set of multiple computing devices.

2. The method of claim 1, wherein the generating the initial query plan further comprises: converting, by the first computing entity, the query request into a syntax tree that represents a syntactic structure of instructions of the query instruction set of the database system; validating, by the first computing entity, the syntax tree by one or more of: verifying statements of the query request are valid statements of the generic query format; verifying that the data set is a valid data set; and verifying no hang conditions occurs; when the syntax tree is validated, annotating, by the first computing entity, the syntax tree with particular information of the data set to produce an annotated syntax tree; and generating, by the first computing entity, the initial query plan based on the annotated syntax tree.

3. The method of claim 2 further comprises: when the syntax tree is not validated, sending, by the first computing entity, a query error message to a requesting device associated with the query request.

4. The method of claim 2 further comprises: when the syntax tree is not validated, identifying, by the first computing entity, a portion of the query request causing the syntax tree to not be valid; changing, by the first computing entity, coding of the portion of the query request while substantially preserving meaning of the portion of the query request to produce a changed query request; and repeating, by the first computing entity, the converting and validated steps for the changed query request.

5. The method of claim 1, wherein the determining the storage parameters comprises one or more of: retrieving, by the first computing entity, the storage parameters from a lookup table based on identity of the data set; sending, by the first computing entity, a storage parameter request to the second computing entity regarding the data set, wherein the second computing entity is within a parallelized data store, retrieve, and/or process sub-system of the database system; and sending, by the first computing entity, a storage parameter request to a third computing entity regarding the data set, wherein the third computing entity is within a parallelized data input sub-system of the database system.

6. The method of claim 1, wherein the storage parameters comprise two or more of: number of rows per segment of the data set; number of columns of the data set; number of partitions the data set was divided into; a number of segments each partition was divided into; a data redundancy encoding scheme; a number of storage clusters storing the data set; a number of computing devices within a storage cluster; a number of nodes within a computing device; and a number of processing core resources within a node.

7. The method of claim 1, wherein the determining processing resources comprises one or more of: determining, as the processing resources, a number of processing core resources associated with storing the data set; determining, as the processing resources, a number of nodes associated with storing the data set; and determining, as the processing resources, a number of computing devices associated with storing the data set.

8. The method of claim 1, wherein the generating the optimized query plan comprises: determining an initial cost value for the multiple level query plan; comparing the initial cost value with a cost threshold; when the initial cost value compares unfavorably to the cost threshold, changing the multiple level query plan in accordance with one or more of the optimization tools to produce an updated multiple level query plan as the multiple level query plan; and when a cost value of the multiple level query plan compares favorably to the cost threshold, outputting the multiple level query plan as the optimized query plan.

9. The method of claim 1, wherein the optimization tools comprise two or more of: one or more pre-opt
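Claim 8 describes an iterative cost check: compare the plan's cost with a threshold, apply optimization tools while it compares unfavorably, and output the plan once it compares favorably. The sketch below assumes "compares favorably" means a cost at or below the threshold, and it adds a `max_rounds` cap, which the claim does not mention, so the loop always terminates. The cost model and tools (`toy_cost`, `drop_repeats`, `push_filter_before_join`) are hypothetical toys. Claim 9, which would enumerate the actual optimization tools, is truncated above.

```python
from typing import Callable, List, Sequence, Tuple

Plan = List[str]  # hypothetical: a plan as an ordered list of operation names


def refine_until_under_threshold(
    plan: Plan,
    cost_fn: Callable[[Plan], int],
    tools: Sequence[Callable[[Plan], Plan]],
    cost_threshold: int,
    max_rounds: int = 10,
) -> Tuple[Plan, int, bool]:
    """Rewrite `plan` with `tools` until its cost compares favorably.

    "Compares favorably" is assumed to mean cost <= cost_threshold.
    `max_rounds` is an added safeguard so the loop always terminates.
    Returns the final plan, its cost, and whether the threshold was met.
    """
    cost = cost_fn(plan)
    rounds = 0
    while cost > cost_threshold and rounds < max_rounds:
        for tool in tools:
            plan = tool(plan)
        cost = cost_fn(plan)
        rounds += 1
    return plan, cost, cost <= cost_threshold


# --- Toy cost model and optimization tools (all hypothetical) ---

def toy_cost(plan: Plan) -> int:
    # Each operation costs 1; a JOIN costs 10, or 5 if a FILTER ran before it.
    total, filtered = 0, False
    for op in plan:
        if op == "FILTER":
            filtered = True
        total += (5 if filtered else 10) if op == "JOIN" else 1
    return total


def drop_repeats(plan: Plan) -> Plan:
    # Remove consecutive duplicate operations.
    out: Plan = []
    for op in plan:
        if not out or out[-1] != op:
            out.append(op)
    return out


def push_filter_before_join(plan: Plan) -> Plan:
    # Move FILTERs that follow the first JOIN to just before it
    # (assumes the filter does not depend on the join's output).
    if "JOIN" not in plan:
        return plan
    j = plan.index("JOIN")
    tail = plan[j:]
    return plan[:j] + [op for op in tail if op == "FILTER"] + [op for op in tail if op != "FILTER"]
```

With the starting plan SCAN, SCAN, JOIN, FILTER, PROJECT (toy cost 14) and a threshold of 10, one round yields SCAN, FILTER, JOIN, PROJECT at cost 8, and the function reports that the threshold was met.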
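The final clauses of claim 1 describe a two-step dispatch: the first computing entity selects one computing device from each storage cluster to run the first set of operations, and each selected device passes the second set of operations to the other devices in its own cluster. A minimal sketch of that message pattern follows. The function name, the `(sender, receiver, operations)` tuple format, and the rule of picking the lexicographically first device are assumptions for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Message = Tuple[str, str, Tuple[str, ...]]  # (sender, receiver, operations)


def plan_cluster_dispatch(
    clusters: Dict[str, List[str]],
    first_ops: Sequence[str],
    second_ops: Sequence[str],
    planner: str = "planner",
) -> List[Message]:
    """Select one device per storage cluster and build the dispatch messages.

    The planner sends `first_ops` to the selected device of each cluster;
    each selected device forwards `second_ops` to the other devices in its
    own cluster. Picking the lexicographically first device is an arbitrary
    stand-in for whatever selection policy a real system would use.
    """
    messages: List[Message] = []
    for cluster_id in sorted(clusters):
        devices = sorted(clusters[cluster_id])
        if len(devices) < 2:
            # Claim 1 describes each cluster as a set of multiple devices,
            # which is what keeps the selection a proper subset.
            raise ValueError(f"cluster {cluster_id!r} needs at least two devices")
        selected = devices[0]
        messages.append((planner, selected, tuple(first_ops)))
        for device in devices[1:]:
            messages.append((selected, device, tuple(second_ops)))
    return messages
```

For clusters c1 = {d1, d2, d3} and c2 = {d4, d5}, d1 and d4 are selected (a proper subset of the five devices), and they forward the second set of operations to d2, d3, and d5.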
Plan optimisation · CPC title
Improving or facilitating administration, e.g. storage management · CPC title
Migration mechanisms · CPC title
Hybrid storage device · CPC title
Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers {sorting methods in general}(G06F7/36 takes precedence) · CPC title