Optimizing limit queries over analytical functions

US2021382920A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2021382920-A1
Application numberUS-202017034348-A
CountryUS
Kind codeA1
Filing dateSep 28, 2020
Priority dateJun 5, 2020
Publication dateDec 9, 2021
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A relational database management system (RDBMS) optimizes limit queries over analytical functions, wherein the limit queries include an output clause comprising a LIMIT, TOP and SAMPLE clause with an expression specifying a limit that is a number K or a percentage α %. The optimizations of the limit queries include: (1) static compile-time optimizations, and (2) dynamic run-time optimizations, based on semantic properties of “granularity” and “input-to-output cardinality” for the analytical functions.

First claim

Opening claim text (preview).

What is claimed is: 1 . A computer-implemented method, comprising: executing a relational database management system (RDBMS) in a computer system, wherein the RDBMS manages a relational database comprised of one or more tables stored on one or more data storage devices connected to the computer system; and optimizing a limit query over an analytical function in the RDBMS, wherein the analytical function is provided one or more input records stored in one or more of the tables of the relational database, the limit query specifies how many output records from the analytical function to return as a result set, and the optimizing determines how many of the input records are provided to the analytical function to obtain the output records specified by the limit query. 2 . The method of claim 1 , wherein the limit query includes an output clause comprising a LIMIT, TOP or SAMPLE clause with an expression specifying a limit that is a number K or a percentage α %. 3 . The method of claim 2 , wherein the optimizing comprises minimizing the input records provided to the analytical function to obtain the output records specified by the limit query. 4 . The method of claim 2 , wherein the optimizing comprises maintaining a semantic property of granularity and a semantic property of input-to-output cardinality for the analytical function, wherein: the semantic property of granularity defines an inter-dependency among the input records when processed by the analytical function, such that: when the granularity is row, that indicates that the analytical function operates on each of the input records independent of others of the input records, such that, for a given one of the input records, adding or removing the others of the input records does not affect the output records obtained from the analytical function for the given one of the input records; when the granularity is partition, that indicates that the input records are partitioned based on a user-defined criteria and that the analytical function operates on each partition independent of other partitions, such that, for a given partition, adding or removing the other partitions does not affect the output records obtained from the analytical function for the given partition; and when the granularity is other, that indicates that the analytical function does not operate at the granularity of row or partition, such that the limit query cannot be optimized; and the semantic property of input-to-output cardinality is a range of [x,y], where x is a minimum value and y is a maximum value, for how many output records are obtained per input record when the granularity is row, and how many output records are obtained per partition when the granularity is partition. 5 . The method of claim 4 , wherein the optimizing comprises a static compile-time optimization of the limit query. 6 . The method of claim 5 , wherein the static compile-time optimization comprises optimizing the LIMIT clause with the expression K, when the granularity is row and the input-to-output cardinality is the range of [x,y] where x≥1, and a LIMIT CEILING operator of K/x is applied to the input records, only K/x of the input records are provided to the analytical function, and a LIMIT K operator is applied to the output records from the analytical function to obtain K of the output records as the result set. 7 . The method of claim 5 , wherein the static compile-time optimization comprises optimizing the LIMIT clause with the expression K, when the granularity is partition and the input-to-output cardinality is the range of [x,y] where x≥1, and a grouping operator groups the input records into one or more partitions by a column expression, a LIMIT CEILING operator of K/x is applied to the partitions, only K/x of the partitions are provided to the analytical function, and a LIMIT K operator is applied to the output records from the analytical function to obtain K of the output records as the result set, or a GROUP-LIMIT operator groups the input records into one or more partitions by a column expression, only K/x of the partitions are provided to the analytical function, and a LIMIT operator is applied to the output records from the analytical function to obtain K of the output records as the result set. 8 . The method of claim 5 , the static compile-time optimization comprises optimizing the TOP clause with the expression K, when the TOP clause is paired with a preceding ORDER BY clause for selecting a top K of the input records with respect to an order expression, when the granularity is row and the input-to-output cardinality is the range of [x,y] where x≥1, and an ORDER BY operator orders the input records by the order expression, a LIMIT CEILING operator provides only K/x of the input records to the analytical function, an ORDER BY operator orders the output records from the analytical function by the order expression, and a LIMIT operator is applied to obtain only K of the output records as the result set, or a LIMIT CEILING operator provides K/x of the input records to the analytical function, and a TOP K operator is applied to the output records from the analytical function to obtain only TOP K of the output records as the result set. 9 . The method of claim 5 , the static compile-time optimization comprises optimizing the TOP clause with an expression K, when the TOP clause is paired with a preceding ORDER BY clause with an order expression for selecting a TOP K of the input records with respect to the order expression, when the granularity is partition and the input-to-output cardinality is the range of [x,y] where x≥1, and a grouping operator groups the input records into one or more partitions by a column expression, a TOP CEILING operator provides only K/x of the partitions to the analytical function, and a TOP K operator is applied to the output records from the analytical function to obtain only a TOP K of the output records as the result set, or a TOP-GROUPS operator groups the input records into one or more partitions according to a column expression, and only K/x of the partitions are provided to the analytical function, and a TOP K operator is applied to the output records from the analytical function to obtain only K of the output records as the result set. 10 . The method of claim 5 , the static compile-time optimization comprises optimizing the SAMPLE clause with an expression K, when the granularity is row and the input-to-output cardinality is the range of [x,y] where x=y=1, and a SAMPLE operator provides only K of the input records to the analytical function, and only K of the output records from the analytical function are obtained as the result set. 11 . The method of claim 5 , the static compile-time optimization comprises optimizing the SAMPLE clause with an expression K, when the granularity is partition and the input-to-output cardinality is the range of [x,y] where x=y=1, and a grouping operator groups the input records into one or more partitions by a column expression, a SAMPLE operator groups K of the partitions, the K of the partitions are provided to the analytical function, and the analytical function generates the output records as the result set, or a GROUP-SAMPLE operator groups the input records into K partitions by a column expression, the K partitions are provided to the analytical function, and the analytical function generates the output records as the result set. 12 . The method of claim 4 , the optimizing comprises a dynamic run-time optimization of the limit query, wherein a loop or iterator is used with the analytical function to control the input records provided to the analyt

Assignees

Inventors

Classifications

  • Selectivity estimation or determination · CPC title

  • Tablespace storage structures; Management thereof · CPC title

  • Unary operations; Data partitioning operations · CPC title

  • G06F16/288Primary

    Entity relationship models · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021382920A1 cover?
A relational database management system (RDBMS) optimizes limit queries over analytical functions, wherein the limit queries include an output clause comprising a LIMIT, TOP and SAMPLE clause with an expression specifying a limit that is a number K or a percentage α %. The optimizations of the limit queries include: (1) static compile-time optimizations, and (2) dynamic run-time optimizations, …
Who is the assignee on this patent?
Teradata Us Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2282. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Dec 09 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).