Evaluating SQL expressions on dictionary encoded vectors

US11294816B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11294816-B2
Application numberUS-201715397714-A
CountryUS
Kind codeB2
Filing dateJan 3, 2017
Priority dateOct 15, 2015
Publication dateApr 5, 2022
Grant dateApr 5, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are described herein for reducing the number of redundant evaluations that occur when an expression is evaluated against an encoded column vector by caching results of expression evaluations. When executing a query that includes an expression that references columns for which dictionary-encoded column vectors exist, the database server performs a cost-based analysis to determine which expressions (or sub-expressions) would benefit from caching the expression's evaluation result. For each such expression, the database server performs the necessary computations and caches the results for each of the possible distinct input values. When evaluating an expression for a row with a particular set of input codes, a look-up is performed based on the input code combination to retrieve the pre-computed results of that evaluation from the cache.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: maintaining a table on persistent storage; wherein the table includes a first column; wherein the table comprises a particular row that includes a first value for the first column; based on a dictionary, creating, in volatile memory, a dictionary-encoded column vector that contains codes that correspond to values from a range of rows of the first column; wherein the dictionary-encoded column vector stores a first code in a particular entry that corresponds to the particular row; receiving a query that includes a first expression that references the first column; creating in volatile memory, for the first expression, a data structure; evaluating the first expression for the first code by: reading the first code from the particular entry; using the dictionary to determine that the first code corresponds to the first value; and computing a first computed result of the first expression by evaluating the first expression based, at least in part, on the first value; using the first code to index into the data structure to locate a first index entry, in the data structure, that is associated with the first code; storing the first computed result of the first expression in the first index entry; during execution of the query, evaluating the first expression for a second row of the table that is associated with a second entry, in the dictionary-encoded column vector, that stores a second instance of the first code by performing the steps of: reading the second instance of the first code from the second entry, of the dictionary-encoded column vector, that corresponds to the second row of the table; using the first code obtained from the second entry of the dictionary-encoded column vector to index into the data structure to locate the first index entry that stores the first computed result; and reading the first computed result from the first index entry; during execution of the query and after evaluating the first expression for the second row of the table, evaluating the first expression for a third row of the table that is associated with a third entry, in the dictionary-encoded column vector, that stores a second code, by performing the steps of: determining that the second code represents an input for which pre-computed results have not yet been generated for the first expression; in response to determining that the second code represents an input for which pre-computed results have not yet been generated for the first expression: evaluating the first expression for the second code by: using the dictionary to determine that the second code corresponds to a second value, and computing a second computed result of the first expression by evaluating the first expression based, at least in part, on the second value; and storing the second computed result of the first expression in a second index entry associated with the second code; after execution of the query, returning results of the query; wherein the method is performed by one or more computing devices. 2. The method of claim 1 wherein: the first expression references N columns; and the data structure is an N-dimensional array. 3. The method of claim 1 further comprising: performing a test on each expression, of a plurality of expressions contained in the query, to determine which expressions of the plurality of expressions qualify to be cached-result expressions; and creating the data structure in response to determining that the first expression qualifies to be a cached-result expression. 4. The method of claim 3 wherein, for the first expression, the test is based, at least in part, on cardinality of the first column for the range of rows. 5. The method of claim 3 , wherein, for the first expression, the test is based, at least in part, on whether the first expression is a sub-expression that occurs more than once in the query. 6. The method of claim 3 wherein: the first column is one of a plurality of columns referenced in the first expression; and each column, of the plurality of columns, has a corresponding dictionary with which values of said each column have been encoded; and for the first expression, the test is based, at least in part, on a number of all possible combinations of codes from the dictionaries that correspond to the plurality of columns. 7. The method of claim 3 wherein: the first column is one of a plurality of columns referenced in the first expression; and for the first expression, the test is based, at least in part, on a number of distinct input combinations, of the plurality of columns referenced in the first expression, actually present in the table for the range of rows. 8. The method of claim 3 wherein: the first column is one of a plurality of columns referenced in the first expression; the query has a WHERE condition; and for the first expression, the test is based, at least in part, on a number of distinct input combinations, for the first expression, present in the actual rows, within the range of rows, that satisfy the WHERE condition. 9. The method of claim 1 wherein the first expression references a second column of the table that has a corresponding column vector that is not dictionary encoded. 10. The method of claim 1 wherein the first expression references a second column of the table that is not mirrored in volatile memory. 11. A method comprising: maintaining a table on persistent storage; wherein the table includes a first column; wherein the table comprises a particular row that includes a first value for the first column; based on a dictionary, creating, in volatile memory, a dictionary-encoded column vector that contains codes that correspond to values from a range of rows of the first column; wherein the dictionary-encoded column vector stores a first code in a particular entry that corresponds to the particular row; receiving a query that includes a first expression that references the first column; creating in volatile memory, for the first expression, a first structure; evaluating the first expression for the first code by: using the dictionary to determine that the first code in the particular entry corresponds to the first value; and computing a first computed result of the first expression by evaluating the first expression based, at least in part, on the first value; using the first code to index into the first structure to locate a first index entry, in the first structure, that is associated with the first code; storing the first computed result of the first expression in the first index entry; during execution of the query, evaluating the first expression for a second row of the table that is associated with a second entry, in the dictionary-encoded column vector, that stores a second instance of the first code by performing the steps of: reading the second instance of the first code from the second entry, of the dictionary-encoded column vector, that corresponds to the second row of the table, using the first code obtained from the second entry of the dictionary-encoded column vector to index into the first structure to locate the first index entry that stores the first computed result, and reading the first computed result from the first index entry; wherein the first expression is a subexpression of a second expression of the query; creating in volatile memory, for the second expression, a second structure; evaluating the second expression for the first code and one or more input values by: using the first code to index into the first structure to locate the first index entry; and reading the first computed result from the first index entry in

Assignees

Inventors

Classifications

  • Query translation · CPC title

  • Details of cache memory · CPC title

  • with dedicated cache, e.g. instruction or stack · CPC title

  • Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers {sorting methods in general}(G06F7/36 takes precedence) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11294816B2 cover?
Techniques are described herein for reducing the number of redundant evaluations that occur when an expression is evaluated against an encoded column vector by caching results of expression evaluations. When executing a query that includes an expression that references columns for which dictionary-encoded column vectors exist, the database server performs a cost-based analysis to determine whic…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/2452. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Apr 05 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).