In-memory column-level multi-versioned global dictionary for in-memory databases

US10726016B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10726016-B2
Application numberUS-201615294460-A
CountryUS
Kind codeB2
Filing dateOct 14, 2016
Priority dateOct 15, 2015
Publication dateJul 28, 2020
Grant dateJul 28, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are described herein for sharing a dictionary across multiple in-memory compression units (IMCUs). After a dictionary is used to encode a first column vector in a first IMCU, the same dictionary is used to encode a second column vector in a second IMCU. The entries in the dictionary are in sort order to facilitate binary searching when performing value-to-code look-ups. If, during the encoding of the second column vector, values are encountered for which the dictionary does not already have codes, then a “sort-order-boundary” is established after the last entry in the dictionary, and entries for the newly encountered values are added to the dictionary, after the sort-order-boundary. To facilitate value-to-code look-ups, the new entries are also sorted relative to each other, creating a second “sort order set”. A new version of the dictionary may be created when the number of sort order sets in the first version of the dictionary reaches a configurable threshold.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: maintaining a table on persistent storage; wherein the table includes a particular column; loading values from a first portion of the particular column into a first column vector in a first volatile memory of a first node; wherein the first node is a first computing device that comprises the first volatile memory; building, in the first volatile memory of the first node, a first copy of a particular dictionary; wherein building the first copy of the particular dictionary includes: creating a first sorted list of values that includes: values from the first portion of the particular column; and values, received from a second node via inter-node communication, obtained by the second node from a second portion of the particular column; storing entries in the first copy of the particular dictionary, for values in the first sorted list of values, in sort order of the first sorted list of values; encoding, by the first node, the first column vector, using the first copy of the particular dictionary, to produce a first encoded column vector; loading values from the second portion of the particular column into a second column vector in a second volatile memory of the second node; wherein the second node is a second computing device that comprises the second volatile memory; wherein the first node is a different computing device than the second node; wherein both the first node and the second node have access to the table on persistent storage; building, in the second volatile memory of the second node, a second copy of the particular dictionary; wherein building the second copy of the particular dictionary includes: creating a second sorted list of values that includes: values from the second portion of the particular column; and values, received from the first node via inter-node communication, obtained by the first node from the first portion of the particular column; storing entries in the second copy of the particular dictionary, for values in the second sorted list of values, in sort order of the second sorted list of values; wherein the first sorted list of values is identical to the second sorted list of values; encoding, by the second node, the second column vector, using the second copy of the particular dictionary, to produce a second encoded column vector. 2. The method of claim 1 further comprising: analyzing the particular column to determine whether a global dictionary should be used to encode column vectors for the particular column; and wherein the step of encoding the second column vector using the particular dictionary is performed in response to determining that a global dictionary should be used to encode column vectors for the particular column. 3. A method comprising: maintaining a table on persistent storage; wherein the table includes a particular column; loading values from a first portion of the particular column into a first column vector in a first volatile memory of a first computing device; building a particular dictionary based on values in the first column vector; wherein building the particular dictionary includes: obtaining, from the first column vector, a first set of values that are not already in the particular dictionary; creating a second set of values by sorting the first set of values based on a particular sorting criterion and removing duplicates from the first set of values; storing a first ordered set of entries in the particular dictionary; wherein the first ordered set of entries includes a single entry for each distinct value in the second set of values; wherein, within the particular dictionary, order of the first ordered set of entries reflects order of the second set of values; encoding the first column vector, using the particular dictionary, to produce a first encoded column vector; and loading values from a second portion of the particular column into a second column vector, storing the second column vector in one of: the first volatile memory of the first computing device; or a second volatile memory of a second computing device; adding entries to the particular dictionary based values in the second column vector; wherein adding entries to the particular dictionary includes: obtaining, from the second column vector, a third set of values that are not already in the particular dictionary; creating a fourth set of values by sorting the third set of values based on the particular sorting criterion and removing duplicates from the third set of values; storing a second ordered set of entries in the particular dictionary; wherein the second ordered set of entries includes a single entry for each distinct value in the fourth set of values; wherein, within the particular dictionary, order of the second ordered set of entries reflects order of the fourth set of values; encoding the second column vector, using the particular dictionary, to produce a second encoded column vector; wherein: within the table, the first portion of the particular column and the second portion of the particular column have a particular value in common; the particular dictionary maps the particular value to a single unique code; the first ordered set of entries constitute a first sort order set, wherein the first order set includes a single entry for each distinct value from the first portion of the particular column; after the first sort order set and before a second order set, the particular dictionary includes a sort order boundary; wherein the second ordered set of entries constitute the second order set, which includes a single entry for each distinct value that: is in the second portion of the particular column; and is not in the first portion of the particular column; wherein the only entry, within the particular dictionary, for the particular value is within the first order set. 4. The method of claim 3 further comprising performing a value-to-code look-up by performing a first binary search on the first sort order set and a second binary search on the second sort order set. 5. The method of claim 3 wherein: a first version of the particular dictionary is used to encode the first and second encoded column vectors; and the method further comprises: in response to detecting that the first version of the particular dictionary includes a maximum number of sort order sets, creating a second version of the particular dictionary that includes entries for all values from the first version of the particular dictionary; loading values from a third portion of the particular column into a third column vector in volatile memory; and encoding the third column vector, using the second version of the particular dictionary, to produce a third encoded column vector. 6. The method of claim 5 further comprising, after producing the first encoded column vector, rebuilding the first encoded column vector by: loading values from the first portion of the particular column into a new first column vector in volatile memory; and encoding the new first column vector, using the second version of the particular dictionary, to produce a new first encoded column vector. 7. The method of claim 5 further comprising: maintaining a first usage-count for the first version of the particular dictionary; incrementing the first usage-count in response to encoding the first column vector using the first version of the particular dictionary; incrementing the first usage-count in response to encoding the second column vector using the first version of the particular dictionary; and decrementing the first usage-count when any encoded column vector that was encoded using the first version of the particular dictionary is discarded or rebuilt.

Assignees

Inventors

Classifications

  • G06F16/221Primary

    Column-oriented storage; Management thereof · CPC title

  • using vector based model · CPC title

  • Intermediate data storage techniques for performance improvement · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10726016B2 cover?
Techniques are described herein for sharing a dictionary across multiple in-memory compression units (IMCUs). After a dictionary is used to encode a first column vector in a first IMCU, the same dictionary is used to encode a second column vector in a second IMCU. The entries in the dictionary are in sort order to facilitate binary searching when performing value-to-code look-ups. If, during th…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/221. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jul 28 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).