Integrating kafka data-in-motion with data-at-rest tables
US-2020125572-A1 · Apr 23, 2020 · US
US10726016B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10726016-B2 |
| Application number | US-201615294460-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 14, 2016 |
| Priority date | Oct 15, 2015 |
| Publication date | Jul 28, 2020 |
| Grant date | Jul 28, 2020 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Techniques are described herein for sharing a dictionary across multiple in-memory compression units (IMCUs). After a dictionary is used to encode a first column vector in a first IMCU, the same dictionary is used to encode a second column vector in a second IMCU. The entries in the dictionary are in sort order to facilitate binary searching when performing value-to-code look-ups. If, during the encoding of the second column vector, values are encountered for which the dictionary does not already have codes, then a “sort-order-boundary” is established after the last entry in the dictionary, and entries for the newly encountered values are added to the dictionary, after the sort-order-boundary. To facilitate value-to-code look-ups, the new entries are also sorted relative to each other, creating a second “sort order set”. A new version of the dictionary may be created when the number of sort order sets in the first version of the dictionary reaches a configurable threshold.
Opening claim text (preview).
What is claimed is: 1. A method comprising: maintaining a table on persistent storage; wherein the table includes a particular column; loading values from a first portion of the particular column into a first column vector in a first volatile memory of a first node; wherein the first node is a first computing device that comprises the first volatile memory; building, in the first volatile memory of the first node, a first copy of a particular dictionary; wherein building the first copy of the particular dictionary includes: creating a first sorted list of values that includes: values from the first portion of the particular column; and values, received from a second node via inter-node communication, obtained by the second node from a second portion of the particular column; storing entries in the first copy of the particular dictionary, for values in the first sorted list of values, in sort order of the first sorted list of values; encoding, by the first node, the first column vector, using the first copy of the particular dictionary, to produce a first encoded column vector; loading values from the second portion of the particular column into a second column vector in a second volatile memory of the second node; wherein the second node is a second computing device that comprises the second volatile memory; wherein the first node is a different computing device than the second node; wherein both the first node and the second node have access to the table on persistent storage; building, in the second volatile memory of the second node, a second copy of the particular dictionary; wherein building the second copy of the particular dictionary includes: creating a second sorted list of values that includes: values from the second portion of the particular column; and values, received from the first node via inter-node communication, obtained by the first node from the first portion of the particular column; storing entries in the second copy of the particular dictionary, for values in the second sorted list of values, in sort order of the second sorted list of values; wherein the first sorted list of values is identical to the second sorted list of values; encoding, by the second node, the second column vector, using the second copy of the particular dictionary, to produce a second encoded column vector. 2. The method of claim 1 further comprising: analyzing the particular column to determine whether a global dictionary should be used to encode column vectors for the particular column; and wherein the step of encoding the second column vector using the particular dictionary is performed in response to determining that a global dictionary should be used to encode column vectors for the particular column. 3. A method comprising: maintaining a table on persistent storage; wherein the table includes a particular column; loading values from a first portion of the particular column into a first column vector in a first volatile memory of a first computing device; building a particular dictionary based on values in the first column vector; wherein building the particular dictionary includes: obtaining, from the first column vector, a first set of values that are not already in the particular dictionary; creating a second set of values by sorting the first set of values based on a particular sorting criterion and removing duplicates from the first set of values; storing a first ordered set of entries in the particular dictionary; wherein the first ordered set of entries includes a single entry for each distinct value in the second set of values; wherein, within the particular dictionary, order of the first ordered set of entries reflects order of the second set of values; encoding the first column vector, using the particular dictionary, to produce a first encoded column vector; and loading values from a second portion of the particular column into a second column vector, storing the second column vector in one of: the first volatile memory of the first computing device; or a second volatile memory of a second computing device; adding entries to the particular dictionary based values in the second column vector; wherein adding entries to the particular dictionary includes: obtaining, from the second column vector, a third set of values that are not already in the particular dictionary; creating a fourth set of values by sorting the third set of values based on the particular sorting criterion and removing duplicates from the third set of values; storing a second ordered set of entries in the particular dictionary; wherein the second ordered set of entries includes a single entry for each distinct value in the fourth set of values; wherein, within the particular dictionary, order of the second ordered set of entries reflects order of the fourth set of values; encoding the second column vector, using the particular dictionary, to produce a second encoded column vector; wherein: within the table, the first portion of the particular column and the second portion of the particular column have a particular value in common; the particular dictionary maps the particular value to a single unique code; the first ordered set of entries constitute a first sort order set, wherein the first order set includes a single entry for each distinct value from the first portion of the particular column; after the first sort order set and before a second order set, the particular dictionary includes a sort order boundary; wherein the second ordered set of entries constitute the second order set, which includes a single entry for each distinct value that: is in the second portion of the particular column; and is not in the first portion of the particular column; wherein the only entry, within the particular dictionary, for the particular value is within the first order set. 4. The method of claim 3 further comprising performing a value-to-code look-up by performing a first binary search on the first sort order set and a second binary search on the second sort order set. 5. The method of claim 3 wherein: a first version of the particular dictionary is used to encode the first and second encoded column vectors; and the method further comprises: in response to detecting that the first version of the particular dictionary includes a maximum number of sort order sets, creating a second version of the particular dictionary that includes entries for all values from the first version of the particular dictionary; loading values from a third portion of the particular column into a third column vector in volatile memory; and encoding the third column vector, using the second version of the particular dictionary, to produce a third encoded column vector. 6. The method of claim 5 further comprising, after producing the first encoded column vector, rebuilding the first encoded column vector by: loading values from the first portion of the particular column into a new first column vector in volatile memory; and encoding the new first column vector, using the second version of the particular dictionary, to produce a new first encoded column vector. 7. The method of claim 5 further comprising: maintaining a first usage-count for the first version of the particular dictionary; incrementing the first usage-count in response to encoding the first column vector using the first version of the particular dictionary; incrementing the first usage-count in response to encoding the second column vector using the first version of the particular dictionary; and decrementing the first usage-count when any encoded column vector that was encoded using the first version of the particular dictionary is discarded or rebuilt.
Column-oriented storage; Management thereof · CPC title
using vector based model · CPC title
Intermediate data storage techniques for performance improvement · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.