Adaptive compression optimization for effective pruning

US12204517B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-12204517-B2
Application numberUS-202217933903-A
CountryUS
Kind codeB2
Filing dateSep 21, 2022
Priority dateDec 14, 2018
Publication dateJan 21, 2025
Grant dateJan 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A database management system is described that can encode data to generate a plurality of data vectors. The database management system can perform the encoding by using a dictionary. The database management system can adaptively reorder the plurality of data vectors to prepare for compression of the plurality of data vectors. During a forward pass of the adaptive reordering, most frequent values of a data vector of the plurality of data vectors can be moved-up in the data vector. During a backward pass of the adaptive reordering, content within a rest range of a plurality of rest ranges can be rearranged within the plurality of data vectors according to frequencies of the content. The reordering according to frequency can further sort the rest range by value. Related apparatuses, systems, methods, techniques, computer programmable products, computer readable media, and articles are also described.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: reordering, by a database management system, at least a first data vector for a first column of a database table and a second data vector for a second column of the database table using a forward pass followed by a backward pass, wherein the forward pass sorts, based on frequency of occurrence, at least the first data vector and the second data vector, wherein the backward pass prioritizes a selection order of the first data vector or the second data vector based on a number of distinct value identifiers in a rest range of at least one of the first data vector or the second data vector, wherein the backward pass sorts by sorting the rest range in at least the first data vector and the second data vector by at least grouping close value identifiers together, wherein at least one rest range is prevented from being split during the sorting; determining, by the database management system and for at least the first data vector, a prefix part and a non-prefix part; generating, by the database management system, a min-max index for the non-prefix part associated with the first data vector; and storing, for at least the first data vector, a value of the prefix part and a length of the prefix part. 2. The method of claim 1 , wherein the backward pass prioritizes the selection order of the first data vector over the second data vector, in response to the first data vector having a higher quantity of distinct values of value identifiers when compared to the second data vector. 3. The method of claim 1 , wherein the backward pass further comprises grouping close value identifiers together. 4. The method of claim 1 , further comprising: encoding, by the database management system, data of at least the first column and the second column of the database table to generate a plurality of data vectors including the first data vector for the first column and the second data vector for the second column. 5. The method of claim 4 , wherein the encoding comprises dictionary encoding. 6. The method of claim 1 , wherein the forward pass comprises moving-up, by the database management system, a most frequently occurring value in at least one of the first data vector or the second data vector. 7. A system comprising: at least one processor; and at least one memory including code which when executed by the at least one processor causes operations comprising: reordering, by a database management system, at least a first data vector for a first column of a database table and a second data vector for a second column of the database table using a forward pass followed by a backward pass, wherein the forward pass sorts, based on frequency of occurrence, at least the first data vector and the second data vector, wherein the backward pass prioritizes a selection order of the first data vector or the second data vector based on a number of distinct value identifiers in a rest range of at least one of the first data vector or the second data vector, wherein the backward pass sorts by sorting the rest range in at least the first data vector and the second data vector by at least grouping close value identifiers together, wherein at least one rest range is prevented from being split during the sorting; determining, by the database management system and for at least the first data vector, a prefix part and a non-prefix part; generating, by the database management system, a min-max index for the non-prefix part associated with the first data vector; and storing, for at least the first data vector, a value of the prefix part and a length of the prefix part. 8. The system of claim 7 , wherein the backward pass prioritizes the selection order of the first data vector over the second data vector, in response to the first data vector having a higher quantity of distinct values of value identifiers when compared to the second data vector. 9. The system of claim 7 , wherein the backward pass further comprises grouping close value identifiers together. 10. The system of claim 7 , further comprising: encoding, by the database management system, data of at least the first column and the second column of the database table to generate a plurality of data vectors including the first data vector for the first column and the second data vector for the second column. 11. The system of claim 10 , wherein the encoding comprises dictionary encoding. 12. The system of claim 7 , wherein the forward pass comprises moving-up, by the database management system, a most frequently occurring value in at least one of the first data vector or the second data vector. 13. A non-transitory computer-readable medium storing instructions that, when executed by a computer, cause operations comprising: reordering, by a database management system, at least a first data vector for a first column of a database table and a second data vector for a second column of the database table using a forward pass followed by a backward pass, wherein the forward pass sorts, based on frequency of occurrence, at least the first data vector and the second data vector, wherein the backward pass prioritizes a selection order of the first data vector or the second data vector based on a number of distinct value identifiers in a rest range of at least one of the first data vector or the second data vector, wherein the backward pass sorts by sorting the rest range in at least the first data vector and the second data vector by at least grouping close value identifiers together, wherein at least one rest range is prevented from being split during the sorting; determining, by the database management system and for at least the first data vector, a prefix part and a non-prefix part; generating, by the database management system, a min-max index for the non-prefix part associated with the first data vector; and storing, for at least the first data vector, a value of the prefix part and a length of the prefix part. 14. The non-transitory computer-readable medium of claim 13 , wherein the backward pass prioritizes the selection order of the first data vector over the second data vector, in response to the first data vector having a higher quantity of distinct values of value identifiers when compared to the second data vector. 15. The non-transitory computer-readable medium of claim 13 , wherein the backward pass further comprises grouping close value identifiers together. 16. The non-transitory computer-readable medium of claim 13 , further comprising: encoding, by the database management system, data of at least the first column and the second column of the database table to generate a plurality of data vectors including the first data vector for the first column and the second data vector for the second column. 17. The non-transitory computer-readable medium of claim 16 , wherein the encoding comprises dictionary encoding. 18. The non-transitory computer-readable medium of claim 13 , wherein the forward pass comprises moving-up, by the database management system, a most frequently occurring value in at least one of the first data vector or the second data vector.

Assignees

Inventors

Classifications

  • Vectors, bitmaps or matrices · CPC title

  • Combined merging and sorting · CPC title

  • Aggregation; Duplicate elimination · CPC title

  • Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction · CPC title

  • Management thereof · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12204517B2 cover?
A database management system is described that can encode data to generate a plurality of data vectors. The database management system can perform the encoding by using a dictionary. The database management system can adaptively reorder the plurality of data vectors to prepare for compression of the plurality of data vectors. During a forward pass of the adaptive reordering, most frequent value…
Who is the assignee on this patent?
Sap Se
What technology area does this patent fall under?
Primary CPC classification G06F16/2282. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jan 21 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).