Lossless binary compression in a memory constrained environment
US-10103747-B1 · Oct 16, 2018 · US
US11720251B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11720251-B2 |
| Application number | US-201916518562-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 22, 2019 |
| Priority date | Jul 22, 2019 |
| Publication date | Aug 8, 2023 |
| Grant date | Aug 8, 2023 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A method includes obtaining an array of sorted identifiers to be stored in a designated portion of a memory of a given computing system, determining a segment size for splitting elements of the array into a plurality of segments, splitting the array into the plurality of segments based at least in part on the determined segment size, and compressing the plurality of segments to create a plurality of compressed segments. The method also includes generating a balanced binary search tree comprising a plurality of nodes each identifying a range of elements of the array corresponding to a given one of the segments and comprising a pointer to a given compressed segment corresponding to the given segment. The method further includes maintaining the balanced binary search tree and the compressed segments in the designated portion of the memory, and processing queries to the array utilizing the balanced binary search tree.
Opening claim text (preview).
What is claimed is: 1. A method comprising: obtaining an array of sorted identifiers to be stored in a designated portion of a memory of a given computing system, wherein the array of sorted identifiers comprises a monotonic sequence of unique identifiers; determining a segment size for splitting elements of the array of sorted identifiers into a plurality of segments; splitting the array of sorted identifiers into the plurality of segments based at least in part on the determined segment size, the plurality of segments comprising respective ranges of identifiers in the array of sorted identifiers; compressing the plurality of segments to create a plurality of compressed segments; generating a balanced binary search tree comprising a plurality of nodes, each of at least a subset of the plurality of nodes (i) identifying a range of elements of the array of sorted identifiers corresponding to a given one of the segments and (ii) comprising a pointer to a given one of the compressed segments corresponding to the given segment; generating an in-memory data structure comprising the balanced binary search tree and the plurality of compressed segments; maintaining the generated in-memory data structure in the designated portion of the memory of the computing system in place of at least a portion of the array of sorted identifiers; and processing one or more queries to the array of sorted identifiers utilizing the plurality of nodes of the balanced binary search tree; wherein the method is performed by at least one processing device comprising a processor coupled to a memory. 2. The method of claim 1 wherein the monotonic sequence of unique identifiers comprises a monotonically increasing sequence of unique identifiers. 3. The method of claim 2 wherein the monotonically increasing sequence of unique identifiers comprises integer values. 4. The method of claim 3 wherein the integer values comprise 64 -bit integer values. 5. The method of claim 1 wherein elements in the array of sorted identifiers are associated with at least one of network sessions of one or more assets of an enterprise system and log events in the enterprise system. 6. The method of claim 1 wherein elements in the array of sorted identifiers comprise at least one of database indexes and N-gram indexes. 7. The method of claim 1 wherein the balanced binary search tree provides logarithmic access time to elements in the array of sorted identifiers. 8. The method of claim 7 wherein the balanced binary search tree comprises one of a Red-Black tree and a self-balanced Adelson-Velsky and Landis (AVL) tree. 9. The method of claim 7 wherein a compression algorithm utilized to compress the plurality of segments is linear in a size of a given segment K, where K=O(log(N)) and N denotes a size of the array of sorted identifiers. 10. The method of claim 9 wherein obtaining the array of sorted identifiers comprises performing a streaming construction of the array of sorted identifiers where the size N of the array of sorted identifiers is unknown, and wherein N is estimated based on a difference between a first and an expected last element of the array of sorted identifiers. 11. The method of claim 9 further comprising providing a set of buffers in the memory of the computing device to permit concurrent access to two or more threads running on the computing system for reading and decompressing two or more different compressed segments of the array of sorted identifiers, each of the set of buffers having a size of M*K, where M is a compression ratio of the compression algorithm utilized to compress the plurality of segments. 12. The method of claim 9 wherein a root node of the balanced binary search tree comprises a reference to a current maximum element of the balanced binary search tree to provide constant time streaming append of a new element in the array of sorted identifiers to an existing partially filled segment of the array of sorted identifiers comprising the current maximum element of the balanced binary search tree. 13. A method comprising: obtaining an array of sorted identifiers to be stored in a designated portion of a memory of a given computing system; determining a segment size for splitting elements of the array of sorted identifiers into a plurality of segments; splitting the array of sorted identifiers into the plurality of segments based at least in part on the determined segment size; compressing the plurality of segments to create a plurality of compressed segments; generating a balanced binary search tree comprising a plurality of nodes, each of at least a subset of the plurality of nodes (i) identifying a range of elements of the array of sorted identifiers corresponding to a given one of the segments and (ii) comprising a pointer to a given one of the compressed segments corresponding to the given segment; maintaining the balanced binary search tree and the plurality of compressed segments in the designated portion of the memory of the computing system; processing one or more queries to the array of sorted identifiers utilizing the plurality of nodes of the balanced binary search tree; monitoring access patterns to the plurality of segments of the array of sorted identifiers; and storing in the designated portion of the memory a cache of one or more of the plurality of segments of the array of sorted identifiers in decompressed form, the one or more decompressed segments stored in the cache being selected based on the monitored access patterns; wherein the method is performed by at least one processing device comprising a processor coupled to a memory. 14. The method of claim 13 wherein monitoring the access patterns comprises maintaining a counter of usage of each of the decompressed segments stored in the cache, and evicting decompressed segments stored in the cache based at least in part on the maintained usage counters. 15. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes the at least one processing device: to obtain an array of sorted identifiers to be stored in a designated portion of a memory of a given computing system, wherein the array of sorted identifiers comprises a monotonic sequence of unique identifiers; to determine a segment size for splitting elements of the array of sorted identifiers into a plurality of segments; to split the array of sorted identifiers into the plurality of segments based at least in part on the determined segment size, the plurality of segments comprising respective ranges of identifiers in the array of sorted identifiers; to compress the plurality of segments to create a plurality of compressed segments; to generate a balanced binary search tree comprising a plurality of nodes, each of at least a subset of the plurality of nodes (i) identifying a range of elements of the array of sorted identifiers corresponding to a given one of the segments and (ii) comprising a pointer to a given one of the compressed segments corresponding to the given segment; to generate an in-memory data structure comprising the balanced binary search tree and the plurality of compressed segments; to maintain the generated in-memory data structure in the designated portion of the memory of the computing system in place of at least a portion of the array of sorted identifiers; and to process one or more queries to the array of sorted identifiers utilizing the plurality of nodes of the balanced binary search tree.
Saving storage space on storage systems · CPC title
Management of blocks · CPC title
Single storage device · CPC title
Tree-organised classifiers · CPC title
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.