Generating compressed representations of sorted arrays of identifiers

US11720251B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11720251-B2
Application numberUS-201916518562-A
CountryUS
Kind codeB2
Filing dateJul 22, 2019
Priority dateJul 22, 2019
Publication dateAug 8, 2023
Grant dateAug 8, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes obtaining an array of sorted identifiers to be stored in a designated portion of a memory of a given computing system, determining a segment size for splitting elements of the array into a plurality of segments, splitting the array into the plurality of segments based at least in part on the determined segment size, and compressing the plurality of segments to create a plurality of compressed segments. The method also includes generating a balanced binary search tree comprising a plurality of nodes each identifying a range of elements of the array corresponding to a given one of the segments and comprising a pointer to a given compressed segment corresponding to the given segment. The method further includes maintaining the balanced binary search tree and the compressed segments in the designated portion of the memory, and processing queries to the array utilizing the balanced binary search tree.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: obtaining an array of sorted identifiers to be stored in a designated portion of a memory of a given computing system, wherein the array of sorted identifiers comprises a monotonic sequence of unique identifiers; determining a segment size for splitting elements of the array of sorted identifiers into a plurality of segments; splitting the array of sorted identifiers into the plurality of segments based at least in part on the determined segment size, the plurality of segments comprising respective ranges of identifiers in the array of sorted identifiers; compressing the plurality of segments to create a plurality of compressed segments; generating a balanced binary search tree comprising a plurality of nodes, each of at least a subset of the plurality of nodes (i) identifying a range of elements of the array of sorted identifiers corresponding to a given one of the segments and (ii) comprising a pointer to a given one of the compressed segments corresponding to the given segment; generating an in-memory data structure comprising the balanced binary search tree and the plurality of compressed segments; maintaining the generated in-memory data structure in the designated portion of the memory of the computing system in place of at least a portion of the array of sorted identifiers; and processing one or more queries to the array of sorted identifiers utilizing the plurality of nodes of the balanced binary search tree; wherein the method is performed by at least one processing device comprising a processor coupled to a memory. 2. The method of claim 1 wherein the monotonic sequence of unique identifiers comprises a monotonically increasing sequence of unique identifiers. 3. The method of claim 2 wherein the monotonically increasing sequence of unique identifiers comprises integer values. 4. The method of claim 3 wherein the integer values comprise 64 -bit integer values. 5. The method of claim 1 wherein elements in the array of sorted identifiers are associated with at least one of network sessions of one or more assets of an enterprise system and log events in the enterprise system. 6. The method of claim 1 wherein elements in the array of sorted identifiers comprise at least one of database indexes and N-gram indexes. 7. The method of claim 1 wherein the balanced binary search tree provides logarithmic access time to elements in the array of sorted identifiers. 8. The method of claim 7 wherein the balanced binary search tree comprises one of a Red-Black tree and a self-balanced Adelson-Velsky and Landis (AVL) tree. 9. The method of claim 7 wherein a compression algorithm utilized to compress the plurality of segments is linear in a size of a given segment K, where K=O(log(N)) and N denotes a size of the array of sorted identifiers. 10. The method of claim 9 wherein obtaining the array of sorted identifiers comprises performing a streaming construction of the array of sorted identifiers where the size N of the array of sorted identifiers is unknown, and wherein N is estimated based on a difference between a first and an expected last element of the array of sorted identifiers. 11. The method of claim 9 further comprising providing a set of buffers in the memory of the computing device to permit concurrent access to two or more threads running on the computing system for reading and decompressing two or more different compressed segments of the array of sorted identifiers, each of the set of buffers having a size of M*K, where M is a compression ratio of the compression algorithm utilized to compress the plurality of segments. 12. The method of claim 9 wherein a root node of the balanced binary search tree comprises a reference to a current maximum element of the balanced binary search tree to provide constant time streaming append of a new element in the array of sorted identifiers to an existing partially filled segment of the array of sorted identifiers comprising the current maximum element of the balanced binary search tree. 13. A method comprising: obtaining an array of sorted identifiers to be stored in a designated portion of a memory of a given computing system; determining a segment size for splitting elements of the array of sorted identifiers into a plurality of segments; splitting the array of sorted identifiers into the plurality of segments based at least in part on the determined segment size; compressing the plurality of segments to create a plurality of compressed segments; generating a balanced binary search tree comprising a plurality of nodes, each of at least a subset of the plurality of nodes (i) identifying a range of elements of the array of sorted identifiers corresponding to a given one of the segments and (ii) comprising a pointer to a given one of the compressed segments corresponding to the given segment; maintaining the balanced binary search tree and the plurality of compressed segments in the designated portion of the memory of the computing system; processing one or more queries to the array of sorted identifiers utilizing the plurality of nodes of the balanced binary search tree; monitoring access patterns to the plurality of segments of the array of sorted identifiers; and storing in the designated portion of the memory a cache of one or more of the plurality of segments of the array of sorted identifiers in decompressed form, the one or more decompressed segments stored in the cache being selected based on the monitored access patterns; wherein the method is performed by at least one processing device comprising a processor coupled to a memory. 14. The method of claim 13 wherein monitoring the access patterns comprises maintaining a counter of usage of each of the decompressed segments stored in the cache, and evicting decompressed segments stored in the cache based at least in part on the maintained usage counters. 15. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes the at least one processing device: to obtain an array of sorted identifiers to be stored in a designated portion of a memory of a given computing system, wherein the array of sorted identifiers comprises a monotonic sequence of unique identifiers; to determine a segment size for splitting elements of the array of sorted identifiers into a plurality of segments; to split the array of sorted identifiers into the plurality of segments based at least in part on the determined segment size, the plurality of segments comprising respective ranges of identifiers in the array of sorted identifiers; to compress the plurality of segments to create a plurality of compressed segments; to generate a balanced binary search tree comprising a plurality of nodes, each of at least a subset of the plurality of nodes (i) identifying a range of elements of the array of sorted identifiers corresponding to a given one of the segments and (ii) comprising a pointer to a given one of the compressed segments corresponding to the given segment; to generate an in-memory data structure comprising the balanced binary search tree and the plurality of compressed segments; to maintain the generated in-memory data structure in the designated portion of the memory of the computing system in place of at least a portion of the array of sorted identifiers; and to process one or more queries to the array of sorted identifiers utilizing the plurality of nodes of the balanced binary search tree.

Assignees

Inventors

Classifications

  • G06F3/0608Primary

    Saving storage space on storage systems · CPC title

  • Management of blocks · CPC title

  • Single storage device · CPC title

  • Tree-organised classifiers · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11720251B2 cover?
A method includes obtaining an array of sorted identifiers to be stored in a designated portion of a memory of a given computing system, determining a segment size for splitting elements of the array into a plurality of segments, splitting the array into the plurality of segments based at least in part on the determined segment size, and compressing the plurality of segments to create a plurali…
Who is the assignee on this patent?
Emc Ip Holding Co Llc
What technology area does this patent fall under?
Primary CPC classification G06F3/0608. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Aug 08 2023 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).