Tracking utilization of data blocks in a storage system

US2022414102A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2022414102-A1
Application numberUS-202117361666-A
CountryUS
Kind codeA1
Filing dateJun 29, 2021
Priority dateJun 29, 2021
Publication dateDec 29, 2022
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A storage control system manages a utilization of data blocks of a storage volume which is partitioned into data blocks having a unique block identifier (ID) and a same block size. The storage control system receives data items and assigns a respective unique data ID to each data item, which include consecutive data IDs. The data items are written to a free data block as a whole, and a record for the written data block is inserted into a node of a first tree structure. The record includes the unique block ID of the written data block, a first data ID of the data items, and a bitmap which maps the consecutive data IDs of the data items in the written data block, starting from the first data ID, to a respective bit whose value indicates whether the data item associated with the data ID is valid or invalid.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method, comprising: managing, by a storage control system, a utilization of data blocks of a storage volume, wherein at least a portion of the storage volume is partitioned into data blocks, wherein each data block comprises a unique block identifier (ID) and has a same block size, wherein managing the utilization of data blocks of the storage volume comprises: receiving data items to be stored in the storage volume; assigning a respective unique data ID to each data item, wherein the assigned data IDs comprise consecutive data ID values; writing the data items to a free data block in the storage volume such that the free data block is written as a whole data block; and inserting a record for the written data block into a node of a first tree data structure, wherein the record comprises (i) the unique block ID of the written data block, (ii) a first data ID of the consecutive ID values of the data items in the written data block, and (iii) a bitmap which maps the consecutive data ID values of the data items in the written data block, starting from the first data ID, to a respective bit whose value is set to indicate whether the data item associated with the data ID is valid or invalid. 2 . The method of claim 1 , wherein the first tree data structure comprises a B+ tree data structure having nodes that are sorted according to the block IDs of used data blocks in the storage system. 3 . The method of claim 1 , wherein: receiving the data items to be stored in the storage volume comprises storing the received data items in a cache memory; and writing the data items to the free data block in the storage volume comprises writing the cached data items to the free data block when a total size of the cached data items in the cache memory accumulates to the size of the free data block. 4 . The method of claim 1 , wherein when a given data item written to the free data block comprises an updated data item of an existing data item stored in a given data block of the storage volume, managing the utilization of data blocks of the storage volume further comprises: invalidating the existing data item; and inserting a record for the invalidated data item in a second tree data structure, wherein the record comprises (i) the unique block ID assigned to the given data block, and (ii) a unique data ID assigned to the invalidated data item; wherein the second tree data structure is configured to provide an index of records associated with invalidated and deleted data items. 5 . The method of claim 4 , wherein the second tree data structure comprises a log-structured merge (LSM) tree data structure. 6 . The method of claim 4 , wherein managing the utilization of data blocks of the storage volume further comprises: deleting a data item stored in a data block in the storage volume; and inserting a record for the deleted data item in the second tree data structure, wherein the record comprises (i) the unique block ID assigned to the data block which contains the deleted data item, and (ii) a unique data ID assigned to the deleted data item. 7 . The method of claim 4 , wherein managing the utilization of data blocks of the storage volume further comprises: accessing a plurality of records in the second tree data structure for a given data block in the storage volume, wherein the plurality of records are associated with data items of the given data block which have been invalidated or deleted; and updating the bitmap of the associated record for the given data block in the first tree data structure using the accessed records of the given data block in the second tree data structure. 8 . The method of claim 4 , wherein managing the utilization of data blocks of the storage volume further comprises: performing a garbage collection process to reclaim a selected data block in the storage volume, wherein performing the garbage collection process comprises: searching the first tree data structure using the block ID of the selected data block to access a record associated with the selected data block; utilizing the bitmap of the accessed record to determine which data items in the selected data block are valid data items; moving the valid data items of the selected data block to a free data block of the storage volume; and reclaiming the selected data block as a free data block for reuse. 9 . The method of claim 8 , wherein performing the garbage collection process further comprises searching records in the second tree data structure using the block ID of the selected data block to determine if any of the data items determined to be valid as a result of the search of the first tree data structure, have been invalidated or deleted subsequent to a last update of the bitmap associated with the selected data block. 10 . An article of manufacture comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code is executable by one or more processors to implement a method which comprises: managing, by a storage control system, a utilization of data blocks of a storage volume, wherein at least a portion of the storage volume is partitioned into data blocks, wherein each data block comprises a unique block identifier (ID) and has a same block size, wherein managing the utilization of data blocks of the storage volume comprises: receiving data items to be stored in the storage volume; assigning a respective unique data ID to each data item, wherein the assigned data IDs comprise consecutive data ID values; writing the data items to a free data block in the storage volume such that the free data block is written as a whole data block; and inserting a record for the written data block into a node of a first tree data structure, wherein the record comprises (i) the unique block ID of the written data block, (ii) a first data ID of the consecutive ID values of the data items in the written data block, and (iii) a bitmap which maps the consecutive data ID values of the data items in the written data block, starting from the first data ID, to a respective bit whose value is set to indicate whether the data item associated with the data ID is valid or invalid. 11 . The article of manufacture of claim 10 , wherein the first tree data structure comprises a B+ tree data structure having nodes that are sorted according to the block IDs of used data blocks in the storage system. 12 . The article of manufacture of claim 10 , wherein: the program code for receiving the data items to be stored in the storage volume comprises program code for storing the received data items in a cache memory; and the program code for writing the data items to the free data block in the storage volume comprises program code for writing the cached data items to the free data block when a total size of the cached data items in the cache memory accumulates to the size of the free data block. 13 . The article of manufacture of claim 10 , wherein when a given data item written to the free data block comprises an updated data item of an existing data item stored in a given data block of the storage volume, the program code for managing the utilization of data blocks of the storage volume further comprises program code for: invalidating the existing data item; and inserting a record for the invalidated data item in a second tree data structure, wherein the record comprises (i) the unique block ID assigned to the given data block, and (ii) a unique data ID assigned to the invalidated data item; wherein the second tree data structure is configured to provide an index

Assignees

Inventors

Classifications

  • Database cache management · CPC title

  • Trees, e.g. B+trees · CPC title

  • Garbage collection, i.e. reclamation of unreferenced memory · CPC title

  • Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022414102A1 cover?
A storage control system manages a utilization of data blocks of a storage volume which is partitioned into data blocks having a unique block identifier (ID) and a same block size. The storage control system receives data items and assigns a respective unique data ID to each data item, which include consecutive data IDs. The data items are written to a free data block as a whole, and a record f…
Who is the assignee on this patent?
Emc Ip Holding Co Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/24552. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Dec 29 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).