Deduplication database without reference counting

US12007967B2 · US · B2

Patent metadata
Publication number: US-12007967-B2
Application number: US-202217725451-A
Country: US
Kind code: B2
Filing date: Apr 20, 2022
Priority date: Jul 19, 2019
Publication date: Jun 11, 2024
Grant date: Jun 11, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A deduplicated storage system is provided according to certain embodiments that uses one or more mechanisms to update the deduplication database and remove records corresponding to data blocks that have been or will be erased from the secondary copies, without using or tracking reference counting values. Some embodiments described herein use a secondary table to identify the corresponding records from the primary table that can be removed and/or moved to another table for storing “zero-reference” data blocks. In other embodiments, the system will then traverse the “zero-reference” table and remove those primary data blocks from secondary storage devices.
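The two-stage flow in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `PrimaryRecord`, `move_unreferenced`, and `prune_zero_ref` are hypothetical, and the tables and storage devices are modeled as plain in-memory dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimaryRecord:
    record_id: int
    signature: str  # unique signature (e.g. a hash) of the data block
    location: str   # where the block lives in secondary storage
    # Note: no reference-count field, matching the abstract.


def move_unreferenced(primary, secondary, zero_ref):
    """Move primary records that no job record references into zero_ref."""
    referenced = set()
    for record_ids in secondary.values():
        referenced.update(record_ids)
    for record_id in list(primary):
        if record_id not in referenced:
            zero_ref[record_id] = primary.pop(record_id)


def prune_zero_ref(zero_ref, storage):
    """Traverse zero_ref and erase each listed block from storage."""
    for record_id in list(zero_ref):
        record = zero_ref.pop(record_id)
        storage.pop(record.location, None)


# Assumed toy data: three blocks, two jobs that share block 2.
primary = {i: PrimaryRecord(i, f"sig{i}", f"dev0/blk{i}") for i in (1, 2, 3)}
secondary = {"job-1": {1, 2}, "job-2": {2, 3}}
storage = {rec.location: b"..." for rec in primary.values()}
zero_ref = {}

del secondary["job-1"]  # job-1 expires; block 1 loses its last reference
move_unreferenced(primary, secondary, zero_ref)
prune_zero_ref(zero_ref, storage)
print(sorted(primary), sorted(storage))
```

Block 2 survives because `job-2` still lists it, even though no counter was ever kept for it. Liveness is derived from the job records at pruning time rather than maintained on every write.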

First claim

Opening claim text (preview).

What is claimed is:

1. An information management system configured to update a deduplication database, the information management system comprising: one or more processors configured to: receive a request to add a record to a primary table within a deduplication database (DDB), wherein the primary table comprises a plurality of primary records containing information about deduplicated data blocks stored in a plurality of secondary storage devices, wherein, each of the plurality of primary records in the primary table comprises, at least: a primary record identifier, a unique signature for a corresponding data block, and a location of the corresponding data block within one of the plurality of secondary storage devices, wherein each of the plurality of primary records does not comprise a field for tracking number of times the corresponding data block is referenced in the plurality of secondary storage devices; upon completion of a secondary storage job, update a secondary table, wherein the secondary table comprises one or more job records, wherein each job record comprises at least one or more job identifiers and at least one or more primary record identifiers; receive or identify a request to update the primary table; and in response to the request to update the primary table, using the secondary table and a bitmap, remove at least one of the plurality of primary records not referenced in the one or more job records of the secondary table.

2. The information management system of claim 1, wherein the deduplication database is a database partition.

3. The information management system of claim 1, wherein the secondary table is stored within the DDB.

4. The information management system of claim 1, wherein the one or more processors is further configured to: traverse all job records in the secondary table.

5. The information management system of claim 1, wherein the request to update the primary table is initiated by a media agent implemented on same computing device where the DDB is stored.

6. The information management system of claim 1, wherein the request to update the primary table is performed as part of a pruning operation.

7. The information management system of claim 1, wherein the one or more processors is further configured to: move the at least one of the plurality of primary records not referenced in the one or more job records to a zero-reference table, wherein the zero-reference table comprises of primary records of data blocks that may be removed from secondary storage devices.

8. The information management system of claim 7, wherein the one or more processors is further configured to: retrieve the zero-reference table; determine location of a secondary storage device where a first zero-reference data block in the zero-reference table is stored; and remove from the secondary storage device, the first zero-reference data block.

9. The information management system of claim 1, wherein the one or more processors is further configured to: for each primary record identifier identified in the primary table, query the secondary table for presence of that primary record identifier.

10. The information management system of claim 1, wherein the deduplicated data blocks referenced in the DDB are stored in multiple single instance files (SFiles).

11. A computer-implemented method for updating a deduplication database, the computer-implemented method comprising: receiving a request to add a record to a primary table within a deduplication database (DDB), wherein the primary table comprises a plurality of primary records containing information about deduplicated data blocks stored in a plurality of secondary storage devices, wherein, each of the plurality of primary records in the primary table comprises, at least: a primary record identifier, a unique signature for a corresponding data block, and a location of the corresponding data block within one of the plurality of secondary storage devices, wherein each of the plurality of primary records does not comprise a field for tracking number of times the corresponding data block is referenced in the plurality of secondary storage devices; upon completion of a secondary storage job, updating a secondary table, wherein the secondary table comprises one or more job records, wherein each job record comprises at least one or more job identifiers and at least one or more primary record identifiers; receiving or identifying a request to update the primary table; and in response to the request to update the primary table, using the secondary table and bitmap, removing at least one of the plurality of primary records not referenced in the one or more job records of the secondary table.

12. The computer-implemented method of claim 11, wherein the deduplication database is a database partition.

13. The computer-implemented method of claim 11, wherein the secondary table is stored within the DDB.

14. The computer-implemented method of claim 11, further comprising: traversing all job records in the secondary table.

15. The computer-implemented method of claim 11, wherein the request to update the primary table is initiated by a media agent implemented on same computing device where the DDB is stored.

16. The computer-implemented method of claim 11, wherein the request to update the primary table is performed as part of a pruning operation.

17. The computer-implemented method of claim 11, further comprising: moving the at least one of the plurality of primary records not referenced in the one or more job records to a zero-reference table, wherein the zero-reference table comprises of primary records of data blocks that may be removed from secondary storage devices.

18. The computer-implemented method of claim 17, further comprising: retrieving the zero-reference table; determining location of a secondary storage device where a first zero-reference data block in the zero-reference table is stored; and removing from the secondary storage device, the first zero-reference data block.

19. The computer-implemented method of claim 11, further comprising: for each primary record identifier identified in the primary table, querying the secondary table for presence of that primary record identifier.

20. The computer-implemented method of claim 11, wherein the deduplicated data blocks referenced in the DDB are stored in multiple single instance files (SFiles).

21. An information management system configured to update a deduplication database, the information management system comprising: one or more processors configured to: receive a request to add a record to a primary table within a deduplication database (DDB), wherein the primary table comprises a plurality of primary records containing information about deduplicated data blocks stored in a plurality of secondary storage devices, wherein, each of the plurality of primary records in the primary table comprises, at least: a primary record identifier, a unique signature for a corresponding data block, and a location of the corresponding data block within one of the plurality of secondary storage devices, wherein each of the plurality of primary records does not comprise a field for tracking number of times the corresponding data block is referenced in the plurality of secondary storage devices; upon completion of a secondary storage job, update a secondary table, wherein the secondary table comprises one or more job records, wherein each j
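Claims 1 and 11 recite removing unreferenced primary records "using the secondary table and a bitmap", but this preview does not say how the bitmap is laid out. One plausible reading, offered here only as an assumption, is a mark-and-sweep pass with one bit per primary record identifier: traverse every job record to set bits, then drop each primary record whose bit is still clear. The function names and the dictionary-based tables below are hypothetical.

```python
def build_reference_bitmap(job_records, max_record_id):
    """Mark phase: set one bit for every primary record ID a job references."""
    bitmap = bytearray((max_record_id >> 3) + 1)
    for record_ids in job_records.values():
        for rid in record_ids:
            if 0 <= rid <= max_record_id:  # ignore IDs outside the table
                bitmap[rid >> 3] |= 1 << (rid & 7)
    return bitmap


def is_marked(bitmap, rid):
    """Return True if the bit for this primary record ID is set."""
    return bool(bitmap[rid >> 3] & (1 << (rid & 7)))


def sweep_primary_table(primary_table, job_records):
    """Sweep phase: remove primary records whose bit is clear; return their IDs."""
    if not primary_table:
        return []
    bitmap = build_reference_bitmap(job_records, max(primary_table))
    removed = [rid for rid in primary_table if not is_marked(bitmap, rid)]
    for rid in removed:
        del primary_table[rid]
    return removed


# Assumed toy tables: record ID -> (signature, location); job ID -> record IDs.
primary_table = {
    0: ("sig0", "dev0/0"),
    1: ("sig1", "dev0/1"),
    2: ("sig2", "dev1/0"),
    9: ("sig9", "dev1/7"),
}
job_records = {"job-7": [0, 2], "job-8": [2]}

removed = sweep_primary_table(primary_table, job_records)
print(removed, sorted(primary_table))
```

Under this reading the bitmap costs one bit per record identifier, so a single pass over the job records replaces a per-record lookup in the secondary table of the kind claims 9 and 19 describe.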

Assignees

Commvault Systems Inc

Inventors

Classifications

  • Vectors, bitmaps or matrices · CPC title

  • Tablespace storage structures; Management thereof · CPC title

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • G06F16/215 · Primary

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12007967B2 cover?
A deduplicated storage system is provided according to certain embodiments that uses one or more mechanisms to update the deduplication database and remove records corresponding to data blocks that have been or will be erased from the secondary copies, without using or tracking reference counting values. Some embodiments described herein use a secondary table to identify the corresponding recor…
Who is the assignee on this patent?
Commvault Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 11, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).