Opportunistic handling of freed data in data de-duplication

US10282124B2 · US · B2

Patent metadata
Publication number: US-10282124-B2
Application number: US-201615190721-A
Country: US
Kind code: B2
Filing date: Jun 23, 2016
Priority date: Jun 23, 2016
Publication date: May 7, 2019
Grant date: May 7, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A mechanism is provided for opportunistic handling of freed data in data de-duplication. Responsive to receiving a request to store a file in a storage device, the file is mapped to a set of virtual blocks. For each virtual block in the set of virtual blocks: a hash value is computed, a determination is made as to whether the computed hash value appears within a previously-used information table as associated with an existing data block, and, responsive to the computed hash value appearing within a previously-used information table as associated with an existing data block, a data block entry and hash value associated with the existing data block is moved to a de-duplication information table. The virtual block is then stored as a reference to the existing data block.
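The store path summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the class name `DedupStore`, the dictionary layout of the two tables, the string return values, and the choice of SHA-256 as the hash are all assumptions made for the example. Because a hash lives in only one table at a time, the order of the two lookups does not change the outcome here.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupStore:
    """Toy model of the two tables named in the abstract (hypothetical layout)."""
    # De-duplication information table: hash -> (data block id, referring virtual blocks)
    dedup: dict = field(default_factory=dict)
    # Previously-used information table: hash -> data block id (freed, not yet erased)
    previously_used: dict = field(default_factory=dict)
    # Data blocks whose status is "free"
    free_blocks: list = field(default_factory=list)
    # Stand-in for the storage device: data block id -> bytes
    storage: dict = field(default_factory=dict)

    def store_virtual_block(self, vblock_id: str, data: bytes) -> str:
        # SHA-256 is an assumption; the abstract only says "a hash value".
        digest = hashlib.sha256(data).hexdigest()

        if digest in self.previously_used:
            # Opportunistic reuse: the freed block still holds this data, so move
            # its entry back to the de-duplication table instead of rewriting it.
            block = self.previously_used.pop(digest)
            self.dedup[digest] = (block, {vblock_id})
            return "revived"

        if digest in self.dedup:
            # Ordinary de-duplication: add one more referring virtual block.
            self.dedup[digest][1].add(vblock_id)
            return "deduplicated"

        # New data: take a free block (raises IndexError if none is left),
        # write the data, and mark the block active by adding its table entry.
        block = self.free_blocks.pop()
        self.storage[block] = data
        self.dedup[digest] = (block, {vblock_id})
        return "written"


# Usage: block "d0" was freed earlier but still holds b"hello".
store = DedupStore(free_blocks=["d1"])
freed_hash = hashlib.sha256(b"hello").hexdigest()
store.previously_used[freed_hash] = "d0"
store.storage["d0"] = b"hello"

results = [
    store.store_virtual_block("v1", b"hello"),
    store.store_virtual_block("v2", b"hello"),
    store.store_virtual_block("v3", b"world"),
]
```

In this sketch the first write of `b"hello"` revives the freed block without touching storage, the second becomes a plain reference, and only `b"world"` consumes a free block.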

First claim

Opening claim text (preview).

What is claimed is:

1. A method, in a data processing system, for opportunistic handling of freed data in data de-duplication, the method comprising: responsive to receiving a request to store a file in a storage device, mapping, by a block mapper of the data processing system, the file to a set of virtual blocks; and for each virtual block in the set of virtual blocks: computing, by a de-duplication engine of the data processing system, a hash value; determining, by the de-duplication engine, whether the computed hash value appears within a previously-used information table as associated with an existing data block; responsive to the computed hash value appearing within a previously-used information table in the data processing system as associated with an existing data block, moving, by the de-duplication engine, a data block entry and hash value associated with the existing data block to a de-duplication information table in the data processing system; and storing, by the de-duplication engine, the virtual block in a virtual block referring column of the de-duplication information table as a reference to the existing data block.

2. The method of claim 1, further comprising: for each virtual block in the set of virtual blocks: determining, by the de-duplication engine, whether the computed hash value appears within a de-duplication information table as associated with an existing data block; and responsive to the computed hash value appearing within a de-duplication information table as associated with an existing data block, storing, by the de-duplication engine, the virtual block in the virtual block referring column of the de-duplication information table as a reference to the existing data block.

3. The method of claim 2, further comprising: for each virtual block in the set of virtual blocks: responsive to the computed hash value failing to appear within the previously-used information table as associated with an existing data block or within the de-duplication information table as associated with an existing data block, storing, by the de-duplication engine, the hash value in a hash value column of the de-duplication information table with a free data block; storing, by the de-duplication engine, the virtual block in the virtual block referring column of the de-duplication information table as a reference to the free data block; and changing, by the de-duplication engine, the status of the free data block to active.

4. The method of claim 1, further comprising: responsive to receiving a request to delete a file in the storage device, identifying, by the de-duplication engine, a set of virtual blocks associated with the file to be deleted; and for each virtual block in the set of virtual blocks: determining, by the de-duplication engine, whether the virtual block referring column in the de-duplication information table associated with the data block comprises more than one virtual block entry; responsive to the virtual block referring column in the de-duplication information table associated with the data block comprising more than one virtual block entry, deleting, by the de-duplication engine, the virtual block associated with the file to be deleted from the virtual block referring column of the de-duplication information table; and responsive to the virtual block referring column in the de-duplication information table associated with the data block comprising only one virtual block entry, moving, by the de-duplication engine, the data block entry and associated hash value to the previously-used information table to track previously-used data blocks and deleting, by the de-duplication engine, the virtual block associated with the file to be deleted from the virtual block referring column.

5. The method of claim 1, further comprising: responsive to receiving a request to clean up the previously-used information table, identifying, by a monitoring mechanism in the data processing system, a set of data blocks in the previously-used information table to be erased one-by-one; for each of the set of identified data blocks one-by-one: erasing, by the monitoring mechanism, the associated data from the storage device; deleting, by the monitoring mechanism, the hash value associated with the data block; adding, by the monitoring mechanism, the data block back to the de-duplication information table indicating the status of the data block as free; determining, by the monitoring mechanism, whether a number of free data blocks in a de-duplication information table is below a predetermined threshold; and responsive to the number of free data blocks in a de-duplication information table remaining below the predetermined threshold, proceeding, by the monitoring mechanism, with a next identified data block until the number of free data blocks in the de-duplication information table is above the predetermined threshold.

6. The method of claim 5, wherein the request to clean up the previously-used information table is responsive to at least one of the number of free data blocks in a de-duplication information table falling below a predetermined threshold, an administrator triggering a cleanup of the previously-used information table, or a number of previously-used data blocks exceeding another predetermined threshold.

7. The method of claim 5, wherein the identification of the set of data blocks in the previously-used information table is based on at least one of oldest data blocks, largest data blocks, or data blocks that are not considered important.

8. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: responsive to receiving a request to store a file in a storage device, map, by a block mapper of the computing device, the file to a set of virtual blocks; and for each virtual block in the set of virtual blocks: compute, by a de-duplication engine of the computing device, a hash value; determine, by the de-duplication engine, whether the computed hash value appears within a previously-used information table as associated with an existing data block; responsive to the computed hash value appearing within a previously-used information table in the computing device as associated with an existing data block, move, by the de-duplication engine, a data block entry and hash value associated with the existing data block to a de-duplication information table in the computing device; and store, by the de-duplication engine, the virtual block in a virtual block referring column of the de-duplication information table as a reference to the existing data block.

9. The computer program product of claim 8, wherein the computer readable program further causes the computing device to: for each virtual block in the set of virtual blocks: determine, by the de-duplication engine, whether the computed hash value appears within a de-duplication information table as associated with an existing data block; and responsive to the computed hash value appearing within a de-duplication information table as associated with an existing data block, store, by the de-duplication engine, the virtual block in the virtual block referring column of the de-duplication information table as a reference to the existing data block.

10. The computer program product of claim 9, wherein the computer readable program further causes the computing device to: for each virtual block in the set of virtual blocks: responsive to the computed hash value failing to appear within the previously-used information table as associated with an existin
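The delete and clean-up flows recited in claims 4 through 7 can be sketched as follows. This is a hedged illustration only: the `Tables` class, the function names, and the dictionary layout are hypothetical, the "oldest first" selection is just one of the policies claim 7 allows, and stopping as soon as the free count is no longer below the threshold is an assumed reading of the "below"/"above" wording in claim 5.

```python
from dataclasses import dataclass, field


@dataclass
class Tables:
    """Hypothetical in-memory stand-ins for the tables named in the claims."""
    # De-duplication information table: hash -> (data block id, referring virtual blocks)
    dedup: dict = field(default_factory=dict)
    # Previously-used information table: hash -> data block id.
    # Python dicts keep insertion order, so iteration runs oldest entry first.
    previously_used: dict = field(default_factory=dict)
    free_blocks: list = field(default_factory=list)
    storage: dict = field(default_factory=dict)  # data block id -> bytes


def delete_virtual_block(t: Tables, digest: str, vblock_id: str) -> None:
    """Claim 4: drop one reference; park the block if it was the last one."""
    block, refs = t.dedup[digest]
    if len(refs) > 1:
        # Other files still use the data block: only remove this reference.
        refs.discard(vblock_id)
    else:
        # Last reference: keep the data on disk and remember its hash so a
        # later write of the same content can reuse the block.
        del t.dedup[digest]
        t.previously_used[digest] = block


def clean_up(t: Tables, threshold: int) -> list:
    """Claim 5: erase previously-used blocks one by one until enough are free."""
    erased = []
    for digest in list(t.previously_used):  # oldest first (assumed policy)
        block = t.previously_used.pop(digest)  # delete the hash value
        t.storage.pop(block, None)             # erase the data from storage
        t.free_blocks.append(block)            # block returns with status "free"
        erased.append(block)
        if len(t.free_blocks) >= threshold:    # assumed stopping condition
            break
    return erased


# Usage: two virtual blocks share data block "d0"; a second block "d1" has one user.
t = Tables()
t.dedup["h0"] = ("d0", {"v1", "v2"})
t.dedup["h1"] = ("d1", {"v3"})
t.storage.update({"d0": b"x", "d1": b"y"})

delete_virtual_block(t, "h0", "v1")   # "d0" still referenced by "v2"
delete_virtual_block(t, "h0", "v2")   # "d0" moves to the previously-used table
delete_virtual_block(t, "h1", "v3")   # "d1" moves to the previously-used table
erased = clean_up(t, threshold=1)     # erases only the oldest block, "d0"
```

Deferring the erase is what makes the reuse in claim 1 possible: until `clean_up` runs, a freed block's data and hash are still available to a later store request.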

Assignees

Inventors

Classifications

  • Improving I/O performance · CPC title

  • Single storage device · CPC title

  • G06F3/0641 (Primary)

    De-duplication techniques · CPC title

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10282124B2 cover?
A mechanism is provided for opportunistic handling of freed data in data de-duplication. Responsive to receiving a request to store a file in a storage device, the file is mapped to a set of virtual blocks. For each virtual block in the set of virtual blocks: a hash value is computed, a determination is made as to whether the computed hash value appears within a previously-used information tabl…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F3/0641. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 7, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).