Optimizing database deduplication
US-2016147797-A1 · May 26, 2016 · US
US10282124B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10282124-B2 |
| Application number | US-201615190721-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 23, 2016 |
| Priority date | Jun 23, 2016 |
| Publication date | May 7, 2019 |
| Grant date | May 7, 2019 |
Official abstract text for this publication.
A mechanism is provided for opportunistic handling of freed data in data de-duplication. Responsive to receiving a request to store a file in a storage device, the file is mapped to a set of virtual blocks. For each virtual block in the set of virtual blocks: a hash value is computed, a determination is made as to whether the computed hash value appears within a previously-used information table as associated with an existing data block, and, responsive to the computed hash value appearing within a previously-used information table as associated with an existing data block, a data block entry and hash value associated with the existing data block is moved to a de-duplication information table. The virtual block is then stored as a reference to the existing data block.
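The store path described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `DedupStore`, the dictionaries standing in for the two tables, the 4-byte block size, and the choice of SHA-256 are all hypothetical. The abstract itself covers only the previously-used hit; the other two branches (an active-table hit and a fresh block) follow dependent claims 2 and 3.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny fixed-size blocks, for illustration only


class DedupStore:
    """Toy model of the two tables named in the abstract (all names hypothetical)."""

    def __init__(self):
        self.dedup_table = {}      # hash -> {"block": id, "refs": set of virtual blocks}
        self.previously_used = {}  # hash -> block id (freed, but data not yet erased)
        self.blocks = {}           # block id -> bytes (stand-in for the storage device)
        self.next_block = 0

    def store_file(self, name, data):
        # Map the file to a set of virtual blocks.
        chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        for index, chunk in enumerate(chunks):
            vblock = (name, index)
            digest = hashlib.sha256(chunk).hexdigest()
            if digest in self.dedup_table:
                # Hash is already active: only a new reference is needed.
                pass
            elif digest in self.previously_used:
                # Freed data is still on the device: move the entry and its
                # hash back to the de-duplication table instead of rewriting.
                block_id = self.previously_used.pop(digest)
                self.dedup_table[digest] = {"block": block_id, "refs": set()}
            else:
                # Unknown hash: write the data to a fresh block.
                block_id = self.next_block
                self.next_block += 1
                self.blocks[block_id] = chunk
                self.dedup_table[digest] = {"block": block_id, "refs": set()}
            # Record the virtual block in the "referring column".
            self.dedup_table[digest]["refs"].add(vblock)
```

The point of the middle branch is that a block freed by an earlier delete can be revived without any write to the storage device, as long as its hash is still tracked.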
Opening claim text (preview).
What is claimed is:

1. A method, in a data processing system, for opportunistic handling of freed data in data de-duplication, the method comprising: responsive to receiving a request to store a file in a storage device, mapping, by a block mapper of the data processing system, the file to a set of virtual blocks; and for each virtual block in the set of virtual blocks: computing, by a de-duplication engine of the data processing system, a hash value; determining, by the de-duplication engine, whether the computed hash value appears within a previously-used information table as associated with an existing data block; responsive to the computed hash value appearing within a previously-used information table in the data processing system as associated with an existing data block, moving, by the de-duplication engine, a data block entry and hash value associated with the existing data block to a de-duplication information table in the data processing system; and storing, by the de-duplication engine, the virtual block in a virtual block referring column of the de-duplication information table as a reference to the existing data block.

2. The method of claim 1, further comprising: for each virtual block in the set of virtual blocks: determining, by the de-duplication engine, whether the computed hash value appears within a de-duplication information table as associated with an existing data block; and responsive to the computed hash value appearing within a de-duplication information table as associated with an existing data block, storing, by the de-duplication engine, the virtual block in the virtual block referring column of the de-duplication information table as a reference to the existing data block.

3. The method of claim 2, further comprising: for each virtual block in the set of virtual blocks: responsive to the computed hash value failing to appear within the previously-used information table as associated with an existing data block or within the de-duplication information table as associated with an existing data block, storing, by the de-duplication engine, the hash value in a hash value column of the de-duplication information table with a free data block; storing, by the de-duplication engine, the virtual block in the virtual block referring column of the de-duplication information table as a reference to the free data block; and changing, by the de-duplication engine, the status of the free data block to active.

4. The method of claim 1, further comprising: responsive to receiving a request to delete a file in the storage device, identifying, by the de-duplication engine, a set of virtual blocks associated with the file to be deleted; and for each virtual block in the set of virtual blocks: determining, by the de-duplication engine, whether the virtual block referring column in the de-duplication information table associated with the data block comprises more than one virtual block entry; responsive to the virtual block referring column in the de-duplication information table associated with the data block comprising more than one virtual block entry, deleting, by the de-duplication engine, the virtual block associated with the file to be deleted from the virtual block referring column of the de-duplication information table; and responsive to the virtual block referring column in the de-duplication information table associated with the data block comprising only one virtual block entry, moving, by the de-duplication engine, the data block entry and associated hash value to the previously-used information table to track previously-used data blocks and deleting, by the de-duplication engine, the virtual block associated with the file to be deleted from the virtual block referring column.

5. The method of claim 1, further comprising: responsive to receiving a request to clean up the previously-used information table, identifying, by a monitoring mechanism in the data processing system, a set of data blocks in the previously-used information table to be erased one-by-one; for each of the set of identified data blocks one-by-one: erasing, by the monitoring mechanism, the associated data from the storage device; deleting, by the monitoring mechanism, the hash value associated with the data block; adding, by the monitoring mechanism, the data block back to the de-duplication information table indicating the status of the data block as free; determining, by the monitoring mechanism, whether a number of free data blocks in a de-duplication information table is below a predetermined threshold; and responsive to the number of free data blocks in a de-duplication information table remaining below the predetermined threshold, proceeding, by the monitoring mechanism, with a next identified data block until the number of free data blocks in the de-duplication information table is above the predetermined threshold.

6. The method of claim 5, wherein the request to clean up the previously-used information table is responsive to at least one of the number of free data blocks in a de-duplication information table falling below a predetermined threshold, an administrator triggering a cleanup of the previously-used information table, or a number of previously-used data blocks exceeding another predetermined threshold.

7. The method of claim 5, wherein the identification of the set of data blocks in the previously-used information table is based on at least one of oldest data blocks, largest data blocks, or data blocks that are not considered important.

8. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: responsive to receiving a request to store a file in a storage device, map, by a block mapper of the computing device, the file to a set of virtual blocks; and for each virtual block in the set of virtual blocks: compute, by a de-duplication engine of the computing device, a hash value; determine, by the de-duplication engine, whether the computed hash value appears within a previously-used information table as associated with an existing data block; responsive to the computed hash value appearing within a previously-used information table in the computing device as associated with an existing data block, move, by the de-duplication engine, a data block entry and hash value associated with the existing data block to a de-duplication information table in the computing device; and store, by the de-duplication engine, the virtual block in a virtual block referring column of the de-duplication information table as a reference to the existing data block.

9. The computer program product of claim 8, wherein the computer readable program further causes the computing device to: for each virtual block in the set of virtual blocks: determine, by the de-duplication engine, whether the computed hash value appears within a de-duplication information table as associated with an existing data block; and responsive to the computed hash value appearing within a de-duplication information table as associated with an existing data block, store, by the de-duplication engine, the virtual block in the virtual block referring column of the de-duplication information table as a reference to the existing data block.

10. The computer program product of claim 9, wherein the computer readable program further causes the computing device to: for each virtual block in the set of virtual blocks: responsive to the computed hash value failing to appear within the previously-used information table as associated with an existin
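The delete and cleanup behavior of claims 4 and 5 can be sketched as two small functions. This is a hedged illustration only: the function names, the dictionary layout (`hash -> {"block", "refs"}` for the de-duplication table, `hash -> block id` for the previously-used table), the `free_blocks` list standing in for rows marked free, and the oldest-first order are assumptions. The claims say to continue "until the number of free data blocks ... is above the predetermined threshold"; stopping when the count reaches the threshold is a simplification chosen here.

```python
def delete_file(dedup_table, previously_used, vblocks):
    """Remove a file's virtual-block references, one virtual block at a time."""
    for vblock in vblocks:
        # Find the data block entry whose referring column holds this virtual block.
        digest = next(
            (h for h, entry in dedup_table.items() if vblock in entry["refs"]),
            None,
        )
        if digest is None:
            continue  # this virtual block is not tracked; nothing to do
        refs = dedup_table[digest]["refs"]
        refs.discard(vblock)
        if not refs:
            # That was the only reference: keep the data and its hash, but
            # move the entry to the previously-used table.
            previously_used[digest] = dedup_table.pop(digest)["block"]


def cleanup(previously_used, blocks, free_blocks, threshold):
    """Erase previously-used blocks one by one until enough blocks are free."""
    # Assumption: oldest first, modeled by dict insertion order.
    for digest in list(previously_used):
        block_id = previously_used.pop(digest)  # delete the tracked hash
        blocks.pop(block_id, None)              # erase the data from "storage"
        free_blocks.append(block_id)            # return the block as free
        if len(free_blocks) >= threshold:
            break  # no longer below the threshold (assumed stop condition)
```

Because cleanup stops as soon as enough blocks are free, the remaining previously-used entries stay available for a later store request to revive.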
Improving I/O performance · CPC title
Single storage device · CPC title
De-duplication techniques · CPC title
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket · CPC title