Direct lookup for identifying duplicate data in a data deduplication system
US-2017161329-A1 · Jun 8, 2017 · US
US10169365B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10169365-B2 |
| Application number | US-201615059160-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 2, 2016 |
| Priority date | Mar 2, 2016 |
| Publication date | Jan 1, 2019 |
| Grant date | Jan 1, 2019 |
Official abstract text for this publication.
Methods, systems, and computer programs are presented for deduplicating data in a storage device. One method includes an operation for identifying multiple deduplication domains for a storage system. A fingerprint index is created for each deduplication domain, where each data block stored in the storage system is associated with one of the plurality of deduplication domains. The method also includes operations for receiving a first data block at the storage system, and for identifying a first deduplication domain from the plurality of deduplication domains corresponding to the first data block. The first data block is deduplicated within the first deduplication domain utilizing a first fingerprint index associated with the first deduplication domain.
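The per-domain lookup described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the `StorageSystem` class, the `write` method, the use of SHA-256 as the fingerprint function, and the in-memory list standing in for permanent storage are all assumptions made for illustration. The volume-to-domain mapping follows the association recited in claim 8.

```python
import hashlib


class StorageSystem:
    """Toy model: one fingerprint index per deduplication domain (hypothetical)."""

    def __init__(self, volume_to_domain):
        # Each volume is assigned to exactly one deduplication domain.
        self.volume_to_domain = dict(volume_to_domain)
        # One fingerprint index (fingerprint -> block location) per domain.
        self.indexes = {d: {} for d in set(self.volume_to_domain.values())}
        # Stand-in for permanent storage; blocks of all domains are intermingled.
        self.blocks = []

    def write(self, volume, data):
        """Deduplicate `data` within the domain of `volume`; return its location."""
        index = self.indexes[self.volume_to_domain[volume]]
        # SHA-256 is an assumed fingerprint function, not taken from the source.
        fingerprint = hashlib.sha256(data).hexdigest()
        location = index.get(fingerprint)
        if location is None:
            # Not seen in this domain: store the block and record the mapping.
            location = len(self.blocks)
            self.blocks.append(data)
            index[fingerprint] = location
        return location


store = StorageSystem({"vol1": "A", "vol2": "A", "vol3": "B"})
loc1 = store.write("vol1", b"same payload")
loc2 = store.write("vol2", b"same payload")  # same domain: deduplicated
loc3 = store.write("vol3", b"same payload")  # other domain: stored again
```

In this sketch the second write reuses the first block because both volumes belong to domain A, while the write to domain B stores a separate copy, which matches the behavior recited in claim 7.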
Opening claim text (preview).
What is claimed is:
1. A method comprising: identifying, by a processor, a plurality of deduplication domains for a storage system; creating, by the processor, a fingerprint index for each deduplication domain, wherein each data block stored in the storage system is associated with one of the plurality of deduplication domains; receiving, by the processor, a first data block at the storage system; identifying, by the processor, a first deduplication domain from the plurality of deduplication domains corresponding to the first data block; deduplicating, by the processor, the first data block within the first deduplication domain based on a first fingerprint index associated with the first deduplication domain; determining, by the processor, whether a size of a global fingerprint buffer exceeds a predetermined threshold, the global fingerprint buffer including space for storing fingerprint buffers of the deduplication domains; and based on a determination that the size of the global fingerprint buffer exceeds the predetermined threshold, selecting, by the processor, a deduplication domain of the plurality of deduplication domains and updating the fingerprint index of the selected deduplication domain with fingerprint mappings stored in the fingerprint buffer of the selected deduplication domain.
2. The method as recited in claim 1, wherein deduplicating the first data block includes: determining a first fingerprint for the first data block; determining whether the first fingerprint is in the first fingerprint index; and storing the first data block in permanent storage based on a determination that the first fingerprint is not in the first fingerprint index.
3. The method as recited in claim 2, wherein deduplicating the first data block further includes: based on a determination that the first fingerprint is not in the first fingerprint index, adding a fingerprint mapping to a first fingerprint buffer of the first deduplication domain kept in a random access memory (RAM), the fingerprint mapping including the first fingerprint and a location of the first data block in the permanent storage.
4. The method as recited in claim 3, wherein selecting the deduplication domain for updating the fingerprint index further comprises selecting the deduplication domain with a highest ratio of a size of the fingerprint buffer in the global fingerprint buffer to a size of the fingerprint index.
5. The method as recited in claim 4, wherein updating the fingerprint index further includes: merging fingerprint mappings in RAM associated with the selected deduplication domain with the fingerprint index to create a new fingerprint index; and freeing from RAM the fingerprint mappings in RAM associated with the selected deduplication domain.
6. The method as recited in claim 1, wherein deduplicating the first data block further includes: determining a first fingerprint for the first data block; based on a determination that the first fingerprint is in the first fingerprint index, identifying a second data block associated with the first fingerprint in the first fingerprint index; and associating the first data block with the first fingerprint and the second data block, wherein the first data block is not added to the permanent storage.
7. The method as recited in claim 1, wherein each deduplication domain is defined to track duplicate data blocks within the deduplication domain to store one copy of the duplicate data blocks, wherein data blocks existing in more than one deduplication domain will have a separate copy stored in permanent storage for each of the deduplication domains where the data blocks exist.
8. The method as recited in claim 1, wherein identifying a first deduplication domain includes: determining a first volume of the storage system that includes the first data block; and determining the first deduplication domain as the deduplication domain associated with the first volume.
9. The method as recited in claim 1, wherein identifying a plurality of deduplication domains for a storage system includes: providing a user interface for receiving input from an administrator of the storage system, the input identifying one of the deduplication domains for each volume in the storage system.
10. The method as recited in claim 1, wherein the data blocks for the plurality of deduplication domains are stored intermingled within a permanent storage of the storage system.
11. A storage system comprising: a permanent storage for storing data blocks, wherein each data block stored in the permanent storage is associated with one of a plurality of deduplication domains; a memory for storing a fingerprint index for each deduplication domain; a processor to receive a first data block and identify a first deduplication domain from the plurality of deduplication domains corresponding to the first data block, wherein the processor is to deduplicate the first data block within the first deduplication domain based on a first fingerprint index associated with the first deduplication domain; a random access memory (RAM) for storing a global fingerprint buffer including a first fingerprint buffer on which is stored first fingerprint mappings, wherein the processor is further to: determine whether a size of the global fingerprint buffer exceeds a predetermined threshold; and based on a determination that the size of the global fingerprint buffer exceeds the predetermined threshold, select, by the processor, a deduplication domain of the plurality of deduplication domains and update the fingerprint index of the selected deduplication domain with fingerprint mappings stored in the fingerprint buffer of the selected deduplication domain.
12. The storage system as recited in claim 11, wherein the processor is to deduplicate the first data block by determining a first fingerprint for the first data block, determining whether the first fingerprint is in the first fingerprint index, and storing the first data block in the permanent storage based on the first fingerprint not being in the first fingerprint index.
13. The storage system as recited in claim 12, wherein based on the first fingerprint not being in the first fingerprint index, the processor is to add a fingerprint mapping to the global fingerprint buffer, the fingerprint mapping including the first fingerprint and a location of the first data block in the permanent storage.
14. The storage system as recited in claim 12, wherein based on the first fingerprint not being in the first fingerprint index, the processor is to add a fingerprint mapping to a buffer kept in the memory, the fingerprint mapping including the first fingerprint and a location of the first data block in the permanent storage.
15. The storage system as recited in claim 11, wherein the fingerprint index includes fingerprints of data blocks stored in the storage system, each fingerprint being mapped to one of the data blocks stored in the storage system.
16. A non-transitory computer-readable storage medium storing a computer program that when executed by a processor, cause the processor to: determine whether a size of a global fingerprint buffer exceeds a predetermined threshold, the global fingerprint buffer including a first fingerprint buffer storing first fingerprint mappings for a first fingerprint index and a second fingerprint buffer storing second fingerprint mappings for a second fingerprint index, the first fingerprint index being associated with a first deduplication domain and the second fingerprint index being associated with a second deduplication domain for
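The buffer-flush behavior recited in claims 1, 4, and 5 can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: the names `Domain`, `pick_domain_to_flush`, `flush`, and `record_mapping` are hypothetical, sizes are measured as entry counts, and an empty index is treated as size 1 to avoid division by zero. All three choices are assumptions the claims do not specify.

```python
class Domain:
    """A deduplication domain with its index and RAM buffer (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.index = {}   # stands in for the persistent fingerprint index
        self.buffer = {}  # RAM-resident fingerprint mappings not yet merged


def pick_domain_to_flush(domains):
    """Select the domain with the highest buffer-size to index-size ratio."""
    candidates = [d for d in domains if d.buffer]
    # Assumption: an empty index counts as size 1 so the ratio stays defined.
    return max(candidates, key=lambda d: len(d.buffer) / max(len(d.index), 1))


def flush(domain):
    """Merge buffered mappings into a new index, then free the buffer."""
    merged = dict(domain.index)
    merged.update(domain.buffer)
    domain.index = merged
    domain.buffer = {}


def record_mapping(domains, domain, fingerprint, location, threshold):
    """Buffer a new mapping; flush one domain if the global buffer is too large."""
    domain.buffer[fingerprint] = location
    # The global fingerprint buffer is modeled as the sum of per-domain buffers.
    if sum(len(d.buffer) for d in domains) > threshold:
        flush(pick_domain_to_flush(domains))


a = Domain("a")
a.index = {f"old{i}": i for i in range(10)}  # large existing index
b = Domain("b")                              # empty index
domains = [a, b]
record_mapping(domains, a, "f1", 100, threshold=3)
record_mapping(domains, a, "f2", 101, threshold=3)
record_mapping(domains, b, "g1", 102, threshold=3)
record_mapping(domains, a, "f3", 103, threshold=3)  # global size 4 > 3
```

In this run, domain `b` is flushed even though domain `a` holds more buffered entries, because `b` has the higher ratio of buffer size to index size (1/1 versus 3/10). Favoring that ratio plausibly amortizes the cost of rewriting a large index over more buffered updates, though the claims do not state the rationale.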
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
Indexing; Data structures therefor; Storage structures · CPC title
Distributed file systems · CPC title
Physics · mapped topic