Optimization of data deduplication
US-2018232140-A1 · Aug 16, 2018 · US
US12436700B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12436700-B2 |
| Application number | US-202217590367-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 1, 2022 |
| Priority date | Oct 25, 2017 |
| Publication date | Oct 7, 2025 |
| Grant date | Oct 7, 2025 |
Official abstract text for this publication.
A mechanism is provided for dispersed location-based data storage. A request is received to write a data file to a referrer memory region in a set of memory regions. For each data chunk of the data file, responsive to a comparison of a hash value for the data chunk to other hash values for other stored data chunks referenced in the referrer memory region indicating that the data chunk fails to exist in the referrer memory region, responsive to the data chunk existing in another memory region in the set of memory regions, responsive to the memory region failing to be one of the predetermined number N of owner memory regions associated with the referrer memory region, and responsive to the predetermined number N of owner memory regions failing to have been met, a reference to the data chunk is stored in the referrer memory region.
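The per-chunk decision in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the patented implementation: the `Region` class, `write_file`, the fixed `CHUNK_SIZE`, the use of SHA-256, and the in-memory dictionaries are all hypothetical. The fallbacks (storing the chunk locally when it exists nowhere else, or when the limit N is already met) are also assumptions; the abstract only states the case in which a reference is stored.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # hypothetical, tiny chunk size chosen for illustration only


@dataclass
class Region:
    """Hypothetical in-memory stand-in for a memory region."""
    name: str
    max_owners: int = 0                         # the predetermined number N
    chunks: dict = field(default_factory=dict)  # hash -> chunk bytes stored here
    refs: dict = field(default_factory=dict)    # hash -> name of the owner region
    owners: set = field(default_factory=set)    # names of current owner regions


def chunk_hash(chunk: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever hash the storage system uses.
    return hashlib.sha256(chunk).hexdigest()


def write_file(data: bytes, referrer: Region, regions: list) -> None:
    """Write `data` to `referrer`, deduplicating against the other regions."""
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        h = chunk_hash(chunk)
        if h in referrer.chunks or h in referrer.refs:
            continue  # chunk already stored or referenced in the referrer
        holder = next(
            (r for r in regions if r is not referrer and h in r.chunks), None
        )
        if holder is None:
            referrer.chunks[h] = chunk        # assumption: new data stays local
        elif holder.name in referrer.owners:
            referrer.refs[h] = holder.name    # region is already an owner
        elif len(referrer.owners) < referrer.max_owners:
            referrer.owners.add(holder.name)  # N not yet met: admit new owner
            referrer.refs[h] = holder.name
        else:
            referrer.chunks[h] = chunk        # assumption: N met, keep a copy
```

With N set to 1, the first foreign region that holds a matching chunk becomes the referrer's only owner, and matching chunks found in any other region are stored locally rather than referenced.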
Opening claim text (preview).
What is claimed is:

1. A method, in a data processing system, comprising: configuring a referrer memory region, in a set of memory regions of a data storage system, to have a predetermined maximum number of corresponding owner memory regions in the set of memory regions, wherein the referrer memory region stores a set of references to locations of data chunks of one or more data files stored in the corresponding owner memory regions, and the predetermined maximum number limits a number of the corresponding owner memory regions to which the referrer memory region is permitted to have references in the set of references; receiving, by the data storage system, a request to write a first data file to the referrer memory region; generating, based on the receiving of the request, a hash value for each data chunk of the first data file; comparing the generated hash value, of each data chunk of the first data file, to a set of hash values associated with a set of data chunks, of the data chunks, stored in a subset of owner memory regions associated with the referrer memory region; determining, based on the comparison, that one or more data chunks of the first data file do not exist in the subset of owner memory regions; and based on the determining, and for each data chunk of the one or more data chunks: storing the data chunk in a specific owner memory region different from the subset of owner memory regions; updating a popularity tracking metric for the specific owner memory region based on accessing of the specific owner memory region; adding the specific owner memory region to the subset of owner memory regions based on a first policy and a second policy, wherein the first policy adds the specific owner memory region to the subset of owner memory regions until the predetermined maximum number of corresponding owner memory regions is reached, and the second policy adds the specific owner memory region to the subset of owner memory regions based on each of the updated popularity tracking metric of the specific owner memory region and a predetermined popularity criterion; and storing a reference to the data chunk in the referrer memory region.

2. The method of claim 1, wherein the adding of the specific owner memory region to the subset of owner memory regions based on the second policy further comprises: determining that a number of owner memory regions in the subset of owner memory regions has not reached the predetermined maximum number of corresponding owner memory regions; determining, by the data storage system, whether the specific owner memory region has met a popularity threshold; and based on the specific owner memory region meeting the popularity threshold, adding, by the data storage system, the specific owner memory region to the subset of owner memory regions.

3. The method of claim 1, wherein memory regions, from the set of memory regions that comprise the subset of owner memory regions, are added to the subset of owner memory regions up to the predetermined maximum number of corresponding owner memory regions based on a first come, first served policy.

4. The method of claim 3, wherein under the first come, first served policy, the specific owner memory region is added to the subset of owner memory regions, as the data chunk is stored in the specific owner memory region, in a case where the addition of the specific owner memory region does not exceed the predetermined maximum number of corresponding owner memory regions, and in a case where the addition of the specific owner memory region would exceed the predetermined maximum number of corresponding owner memory regions, the data chunk is written to the referrer memory region instead.

5. The method of claim 1, wherein under the second policy, the specific owner memory region is added to the subset of owner memory regions based on a determination that the popularity metric of the specific owner memory region meets a predetermined popularity threshold, and the popularity metric comprises a count of a number of reads and/or writes to the specific owner memory region.

6. The method of claim 1, wherein, under the second policy, a lowest popularity memory region is removed from the subset of owner memory regions when the popularity tracking metric of the specific owner memory region exceeds a popularity tracking metric of the lowest popularity memory region.

7. The method of claim 1, wherein the second policy is a popularity policy where first owner memory regions of the set of memory regions that are accessed more often than second owner memory regions of the set of memory regions are added to the subset of owner memory regions.

8. The method of claim 1, wherein the adding of the specific owner memory region comprises replacing an existing owner memory region in the subset of owner memory regions with the specific owner memory region based on the predetermined maximum number of corresponding owner memory regions having been reached, and the predetermined popularity criterion being met by the popularity tracking metric of the specific owner memory region.

9. A computer program product, comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device having a storage system, causes the storage system to: configure a referrer memory region, in a set of memory regions of a data storage system, to have a predetermined maximum number of corresponding owner memory regions in the set of memory regions, wherein the referrer memory region stores a set of references to locations of data chunks of one or more data files stored in the corresponding owner memory regions, and the predetermined maximum number limits a number of the corresponding owner memory regions to which the referrer memory region is permitted to have references in the set of references; receive, by the storage system, a request to write a first data file to the referrer memory region; generate, based on the reception of the request, a hash value for each data chunk of the first data file; compare the generated hash value, of each data chunk of the first data file, to a set of hash values associated with a set of data chunks of the data chunks stored in a subset of owner memory regions associated with the referrer memory region; determine, based on the comparison, that one or more data chunks of the first data file do not exist in the subset of owner memory regions; and based on the determination, and for each data chunk of the one or more data chunks: store the data chunk in a specific owner memory region different from the subset of owner memory regions; update a popularity tracking metric for the specific owner memory region based on accessing of the specific owner memory region; add the specific owner memory region to the subset of owner memory regions based on a first policy and a second policy, wherein the first policy adds the specific owner memory region to the subset of owner memory regions until the predetermined maximum number of corresponding owner memory regions is reached, and the second policy adds the specific owner memory region to the subset of owner memory regions based on each of the updated popularity tracking metric of the specific owner memory region and a predetermined popularity criterion; and store a reference to the data chunk in the referrer memory region.

10. The computer program product of claim 9, wherein the computer readable program further causes the storage system of the computing device to add the specific owner memory region to the subset of owner memory regions based on the second policy at least by: a dete
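The two admission policies recited in the claims (first come, first served up to the predetermined maximum, then popularity-based replacement) can be sketched as follows. This is one possible reading for illustration, not the claimed implementation: the `OwnerSet` class, its method names, and the way the two policies are combined are hypothetical. The popularity metric is modeled as a simple access count, as in claim 5, and the sketch does not address what happens to references that point at a replaced owner region, which the claim preview does not describe.

```python
from collections import Counter


class OwnerSet:
    """Hypothetical tracker for a referrer region's subset of owner regions."""

    def __init__(self, max_owners: int, popularity_threshold: int):
        self.max_owners = max_owners                    # predetermined maximum
        self.popularity_threshold = popularity_threshold
        self.owners = set()                             # names of owner regions
        self.popularity = Counter()                     # region -> access count

    def record_access(self, region: str) -> None:
        # Popularity tracking metric: a count of reads and/or writes.
        self.popularity[region] += 1

    def try_add(self, region: str) -> bool:
        """Return True if `region` is in the owner subset after this call."""
        if region in self.owners:
            return True
        # First policy: first come, first served until the maximum is reached.
        if len(self.owners) < self.max_owners:
            self.owners.add(region)
            return True
        # Second policy: the candidate must meet the popularity threshold.
        if self.popularity[region] < self.popularity_threshold:
            return False
        # Replace the least popular owner only if the candidate exceeds it.
        # Ties between owners are broken by name to keep the result stable.
        least = min(self.owners, key=lambda r: (self.popularity[r], r))
        if self.popularity[region] > self.popularity[least]:
            self.owners.remove(least)
            self.owners.add(region)
            return True
        return False
```

When `try_add` returns False, a caller following claim 4 would write the chunk to the referrer memory region instead of storing a reference.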
based on file chunks · CPC title
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
Aggregation; Duplicate elimination · CPC title
using de-duplication of the data · CPC title
Single storage device · CPC title