Enhancing data processing performance by cache management of fingerprint index
US-9110815-B2 · Aug 18, 2015 · US
US9632720B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9632720-B2 |
| Application number | US-201414336799-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 21, 2014 |
| Priority date | Aug 29, 2013 |
| Publication date | Apr 25, 2017 |
| Grant date | Apr 25, 2017 |
Official abstract text for this publication.
A method and device for data de-duplication, comprising: performing data chunk partition on a current data object by using a different standard in each of a plurality of logical passes; searching one or more first redundant data chunks of the current data object in each logic pass based on the data chunks partitioned on the current data object in the logical pass, respectively, and performing data de-duplication on the current data object based on all of the found first redundant data chunks of the current data object. Other embodiments of the present invention may also relate to a data de-duplication system and a corresponding computer program product.
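The multi-pass idea in the abstract can be illustrated with a minimal sketch. It assumes the "different standard" in each pass is a different fingerprint mask applied to a windowed hash. The window size, the hash functions, and the names `chunk_boundaries` and `find_redundant` are all hypothetical choices made for illustration, not details taken from the patent.

```python
import hashlib

WINDOW = 16  # assumed window size in bytes; not specified by the source


def chunk_boundaries(data: bytes, mask: int, window: int = WINDOW):
    """Yield (offset, length) chunks of data.

    A chunk ends at position i when the hash of the preceding `window`
    bytes has no bits set under `mask`. The hash is recomputed for each
    window for clarity; a real system would likely use a rolling hash.
    """
    start = 0
    for i in range(window, len(data) + 1):
        digest = hashlib.blake2b(data[i - window:i], digest_size=4).digest()
        if int.from_bytes(digest, "big") & mask == 0:
            yield start, i - start
            start = i
    if start < len(data):
        yield start, len(data) - start  # trailing chunk


def find_redundant(current: bytes, previous: bytes, masks):
    """Run one logical pass per mask and collect redundant chunks.

    In each pass both objects are partitioned with the same mask, and a
    chunk of `current` counts as redundant when a chunk with identical
    content exists in `previous`. Returns (offset, length) pairs.
    """
    found = []
    for mask in masks:
        known = {
            hashlib.sha1(previous[o:o + n]).digest()
            for o, n in chunk_boundaries(previous, mask)
        }
        for o, n in chunk_boundaries(current, mask):
            if hashlib.sha1(current[o:o + n]).digest() in known:
                found.append((o, n))
    return found
```

Because each mask produces a different chunk distribution for the same object, chunks found in different passes may overlap. The returned list is therefore a set of candidates that still needs overlap handling before anything is deleted.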
Opening claim text (preview).
What is claimed is:
1. A method for data de-duplication when processing a plurality of data objects, being executed on a hardware device, comprising: performing data chunk partition during each of a plurality of logical passes of a current data object of the plurality of data objects, where the data chunk partition is performed using a respective different standard in each logical pass of the plurality of logical passes; searching, during the each logical pass of the plurality of logical passes, to find one or more first redundant data chunks of the current data object based on data chunks partitioned on a data object previously processed using the respective different standard; and performing data de-duplication on the current data object based on all of the found first redundant data chunks of the current data object.
2. The method according to claim 1, wherein performing data chunk partition comprises at least one of the following: performing data chunk partition on the current data object with a fingerprint algorithm by using different fingerprint masks in the respective logical passes; performing data chunk partition on the current data object with a fixed length algorithm by using different data chunk lengths in the respective logical passes; and performing data chunk partition on the current data object by using different partition algorithms in the respective logical passes.
3. The method according to claim 1, wherein searching, during the each of the plurality of logical passes, to find one or more first redundant data chunks of the current data object comprises: searching, in each logical pass, first redundant data chunks of the current data object based on data chunks partitioned on a previous data object by using the standard of the logical pass and data chunks partitioned on the current data object by using the standard of the logical pass.
4. The method according to claim 1, wherein performing de-duplication on the current data object comprises: eliminating overlap portions existing between two or more first redundant data chunks found in each logical pass of the plurality of logical passes, based on offset and length of the first redundant data chunks; performing data de-duplication on the current data object through deleting second data redundant chunks, wherein the second redundant data chunks include first redundant data chunks with the overlap portions being deleted.
5. The method according to claim 4, wherein eliminating overlap portions existing between two or more first redundant data chunks comprises: sorting the first redundant data chunks according to offset of the first redundant data chunks; and merging two or more first redundant data chunks having overlap portions based on the sorted first redundant data chunks and according to length of the first redundant data chunks, so as to determine the second redundant data chunks of the current data object.
6. The method according to claim 5, further comprising: recovering the current data object according to a link stored for the second redundant data chunks.
7. The method according to claim 4, wherein the deleted second data redundant chunks are a plurality of discontinuous data chunks in a file.
8. The method according to claim 7, wherein a different fingerprint mask is used for each respective logical pass of the plurality of passes when performing the data chunk partition.
9. A computer program product comprising program code stored on a non-transitory computer readable storage medium that is configured to perform the method of claim 1 when executed by a data processing apparatus.
10. The method according to claim 1, wherein different data chunk distributions for a same data object are obtained in each of the plurality of logical passes.
11. A system for data de-duplication when processing a plurality of data objects, being executed on a hardware device, comprising: a memory; a data chunk partition unit configured to perform data chunk partition during each of a plurality of logical passes of a current data object of the plurality of data objects, where the data chunk partition is performed using a respective different standard in each logical pass of the plurality of logical passes; a first redundant data chunk determining unit configured to search, during the each logical pass of the plurality of logical passes, to find one or more first redundant data chunks of the current data object based on data chunks partitioned on a data object previously processed using the respective different standard; and a data de-duplication unit configured to perform data de-duplication on the current data object based on all of the found first redundant data chunks of the current data object.
12. The system according to claim 11, wherein the data chunk partition unit is configured to perform at least one of the following: performing data chunk partition on the current data object with a fingerprint algorithm by using different fingerprint masks in the respective logical passes; performing data chunk partition on the current data object with a fixed length algorithm by using different data chunk lengths in the respective logical passes; and performing data chunk partition on the current data object by using different partition algorithms in the respective logical passes.
13. The system according to claim 11, wherein the first redundant data chunk determining unit is configured to: search, in each logical pass, first redundant data chunks of the current data object based on data chunks partitioned on a previous data object by using the standard of the logical pass and data chunks partitioned on the current data object by using the standard of the logical pass.
14. The system according to claim 11, wherein the data de-duplication unit further comprises: an overlap portion eliminating unit configured to eliminate overlap portions existing between two or more first redundant data chunks found in each logical pass of the plurality of logical passes, based on offset and length of the first redundant data chunks; wherein the data de-duplication unit is configured to perform data de-duplication on the current data object through deleting second redundant data chunks, wherein the second redundant data chunks include the first redundant data chunks with the overlap portions being eliminated.
15. The system according to claim 14, wherein the overlap portion eliminating unit further comprises: a sorting unit configured to sort the first redundant data chunks according to offset of the first redundant data chunks; and a merging unit configured to merge two or more first redundant data chunks having overlap portions based on the sorted first redundant data chunks and according to length of the first redundant data chunks, so as to determine the second redundant data chunks of the current data object.
16. The system according to claim 15, further comprising: a recovering unit configured to recover the current data object according to a link stored for the second redundant data chunks.
17. The system according to claim 14, wherein the deleted second data redundant chunks are a plurality of discontinuous data chunks in a file.
18. The system according to claim 17, wherein a different fingerprint mask is used for each respective logical pass of the plurality of passes when performing the data chunk partition.
19. The system according to claim 11, wherein different data chunk distributions for a same data object are obtained in each of the plurality
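The sort-and-merge step described in claims 5 and 15 can be sketched as a standard interval merge over (offset, length) pairs. This is only one plausible reading of the claim language. The function name `merge_redundant` and the tuple representation are assumptions made for illustration.

```python
def merge_redundant(chunks):
    """Merge overlapping (offset, length) chunks into disjoint ranges.

    Chunks are first sorted by offset; a chunk that starts before the
    end of the previously kept range is folded into it, with the new
    length computed from the farther of the two end positions.
    Chunks that merely touch (start == previous end) are kept separate,
    since the claims speak only of overlap portions.
    """
    merged = []
    for offset, length in sorted(chunks):
        end = offset + length
        if merged and offset < merged[-1][0] + merged[-1][1]:
            prev_off, prev_len = merged[-1]
            new_end = max(prev_off + prev_len, end)
            merged[-1] = (prev_off, new_end - prev_off)
        else:
            merged.append((offset, length))
    return merged


# Chunks found in different passes, given as (offset, length).
print(merge_redundant([(10, 5), (0, 4), (12, 8), (2, 3)]))
# prints [(0, 5), (10, 10)]
```

In this example the chunks at offsets 0 and 2 overlap and collapse into one range of length 5, and the chunks at offsets 10 and 12 collapse into one range of length 10. The resulting disjoint ranges correspond to what the claims call the second redundant data chunks.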
De-duplication techniques · CPC title
Disk device · CPC title
Saving storage space on storage systems · CPC title