Cooperative data deduplication in a solid state storage array
US-2016179395-A1 · Jun 23, 2016 · US
US2016283372A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016283372-A1 |
| Application number | US-201514670288-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 26, 2015 |
| Priority date | Mar 26, 2015 |
| Publication date | Sep 29, 2016 |
| Grant date | — |
Official abstract text for this publication.
A method for extending data lifetime for reference in deduplication is provided. The method includes determining that a quantity of user data has at least a threshold amount of data that is re-created in a storage system. The method includes protecting at least portions of the quantity of user data from erasure by garbage collection in the storage system during a predetermined time interval, wherein the protected at least portions are available for data deduplication of further user data in the storage system during the predetermined time interval.
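The abstract's two steps (detecting that enough data is re-created, then shielding it from garbage collection for a set interval) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the application: the names `DedupStore`, `Chunk`, `threshold_bytes`, and `protect_seconds` are hypothetical, SHA-256 stands in for whatever fingerprint function the system uses, and "re-created" is modeled simply as the same content being written again after it was first seen.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Chunk:
    data: bytes
    live: bool = True             # False once the owning file is deleted
    recreated_bytes: int = 0      # bytes written again after the first sighting
    protected_until: float = 0.0  # garbage collection may not erase before this time


class DedupStore:
    """Hypothetical store: keeps re-created chunks as deduplication references."""

    def __init__(self, threshold_bytes, protect_seconds):
        self.threshold_bytes = threshold_bytes  # assumed threshold amount
        self.protect_seconds = protect_seconds  # assumed protection interval
        self.chunks = {}                        # fingerprint -> Chunk

    def write(self, data, now):
        fp = hashlib.sha256(data).hexdigest()
        chunk = self.chunks.get(fp)
        if chunk is None:
            self.chunks[fp] = Chunk(data)
            return fp
        # Same content seen again: deduplicate against the stored copy
        # and count the bytes as re-created data.
        chunk.live = True
        chunk.recreated_bytes += len(data)
        if chunk.recreated_bytes >= self.threshold_bytes:
            chunk.protected_until = now + self.protect_seconds
        return fp

    def delete(self, fp):
        self.chunks[fp].live = False

    def garbage_collect(self, now):
        # Erase only chunks that are deleted and no longer protected.
        erased = [fp for fp, c in self.chunks.items()
                  if not c.live and now >= c.protected_until]
        for fp in erased:
            del self.chunks[fp]
        return erased


store = DedupStore(threshold_bytes=8, protect_seconds=60.0)
fp = store.write(b"temp", now=0.0)
for t in (1.0, 2.0):            # delete and re-create the same content twice
    store.delete(fp)
    store.write(b"temp", now=t)
store.delete(fp)
print(store.garbage_collect(now=10.0))  # protected, so nothing is erased
print(store.garbage_collect(now=62.0))  # interval over, so the chunk is erased
```

In this sketch, a deleted chunk that has crossed the threshold survives garbage collection until the interval ends, so a later write of the same content can still be deduplicated against it.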
Opening claim text (preview).
What is claimed is:
1. A method for extending reference data lifetime in deduplication, comprising: determining that a quantity of user data has at least a threshold amount of data that is re-created in a storage system; and protecting at least portions of the quantity of user data from erasure by garbage collection in the storage system during a predetermined time interval, wherein the protected at least portions are available for deduplication of further user data in the storage system during the predetermined time interval.
2. The method of claim 1, wherein the determining comprises: forming a data structure, over one or more sampling windows of time, of data of the storage system, the data structure indicating amounts of the data of files or blocks in the storage system having hash function results matching hash function results of at least one other file or block seen during the one or more sampling windows of time.
3. The method of claim 1, wherein the protecting is based on metadata that includes an aging parameter for each of the at least portions of the quantity of user data, and further comprising: setting the aging parameter for one of the at least portions of the quantity of user data to a first value, responsive to determining the one of the at least portions matches a fingerprint result of a file or block seen during a sampling window of time, wherein the first value indicates to not erase during garbage collection; and adjusting the aging parameter for a further one of the at least portions of the quantity of user data to a second value, responsive to determining the further one of the at least portions has a fingerprint result unmatched by files or blocks seen during the sampling window of time, wherein the second value indicates to erase during garbage collection.
4. The method of claim 1, further comprising: performing garbage collection, which includes erasing portions of storage memory of the storage system, corresponding to files that the storage system considers no longer in existence, except where metadata prevents the erasure during the garbage collection.
5. The method of claim 1, further comprising: performing garbage collection, which includes erasing a plurality of fingerprints not matched in at least one deduplication operation, except where a fingerprint, corresponding to one of the at least portions of the quantity of user data is protected from erasure.
6. The method of claim 1, wherein the determining is on a basis of a file, a file type, a block or a range of blocks, and wherein the threshold amount of data is one or more times as much as a data chunk corresponding to a fingerprint or a hash function result.
7. The method of claim 1, further comprising: establishing in metadata that at least portions of the quantity of user data are to have erasure immunity in the storage system for the predetermined time interval; and writing to a table in the storage system, regarding how often incoming data to the storage system re-creates a same data or includes greater than the threshold amount of data that is re-created.
8. A deduplication method comprising: identifying user data having at least a threshold amount of data that is re-created, by one or more applications; writing an indicator to metadata associated with the identified user data, wherein the indicator establishes a time interval of erasure immunity; and preventing at least portions of the user data from being erased during garbage collection, during the time interval in accordance with the indicator, wherein the at least portions of the identified user data are kept available during the time interval of erasure immunity as reference data for deduplication of further data.
9. The method of claim 8, further comprising: generating a histogram tracking re-creation of files or blocks over one or more sampling windows of time; and adjusting a value of the threshold amount based on the histogram.
10. The method of claim 8, further comprising: setting an aging parameter of the indicator in metadata at a start of the time interval; and adjusting the aging parameter towards allowance of erasure, responsive to one of completing a cycle of garbage collection or a passage of time.
11. The method of claim 8, further comprising: tracking one of file creation, file deletion, block creation, block deletion, object creation, object deletion or frequency of re-creation of data, in a metadata table; and adjusting the time interval based on the tracking.
12. The method of claim 8, further comprising: adjusting a value of the threshold amount, based on utilization of storage capacity of the storage system.
13. The method of claim 8, further comprising: performing garbage collection, to erase portions of storage memory of the storage system except where the indicator establishes the time interval of erasure immunity for the at least portions of the identified user data.
14. A storage system, comprising: storage memory; and at least one processor of the storage system, configured to write an indicator to metadata of the storage system, the indicator protecting at least portions of user data from erasure during garbage collection, for a time interval, responsive to the at least one processor identifying the user data as having at least a threshold amount of data that is re-created on a periodic basis, wherein the at least portions of the user data associated with the indicator are available during the time interval as a reference for deduplication.
15. The storage system of claim 14, further comprising: the at least one processor configured to generate a data structure of user data over time, the data structure indicating at least one of: frequency of re-creation of data or amount of re-created data, wherein the identifying is based on the data structure.
16. The storage system of claim 14, further comprising: the at least one processor configured to monitor amount of storage capacity of the storage memory utilized and to adjust the time interval or a value of the threshold amount based on the amount of storage capacity utilized.
17. The storage system of claim 14, further comprising: the indicator including an aging parameter; the at least one processor configured to adjust the aging parameter towards a first value, responsive to the identifying, wherein the first value prevents the erasure during garbage collection; and the at least one processor configured to adjust the aging parameter towards a second value, responsive to one of passage of time or completion of a garbage collection cycle, wherein the second value allows the erasure during garbage collection.
18. The storage system of claim 14, further comprising: the at least one processor configured to perform in-line deduplication on data arriving for storage in the storage memory, with reference to the at least portions of the user data associated with the indicator.
19. The storage system of claim 14, further comprising: the at least one processor configured to perform post-process deduplication on data stored in the storage memory, with reference to the at least portions of the user data associated with the indicator.
20. The storage system of claim 14, further comprising: the at least one processor configured to perform a hash function on data arriving for storage in the storage memory or data stored in the storage memory, wherein the identifying is based on results of the
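The aging parameter recited in claims 3, 10, and 17 can be pictured as a small counter in metadata that is set to a protecting value when data is identified as re-created and then stepped toward an erasable value on each garbage collection cycle. The sketch below is one possible reading, offered as an assumption rather than the disclosed design: `AgingTable`, `Entry`, and the constants `PROTECTED` and `ERASABLE` are hypothetical names, the numeric values are arbitrary, and every entry is assumed to describe data that is already deleted and therefore a candidate for erasure.

```python
from dataclasses import dataclass

PROTECTED = 3  # hypothetical "first value": do not erase during garbage collection
ERASABLE = 0   # hypothetical "second value": erasure is allowed


@dataclass
class Entry:
    fingerprint: str
    age: int = ERASABLE


class AgingTable:
    """Hypothetical metadata table of garbage collection candidates."""

    def __init__(self):
        self.entries = {}  # fingerprint -> Entry

    def add(self, fingerprint):
        self.entries.setdefault(fingerprint, Entry(fingerprint))

    def mark_recreated(self, fingerprint):
        # Data identified as re-created: start the protection interval.
        self.entries[fingerprint].age = PROTECTED

    def gc_cycle(self):
        # Erase entries already at ERASABLE; step the rest toward it.
        erased = []
        for fp, entry in list(self.entries.items()):
            if entry.age == ERASABLE:
                del self.entries[fp]
                erased.append(fp)
            else:
                entry.age -= 1
        return erased


table = AgingTable()
table.add("a")
table.add("b")
table.mark_recreated("a")
for cycle in range(4):
    print(cycle, table.gc_cycle())
```

Here the unprotected entry is erased in the first cycle, while the protected one survives three cycles before becoming erasable. Counting cycles rather than wall-clock time is a design choice of the sketch; the claims allow either a passage of time or completion of a garbage collection cycle to drive the adjustment.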
Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP] · CPC title
De-duplication techniques · CPC title
in relation to life time, e.g. increasing Mean Time Between Failures [MTBF] · CPC title
Cleaning, compaction, garbage collection, erase control · CPC title
using reference counting · CPC title