Systems and methods for protecting deduplicated data
US-9235588-B1 · Jan 12, 2016 · US
US11169967B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11169967-B2 |
| Application number | US-201916716759-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 17, 2019 |
| Priority date | Oct 18, 2012 |
| Publication date | Nov 9, 2021 |
| Grant date | Nov 9, 2021 |
Official abstract text for this publication.
Methods and apparatuses for performing selective deduplication in a storage system are introduced here. Techniques are provided for determining a probability of deduplication for a data object based on a characteristic of the data object and performing a deduplication operation on the data object in the storage system prior to the data object being stored in persistent storage of the storage system if the probability of deduplication for the data object has a specified relationship to a specified threshold.
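As a rough illustration of the gate described in the abstract, the sketch below deduplicates an object before it reaches persistent storage only when its estimated probability exceeds a threshold. Everything named here is an assumption made for illustration and is not taken from the patent: the `TYPE_PROBABILITY` table, the `Store` class, the use of SHA-256 fingerprints, and the choice of "greater than" as the specified relationship to the threshold.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical per-type probabilities (assumed values, not from the patent).
TYPE_PROBABILITY = {"vmdk": 0.9, "docx": 0.6, "jpg": 0.1}


@dataclass
class Store:
    # Stand-in for persistent storage: key -> bytes.
    blocks: dict = field(default_factory=dict)
    # Object name -> key of the block holding its data.
    objects: dict = field(default_factory=dict)

    def write(self, name: str, data: bytes, obj_type: str,
              threshold: float = 0.5) -> str:
        """Store an object, deduplicating first only if it is likely to pay off."""
        probability = TYPE_PROBABILITY.get(obj_type, 0.0)
        if probability > threshold:
            # Inline path: fingerprint the data and reuse an existing copy.
            fingerprint = hashlib.sha256(data).hexdigest()
            self.blocks.setdefault(fingerprint, data)
            self.objects[name] = fingerprint
            return "deduplicated"
        # Low probability: skip the fingerprinting cost and store as-is.
        key = f"raw:{name}"
        self.blocks[key] = data
        self.objects[name] = key
        return "stored_as_is"
```

In this sketch, two writes of identical `vmdk` data share one stored block, while a `jpg` with the same bytes bypasses deduplication and is stored separately.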
Claim text.
The invention claimed is:
1. A method, comprising: evaluating characteristics of a data object to determine a deduplication probability of the data object based upon a projected likelihood that deduplication of the data object will reduce storage consumption within a storage device of a data storage system, wherein the characteristics correspond to at least one of a size of the data object, a type of the data object, an owner of the data object, a last modified time of the data object, or an update frequency of the data object; determining an inline deduplication probability threshold and a post-processing deduplication probability threshold based upon performance and resource availability of the data storage system; performing inline deduplication with respect to the data object based upon the deduplication probability exceeding the inline deduplication probability threshold; and performing post-processing deduplication for the data object based upon the deduplication probability exceeding the post-processing deduplication probability threshold.
2. The method of claim 1, comprising: performing the post-processing deduplication subsequent to the data object being stored within the storage device.
3. The method of claim 1, comprising: determining the deduplication probability based upon whether aggregated statistical information indicates that deduplication of data objects, having the characteristics of the data object, previously provided deduplication benefits.
4. The method of claim 1, comprising: defining the inline deduplication probability threshold based upon predetermined performance metrics.
5. The method of claim 1, comprising: performing the inline deduplication for data objects before the data objects are stored within the storage device based upon the data objects having deduplication probabilities greater than the post-processing deduplication probability threshold.
6. The method of claim 1, comprising: performing the post-processing deduplication while the data object is stored within the storage device.
7. The method of claim 1, comprising: performing the post-processing deduplication as a background operation.
8. A computing device comprising: a memory having stored thereon instructions for performing a method; and a processor coupled to the memory, the processor configured to execute the instructions to cause the processor to: evaluate characteristics of a data object to determine a deduplication probability of the data object based upon a projected likelihood that deduplication of the data object will reduce storage consumption within a storage device of a data storage system, wherein the characteristics correspond to at least one of a size of the data object, a type of the data object, an owner of the data object, a last modified time of the data object, or an update frequency of the data object; determine an inline deduplication probability threshold and a post-processing deduplication probability threshold based upon performance and resource availability of the data storage system; perform inline deduplication with respect to the data object based upon the deduplication probability exceeding the inline deduplication probability threshold; and perform post-processing deduplication for the data object based upon the deduplication probability exceeding the post-processing deduplication probability threshold.
9. The computing device of claim 8, wherein the instructions cause the processor to: determine the deduplication probability based upon a size characteristic of the data object.
10. The computing device of claim 8, wherein the instructions cause the processor to: determine the deduplication probability based upon a data object type of the data object.
11. The computing device of claim 8, wherein the instructions cause the processor to: determine the deduplication probability based upon a last modified timestamp of the data object.
12. The computing device of claim 8, wherein the instructions cause the processor to: determine the deduplication probability based upon an update frequency of the data object.
13. The computing device of claim 8, wherein the instructions cause the processor to: determine the deduplication probability based upon statistic information derived from a previous deduplication operation.
14. The computing device of claim 8, wherein the instructions cause the processor to: determine the deduplication probability based upon at least one a read throughput, a write throughput, a read response time, a write response time, network utilization, and a performance characteristic.
15. A non-transitory machine-readable storage media having stored thereon instructions, for performing a method, which causes a processor to: evaluate characteristics of a data object to determine a deduplication probability of the data object based upon a projected likelihood that deduplication of the data object will reduce storage consumption within a storage device of a data storage system, wherein the characteristics correspond to at least one of a size of the data object, a type of the data object, an owner of the data object, a last modified time of the data object, or an update frequency of the data object; determine an inline deduplication probability threshold and a post-processing deduplication probability threshold based upon performance and resource availability of the data storage system; perform inline deduplication with respect to the data object based upon the deduplication probability exceeding the inline deduplication probability threshold; and perform post-processing deduplication for the data object based upon the deduplication probability exceeding the post-processing deduplication probability threshold.
16. The non-transitory machine-readable storage media of claim 15, wherein the instructions cause the processor to: determine the deduplication probability based upon a data object type of the data object.
17. The non-transitory machine-readable storage media of claim 15, wherein the instructions cause the processor to: determine the deduplication probability based upon a last modified timestamp of the data object.
18. The non-transitory machine-readable storage media of claim 15, wherein the instructions cause the processor to: determine the deduplication probability based upon an update frequency of the data object.
19. The non-transitory machine-readable storage media of claim 15, wherein the instructions cause the processor to: determine the deduplication probability based upon statistic information derived from a previous deduplication operation.
20. The non-transitory machine-readable storage media of claim 15, wherein the instructions cause the processor to: determine the deduplication probability based upon a size characteristic of the data object.
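The two-threshold scheme in the independent claims, together with the history-based probability of claims 3, 13, and 19, might be sketched as follows. This is one possible reading and not the patented implementation. The names `DedupStats`, `thresholds`, and `route` are hypothetical, as are the load formula and the fixed post-processing threshold. The sketch also assumes that the inline threshold is at least as high as the post-processing threshold and that an object deduplicated inline is not queued again, which the claim text does not state.

```python
from collections import defaultdict


class DedupStats:
    """Aggregated history of past deduplication outcomes, keyed by object type."""

    def __init__(self):
        self.attempts = defaultdict(int)
        self.hits = defaultdict(int)

    def record(self, obj_type: str, saved_space: bool) -> None:
        """Log one past deduplication attempt and whether it saved space."""
        self.attempts[obj_type] += 1
        if saved_space:
            self.hits[obj_type] += 1

    def probability(self, obj_type: str) -> float:
        """Fraction of past attempts on this type that saved space."""
        n = self.attempts[obj_type]
        return self.hits[obj_type] / n if n else 0.0


def thresholds(cpu_utilization: float) -> tuple[float, float]:
    """Hypothetical policy: a busier system demands more certainty before
    spending cycles on inline work; the post-processing bar stays fixed."""
    inline = min(1.0, 0.5 + 0.5 * cpu_utilization)
    post = 0.2
    return inline, post


def route(probability: float, inline_threshold: float,
          post_threshold: float) -> str:
    """Pick a deduplication path for one object."""
    if probability > inline_threshold:
        return "inline"            # deduplicate before writing to storage
    if probability > post_threshold:
        return "post_processing"   # store now, deduplicate in the background
    return "none"                  # not worth deduplicating at all
```

Under this policy, an object type with a 0.75 historical hit rate is deduplicated inline on an idle system and deferred to post-processing once the system is fully loaded.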
Saving storage space on storage systems · CPC title
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title
De-duplication techniques · CPC title
Change logging, detection, and notification (replication G06F16/27) · CPC title
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title