File immutability using a deduplication file system in a public cloud using new filesystem redirection
US-2024103978-A1 · Mar 28, 2024 · US
US9348538B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9348538-B2 |
| Application number | US-201213655263-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 18, 2012 |
| Priority date | Oct 18, 2012 |
| Publication date | May 24, 2016 |
| Grant date | May 24, 2016 |
Official abstract text for this publication.
Methods and apparatuses for performing selective deduplication in a storage system are introduced here. Techniques are provided for determining a probability of deduplication for a data object based on a characteristic of the data object and performing a deduplication operation on the data object in the storage system prior to the data object being stored in persistent storage of the storage system if the probability of deduplication for the data object has a specified relationship to a specified threshold.
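The selection step in the abstract can be sketched as a toy Python model. This is a sketch under stated assumptions, not the patented implementation. The "characteristic" is a caller-supplied label such as a file type. The probability of deduplication is taken to be the fraction of earlier objects with that label that turned out to contain duplicate blocks, which is one of the options the claims describe. "Satisfies the threshold" is assumed to mean greater than or equal to. SHA-256 fingerprints stand in for whatever block matching a real system uses. The class and method names are hypothetical.

```python
import hashlib
from collections import defaultdict


class SelectiveDedupStore:
    """Toy store: deduplicate inline only for objects whose label has a high
    enough observed deduplication rate. All names here are hypothetical."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.blocks: dict[str, bytes] = {}        # fingerprint -> unique block
        self.objects: dict[str, list[str]] = {}   # deduplicated object -> fingerprints
        self.raw: dict[str, tuple[str, list[bytes]]] = {}  # stored without dedup
        self.seen = defaultdict(int)              # objects deduplicated, per label
        self.dup_hits = defaultdict(int)          # of those, how many had duplicates

    def probability(self, label: str) -> float:
        """Fraction of earlier objects with this label that contained duplicates."""
        seen = self.seen.get(label, 0)
        return self.dup_hits.get(label, 0) / seen if seen else 0.0

    def _dedup(self, name: str, label: str, data_blocks: list[bytes]) -> None:
        fingerprints = []
        had_duplicate = False
        for block in data_blocks:
            fp = hashlib.sha256(block).hexdigest()
            if fp in self.blocks:
                had_duplicate = True  # already stored: keep only a reference
            else:
                self.blocks[fp] = block
            fingerprints.append(fp)
        self.objects[name] = fingerprints
        # Feed the outcome back into the per-label statistics.
        self.seen[label] += 1
        if had_duplicate:
            self.dup_hits[label] += 1

    def write(self, name: str, label: str, data_blocks: list[bytes]) -> str:
        if self.probability(label) >= self.threshold:
            self._dedup(name, label, data_blocks)  # inline, before persisting
            return "inline"
        self.raw[name] = (label, list(data_blocks))  # persist as-is for now
        return "deferred"

    def post_process(self) -> None:
        """Background pass that deduplicates objects stored without inline dedup."""
        for name, (label, data_blocks) in list(self.raw.items()):
            self._dedup(name, label, data_blocks)
            del self.raw[name]


store = SelectiveDedupStore(threshold=0.5)
print(store.write("a.vmdk", "vmdk", [b"base", b"x"]))  # deferred (no history yet)
print(store.write("b.vmdk", "vmdk", [b"base", b"y"]))  # deferred
store.post_process()                                   # b.vmdk shares b"base" with a.vmdk
print(store.probability("vmdk"))                       # 0.5
print(store.write("c.vmdk", "vmdk", [b"base", b"z"]))  # inline
```

Objects with no history fall below any positive threshold and are stored as-is. The background `post_process` pass both deduplicates them and builds the statistics that later let similar objects qualify for inline deduplication.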
Opening claim text (preview).
What is claimed is:

1. A method comprising: determining in a storage system a probability of deduplication for a data object, the probability of deduplication for the data object determined based on a characteristic of the data object, wherein the probability of deduplication for the data object is a statistical projection indicating a likelihood that the data object will provide a storage space benefit as a result of deduplication; determining in the storage system a deduplication probability threshold, the deduplication probability threshold determined based on a performance metric of the storage system and adjusted based on availability of resources of the storage system and recent performance of the storage system relative to the performance metric; determining in the storage system whether the probability of deduplication for the data object satisfies the deduplication probability threshold; and performing a deduplication operation on the data object in the storage system prior to the data object being stored in a persistent storage of the storage system in an event it is determined that the probability of deduplication for the data object satisfies the deduplication probability threshold.

2. The method of claim 1 wherein adjusting the deduplication probability threshold includes reducing the deduplication probability threshold in an event a performance projection for the storage system indicates that the storage system will continue to satisfy the performance metric using the reduced deduplication probability threshold.

3. The method of claim 2 wherein the performance projection is based on the recent performance of the storage system.

4. The method of claim 1 wherein determining the probability of deduplication for the data object based on the characteristic includes determining the probability of deduplication for the data object based on statistical information from prior deduplication operations on other data objects having the characteristic.

5. The method of claim 4 wherein the statistical information indicates a percentage of the other data objects having the characteristic that were deduplication candidates in the prior deduplication operations, wherein a deduplication candidate includes data blocks that are duplicates of other data blocks in the storage system.

6. The method of claim 5 wherein the prior deduplication operations include a plurality of post-processing deduplication operations performed on data objects already stored in the persistent storage of the storage system.

7. The method of claim 4 wherein determining the probability of deduplication for the data object further includes determining a second characteristic of the data object, and determining the probability of deduplication for the data object is further based on statistical information associated with the second characteristic.

8. The method of claim 1 wherein the performance metric is a service level objective (SLO).

9. The method of claim 1 wherein the performance metric includes a response time of the storage system, a throughput of the storage system, a utilization of one or more networks associated with the storage system, or a combination thereof.

10. The method of claim 1 wherein the characteristic is a size of the data object, a type of the data object, an owner of the data object, a last modified date of the data object, an update frequency of the data object, or a combination thereof.

11. The method of claim 1 further comprising performing a post-processing deduplication operation on the data object after the data object is stored in the persistent storage of the storage system in an event it is determined that the probability of deduplication for the data object does not satisfy the deduplication probability threshold.

12. The method of claim 1 wherein the storage system is operated in a Network Attached Storage (NAS) environment or in a Storage Area Network (SAN).

13. A data storage system comprising: a processor; and a memory coupled with the processor and including a storage manager that directs the processor to: determine a deduplication probability threshold, the deduplication probability threshold determined based on a performance metric of the data storage system and adjusted based on availability of resources of the data storage system and recent performance of the data storage system relative to the performance metric; determine, prior to a data object being stored in a persistent storage, a probability of deduplication for the data object, the probability of deduplication for the data object determined based on a characteristic of the data object, wherein the probability of deduplication for the data object is a statistical projection indicating a likelihood that the data object will provide a storage space benefit as a result of deduplication; determine whether the probability of deduplication for the data object satisfies the deduplication probability threshold; and perform a deduplication operation on the data object prior to the data object being stored in the persistent storage in an event it is determined that the probability of deduplication for the data object satisfies the deduplication probability threshold.

14. The data storage system of claim 13 wherein to adjust the deduplication probability threshold includes to reduce the deduplication probability threshold in an event a performance projection for the data storage system based on historical performance information indicates that the data storage system will continue to meet the performance metric using the reduced deduplication probability threshold.

15. The data storage system of claim 13 wherein to determine the probability of deduplication for the data object based on the characteristic includes to: determine the probability of deduplication for the data object based on statistical information associated with the characteristic, the statistical information aggregated from prior deduplication operations.

16. The data storage system of claim 15 wherein the statistical information associated with the characteristic indicates a percentage of other data objects having the characteristic which were found to include data blocks that were duplicates of other data blocks during the prior deduplication operations.

17. The data storage system of claim 15 wherein: to determine the probability of deduplication for the data object based on the characteristic further includes to determine a second characteristic of the data object; and to determine the probability of deduplication for the data object is further based on statistical information associated with the second characteristic.

18. The data storage system of claim 13 wherein the performance metric is a service level objective (SLO).

19. The data storage system of claim 13 wherein the performance metric includes a response time of the data storage system, a throughput of the data storage system, a utilization of a network associated with the data storage system, or a combination thereof.

20. The data storage system of claim 13 wherein the characteristic is a size of the data object, a type of the data object, an owner of the data object, a last modified date of the data object, an update frequency of the data object, or a combination thereof.

21. The data storage system of claim 13 wherein the storage manager further directs the processor to perform a post-processing deduplication operation on the data object after it is stored in the persistent storage in a
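Claims 1 to 3 tie the deduplication probability threshold to a performance metric and let it move with resource availability and recent performance. A minimal Python sketch of such a controller follows. It assumes a response-time SLO, a sliding window of recent latencies as "recent performance", a free-CPU fraction as the resource signal, and a deliberately simple linear projection in which each step down in threshold is assumed to add a fixed cost in milliseconds. The class name, parameters, starting threshold, and cost model are all hypothetical and are not taken from the patent.

```python
from collections import deque
from statistics import mean


class ThresholdController:
    """Moves a deduplication probability threshold against a latency SLO.

    Hypothetical projection model: lowering the threshold by one step is
    assumed to add `cost_per_step_ms` to average response time, because
    more objects would then be deduplicated inline.
    """

    def __init__(self, slo_ms: float, threshold: float = 0.8, step: float = 0.05,
                 cost_per_step_ms: float = 0.5, min_free_cpu: float = 0.2,
                 window: int = 20) -> None:
        self.slo_ms = slo_ms
        self.threshold = threshold
        self.step = step
        self.cost_per_step_ms = cost_per_step_ms
        self.min_free_cpu = min_free_cpu
        self.recent = deque(maxlen=window)  # most recent response times (ms)

    def observe(self, latency_ms: float) -> None:
        self.recent.append(latency_ms)

    def adjust(self, free_cpu: float) -> float:
        """Return the updated threshold given the current free-CPU fraction."""
        if not self.recent:
            return self.threshold  # no recent performance data yet
        current = mean(self.recent)
        if current > self.slo_ms or free_cpu < self.min_free_cpu:
            # Missing the SLO or short on resources: deduplicate inline less often.
            self.threshold = min(1.0, self.threshold + self.step)
        elif current + self.cost_per_step_ms <= self.slo_ms:
            # Projection says the SLO still holds at a lower threshold.
            self.threshold = max(0.0, self.threshold - self.step)
        return self.threshold


ctl = ThresholdController(slo_ms=5.0)
for ms in (2.0, 2.5, 3.0):
    ctl.observe(ms)
print(ctl.adjust(free_cpu=0.6))  # 2.5 ms mean + 0.5 ms projected cost <= 5 ms: lowered one step
```

Raising the threshold when the SLO is missed or resources are tight is a design choice of this sketch that mirrors the reduction rule in claim 2; a real projection would be fitted from measured performance rather than assumed.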
Change logging, detection, and notification (replication G06F16/27) · CPC title
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
Saving storage space on storage systems · CPC title
De-duplication techniques · CPC title
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title