Systems and methods for protecting deduplicated data
US-9235588-B1 · Jan 12, 2016 · US
US9928249B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9928249-B2 |
| Application number | US-201715410789-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 20, 2017 |
| Priority date | Apr 5, 2013 |
| Publication date | Mar 27, 2018 |
| Grant date | Mar 27, 2018 |
Official abstract text for this publication.
A system can maintain multiple queues for deduplication requests of different priorities. The system can also designate priority of storage units. The scheduling priority of a deduplication request is based on the priority of the storage unit indicated in the deduplication request and a trigger for the deduplication request.
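The queueing idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the names `Trigger`, `SchedulingPriority`, `DedupRequest`, and `DedupScheduler` are hypothetical, the three priority levels are an assumption, and treating unknown storage units as second priority is also an assumption.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Dict, Optional


class Trigger(Enum):
    USER = "user"            # manual request, user schedule, or user threshold
    CHANGELOG = "changelog"  # changelog event detected on the storage unit


class SchedulingPriority(IntEnum):
    # Lower value means the queue is served earlier (assumed three levels).
    HIGH = 0    # user related triggers
    MEDIUM = 1  # changelog trigger on a first-priority storage unit
    LOW = 2     # changelog trigger on a second-priority storage unit


@dataclass(frozen=True)
class DedupRequest:
    storage_unit: str
    trigger: Trigger


def scheduling_priority(request: DedupRequest,
                        unit_priority: Dict[str, int]) -> SchedulingPriority:
    """Combine the trigger and the storage unit priority into one level."""
    if request.trigger is Trigger.USER:
        return SchedulingPriority.HIGH
    # Assumption: units with no designated priority count as second priority.
    if unit_priority.get(request.storage_unit, 2) == 1:
        return SchedulingPriority.MEDIUM
    return SchedulingPriority.LOW


class DedupScheduler:
    """Keeps one FIFO queue per scheduling priority."""

    def __init__(self, unit_priority: Dict[str, int]) -> None:
        self.unit_priority = dict(unit_priority)
        self.queues = {p: deque() for p in SchedulingPriority}

    def submit(self, request: DedupRequest) -> SchedulingPriority:
        priority = scheduling_priority(request, self.unit_priority)
        self.queues[priority].append(request)
        return priority

    def next_request(self) -> Optional[DedupRequest]:
        """Call when resources are available; serves the highest queue first."""
        for priority in sorted(SchedulingPriority):
            if self.queues[priority]:
                return self.queues[priority].popleft()
        return None
```

With `DedupScheduler({"vol_a": 1, "vol_b": 2})`, a user-triggered request is returned before any changelog-triggered request, and a changelog request for `vol_a` is returned before one for `vol_b`, regardless of submission order.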
Opening claim text (preview).
What is claimed is:
1. A method comprising: determining storage unit priority for each of a plurality of storage units that have at least a threshold data amount, wherein the determination of priority is based, at least in part, on a threshold deduplication savings estimate; maintaining a first queue of deduplication requests having a first scheduling priority; maintaining a second queue of deduplication requests having a second scheduling priority that is below the first scheduling priority, wherein deduplication requests in the second queue were generated for storage units determined to have a first storage unit priority and for which a changelog event was detected; in response to a determination that resources are available, select deduplication requests from the first queue for execution; and in response to a determination that resources are available and that no requests are pending in the first queue, select for execution from the second queue based, at least in part, on potential deduplication savings of indicated storage units and changelogs of indicated storage units.
2. The method of claim 1 further comprising estimating deduplication savings for each of the plurality of storage units in response to a determination that each of the plurality of storage units has at least the threshold data amount.
3. The method of claim 1, wherein determining storage unit priority comprises determining that a storage unit has a first storage unit priority if the deduplication savings estimate for the storage unit satisfies the threshold deduplication savings estimate and determining that a storage unit will continue to be monitored if the deduplication savings estimate for the storage unit does not satisfy the threshold deduplication savings estimate.
4. The method of claim 3, further comprising monitoring newly detected storage units and storage units that do not satisfy the threshold deduplication savings estimate, wherein monitoring comprises periodically determining whether the storage units have the threshold data amount.
5. The method of claim 1, wherein maintaining the first queue of deduplication requests comprises enqueuing a deduplication request into the first queue if generated from a user related trigger.
6. The method of claim 1, further comprising maintaining a third queue of deduplication requests having a third scheduling priority junior to the second scheduling priority, wherein deduplication requests in the second queue were generated for storage units determined to have a second storage unit priority and for which a changelog event was detected.
7. The method of claim 1, further comprising: maintaining a third queue for requests to estimate deduplication savings for storage units determined to have at least the threshold data amount, wherein the third queue has a third scheduling priority that is greater than the second scheduling priority but less than the first scheduling priority; and maintaining a fourth queue for requests to determine storage units that have at least the threshold data amount, wherein the fourth queue has a fourth scheduling priority that is greater than the second scheduling priority but less than the third scheduling priority.
8. The method of claim 1, wherein selecting for execution from the second queue comprises selecting a deduplication request from the second queue with a fullest changelog that is not overflowed and based, at least in part, on a calculated potential deduplication savings.
9. One or more non-transitory machine-readable media comprising program code for priority based deduplication scheduling, the program code to: determine storage unit priority for each of a plurality of storage units that have at least a threshold data amount, wherein the determination of priority is based, at least in part, on a threshold deduplication savings estimate; based on generation of a deduplication request, determine a scheduling priority for the deduplication request based, at least in part, on a trigger for the deduplication request, wherein a deduplication request with a changelog based trigger has a lower scheduling priority than a user related trigger and scheduling priority of a deduplication request with a changelog based trigger is also based, at least in part, on storage unit priority of a storage unit indicated in the deduplication request; and in response to a determination of resource availability, select requests in accordance with scheduling priority.
10. The non-transitory machine-readable media of claim 9 further comprising program code to estimate deduplication savings for each of the plurality of storage units determined to have at least the threshold data amount.
11. The non-transitory machine-readable media of claim 10, wherein the program code to determine storage unit priority comprises program code to determine that a storage unit has a first storage unit priority if the deduplication savings estimate for the storage unit satisfies the threshold deduplication savings estimate and to determine that a storage unit will continue to be monitored if the deduplication savings estimate for the storage unit does not satisfy the threshold deduplication savings estimate.
12. The non-transitory machine-readable media of claim 11 further comprising program code to monitor newly detected storage units and storage units that do not satisfy the threshold deduplication savings estimate, wherein the program code to monitor comprises program code to periodically determine whether the storage units have the threshold data amount.
13. The non-transitory machine-readable media of claim 9, wherein the program code to determine the scheduling priority for the deduplication request comprises program code to determine whether the deduplication request was triggered by a changelog event, the deduplication request is a manually triggered deduplication request, or whether the deduplication request was automatically triggered by a user configured schedule or a user configured threshold.
14. The non-transitory machine-readable media of claim 9, wherein the program code to select requests in accordance with scheduling priority comprises program code to: first select for execution deduplication requests with user related triggers; second select for execution a request to estimate deduplication savings for each of the plurality of storage units determined to have at least the threshold data amount; third select for execution deduplication requests with changelog based triggers and that indicate storage units with a first storage unit priority; fourth select for execution a request to determine which storage units have at least the threshold data amount; and fifth select for execution deduplication requests with changelog based triggers and that indicate storage units with a second storage unit priority.
15. The non-transitory machine-readable media of claim 9 further comprising program code to: based on determination of resource availability and determination that a set of deduplication requests with changelog based triggers can be selected for execution in accordance with scheduling priority, select for execution a deduplication request from the set of deduplication requests that does not have an overflowed changelog, has a most full changelog, and based, at least in part, on a calculated potential deduplication savings.
16. An apparatus comprising: a processor; and a machine-readable medium having instructions stored thereon that are executable by the processor to cause the apparatus to, determine storage unit priority for each of a plurality of
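Two steps in the claims above lend themselves to a short sketch: classifying a storage unit from its deduplication savings estimate (claims 1 to 4) and picking a changelog-triggered request by changelog fullness (claims 8 and 15). The following is an illustration under stated assumptions, not the claimed implementation. All names (`classify_storage_unit`, `ChangelogRequest`, `select_changelog_request`) and the numeric representation of fill level and savings are hypothetical. The claims do not say how potential savings is weighed against changelog fullness, so savings is used here only as a tie-breaker, and "satisfies the threshold" is read as greater than or equal.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


def classify_storage_unit(data_amount: float,
                          savings_estimate: float,
                          min_data_amount: float,
                          min_savings: float) -> str:
    """Return 'first_priority' or 'monitor' for one storage unit.

    Units below the threshold data amount, or whose savings estimate does
    not satisfy the threshold, stay under periodic monitoring.
    """
    if data_amount < min_data_amount:
        return "monitor"
    if savings_estimate >= min_savings:  # assumption: satisfies means >=
        return "first_priority"
    return "monitor"


@dataclass(frozen=True)
class ChangelogRequest:
    storage_unit: str
    changelog_fill: float     # assumed fraction of changelog capacity, 0.0 to 1.0
    overflowed: bool          # True if the changelog has overflowed
    potential_savings: float  # assumed calculated savings, any consistent unit


def select_changelog_request(
        pending: Iterable[ChangelogRequest]) -> Optional[ChangelogRequest]:
    """Pick the request with the fullest changelog that has not overflowed.

    Assumption: potential savings breaks ties between equally full changelogs.
    Returns None if every pending request has an overflowed changelog.
    """
    candidates = [r for r in pending if not r.overflowed]
    if not candidates:
        return None
    return max(candidates,
               key=lambda r: (r.changelog_fill, r.potential_savings))
```

A different weighting, for example a combined score of fullness and savings, would fit the claim language equally well; the tuple key is simply the smallest change that respects both criteria.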
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title
Physics · mapped topic
Saving storage space on storage systems · CPC title
De-duplication techniques · CPC title
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title