Scheduling deduplication in a storage system

US9928249B2 · US · B2

Patent metadata
Publication number: US-9928249-B2
Application number: US-201715410789-A
Country: US
Kind code: B2
Filing date: Jan 20, 2017
Priority date: Apr 5, 2013
Publication date: Mar 27, 2018
Grant date: Mar 27, 2018

Abstract

Official abstract text for this publication.

A system can maintain multiple queues for deduplication requests of different priorities. The system can also designate priority of storage units. The scheduling priority of a deduplication request is based on the priority of the storage unit indicated in the deduplication request and a trigger for the deduplication request.
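
The queueing idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class and field names, the use of exactly two storage unit priority levels, and first-in-first-out ordering within a queue are all assumptions made for illustration. The only behavior taken from the text is that scheduling priority depends on the trigger and on the priority of the storage unit named in the request.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, IntEnum


class Trigger(Enum):
    USER = "user"            # hypothetical label for user related triggers
    CHANGELOG = "changelog"  # hypothetical label for changelog based triggers


class SchedulingPriority(IntEnum):
    HIGH = 0    # user related trigger
    MEDIUM = 1  # changelog trigger on a first-priority storage unit
    LOW = 2     # changelog trigger on a second-priority storage unit


@dataclass(frozen=True)
class DedupRequest:
    storage_unit: str
    trigger: Trigger


class DedupScheduler:
    """Keeps one queue per scheduling priority (illustrative only)."""

    def __init__(self, unit_priority):
        # unit_priority maps a storage unit name to 1 or 2 (assumed levels).
        self.unit_priority = unit_priority
        self.queues = {p: deque() for p in SchedulingPriority}

    def classify(self, request):
        # Trigger is checked first, then the storage unit's priority.
        if request.trigger is Trigger.USER:
            return SchedulingPriority.HIGH
        if self.unit_priority.get(request.storage_unit, 2) == 1:
            return SchedulingPriority.MEDIUM
        return SchedulingPriority.LOW

    def submit(self, request):
        self.queues[self.classify(request)].append(request)

    def next_request(self):
        # Called when resources are available: drain higher queues first.
        for priority in sorted(self.queues):
            if self.queues[priority]:
                return self.queues[priority].popleft()
        return None
```

With this sketch, a user-triggered request is always returned before any changelog-triggered request, regardless of submission order.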

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: determining storage unit priority for each of a plurality of storage units that have at least a threshold data amount, wherein the determination of priority is based, at least in part, on a threshold deduplication savings estimate; maintaining a first queue of deduplication requests having a first scheduling priority; maintaining a second queue of deduplication requests having a second scheduling priority that is below the first scheduling priority, wherein deduplication requests in the second queue were generated for storage units determined to have a first storage unit priority and for which a changelog event was detected; in response to a determination that resources are available, select deduplication requests from the first queue for execution; and in response to a determination that resources are available and that no requests are pending in the first queue, select for execution from the second queue based, at least in part, on potential deduplication savings of indicated storage units and changelogs of indicated storage units.

2. The method of claim 1 further comprising estimating deduplication savings for each of the plurality of storage units in response to a determination that each of the plurality of storage units has at least the threshold data amount.

3. The method of claim 1, wherein determining storage unit priority comprises determining that a storage unit has a first storage unit priority if the deduplication savings estimate for the storage unit satisfies the threshold deduplication savings estimate and determining that a storage unit will continue to be monitored if the deduplication savings estimate for the storage unit does not satisfy the threshold deduplication savings estimate.

4. The method of claim 3, further comprising monitoring newly detected storage units and storage units that do not satisfy the threshold deduplication savings estimate, wherein monitoring comprises periodically determining whether the storage units have the threshold data amount.

5. The method of claim 1, wherein maintaining the first queue of deduplication requests comprises enqueuing a deduplication request into the first queue if generated from a user related trigger.

6. The method of claim 1, further comprising maintaining a third queue of deduplication requests having a third scheduling priority junior to the second scheduling priority, wherein deduplication requests in the second queue were generated for storage units determined to have a second storage unit priority and for which a changelog event was detected.

7. The method of claim 1, further comprising: maintaining a third queue for requests to estimate deduplication savings for storage units determined to have at least the threshold data amount, wherein the third queue has a third scheduling priority that is greater than the second scheduling priority but less than the first scheduling priority; and maintaining a fourth queue for requests to determine storage units that have at least the threshold data amount, wherein the fourth queue has a fourth scheduling priority that is greater than the second scheduling priority but less than the third scheduling priority.

8. The method of claim 1, wherein selecting for execution from the second queue comprises selecting a deduplication request from the second queue with a fullest changelog that is not overflowed and based, at least in part, on a calculated potential deduplication savings.

9. One or more non-transitory machine-readable media comprising program code for priority based deduplication scheduling, the program code to: determine storage unit priority for each of a plurality of storage units that have at least a threshold data amount, wherein the determination of priority is based, at least in part, on a threshold deduplication savings estimate; based on generation of a deduplication request, determine a scheduling priority for the deduplication request based, at least in part, on a trigger for the deduplication request, wherein a deduplication request with a changelog based trigger has a lower scheduling priority than a user related trigger and scheduling priority of a deduplication request with a changelog based trigger is also based, at least in part, on storage unit priority of a storage unit indicated in the deduplication request; and in response to a determination of resource availability, select requests in accordance with scheduling priority.

10. The non-transitory machine-readable media of claim 9 further comprising program code to estimate deduplication savings for each of the plurality of storage units determined to have at least the threshold data amount.

11. The non-transitory machine-readable media of claim 10, wherein the program code to determine storage unit priority comprises program code to determine that a storage unit has a first storage unit priority if the deduplication savings estimate for the storage unit satisfies the threshold deduplication savings estimate and to determine that a storage unit will continue to be monitored if the deduplication savings estimate for the storage unit does not satisfy the threshold deduplication savings estimate.

12. The non-transitory machine-readable media of claim 11 further comprising program code to monitor newly detected storage units and storage units that do not satisfy the threshold deduplication savings estimate, wherein the program code to monitor comprises program code to periodically determine whether the storage units have the threshold data amount.

13. The non-transitory machine-readable media of claim 9, wherein the program code to determine the scheduling priority for the deduplication request comprises program code to determine whether the deduplication request was triggered by a changelog event, the deduplication request is a manually triggered deduplication request, or whether the deduplication request was automatically triggered by a user configured schedule or a user configured threshold.

14. The non-transitory machine-readable media of claim 9, wherein the program code to select requests in accordance with scheduling priority comprises program code to: first select for execution deduplication requests with user related triggers; second select for execution a request to estimate deduplication savings for each of the plurality of storage units determined to have at least the threshold data amount; third select for execution deduplication requests with changelog based triggers and that indicate storage units with a first storage unit priority; fourth select for execution a request to determine which storage units have at least the threshold data amount; and fifth select for execution deduplication requests with changelog based triggers and that indicate storage units with a second storage unit priority.

15. The non-transitory machine-readable media of claim 9 further comprising program code to: based on determination of resource availability and determination that a set of deduplication requests with changelog based triggers can be selected for execution in accordance with scheduling priority, select for execution a deduplication request from the set of deduplication requests that does not have an overflowed changelog, has a most full changelog, and based, at least in part, on a calculated potential deduplication savings.

16. An apparatus comprising: a processor; and a machine-readable medium having instructions stored thereon that are executable by the processor to cause the apparatus to, determine storage unit priority for each of a plurality of
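
Two steps recited in the claims lend themselves to a short sketch: classifying a storage unit against the data-amount and savings thresholds (claims 1, 3 and 4), and picking a changelog-triggered request whose changelog is fullest without being overflowed (claims 8 and 15). The claims do not say how changelog fullness and potential savings are combined, so the ordering below (fullness first, savings as a tie-breaker) is an assumption, as are all names, the return labels, and the reading of "satisfies" as greater than or equal.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def classify_unit(data_amount, savings_estimate, min_data, min_savings):
    """Return a hypothetical label for a storage unit.

    Units below the data threshold, or whose savings estimate does not
    satisfy the savings threshold, stay under monitoring.
    """
    if data_amount < min_data:
        return "monitor"
    if savings_estimate >= min_savings:  # assumed meaning of "satisfies"
        return "first_priority"
    return "monitor"


@dataclass(frozen=True)
class ChangelogRequest:
    storage_unit: str
    changelog_entries: int
    changelog_capacity: int
    overflowed: bool
    potential_savings: float  # e.g. estimated fraction of space reclaimed


def select_changelog_request(
    requests: Sequence[ChangelogRequest],
) -> Optional[ChangelogRequest]:
    """Pick the request with the fullest changelog that is not overflowed.

    Ties on fullness are broken by higher potential savings (an assumed
    way of combining the two criteria).
    """
    candidates = [r for r in requests if not r.overflowed]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (
            r.changelog_entries / r.changelog_capacity,
            r.potential_savings,
        ),
    )
```

A weighted score over fullness and savings would be an equally plausible reading of the claim language; the tuple ordering is used here only because it is the simplest to follow.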

Assignees

Netapp Inc

Inventors

Classifications

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • Physics · mapped topic

  • G06F3/0608 (Primary): Saving storage space on storage systems · CPC title

  • De-duplication techniques · CPC title

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/30156. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 27, 2018 (B2). Legal status and post-grant events are not shown on this page.