Garbage collection predictions

US9646067B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9646067-B2
Application numberUS-201414120368-A
CountryUS
Kind codeB2
Filing dateMay 14, 2014
Priority dateMay 14, 2013
Publication dateMay 9, 2017
Grant dateMay 9, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are systems and methods for garbage collection prediction. A temporal graph is received, the temporal graph including nodes, the nodes including hash references to objects. An accumulated difference count is updated when a node is added to the temporal graph, the accumulated difference count including a number of hash differences between a parent node and its children nodes in the temporal graph. A divested difference count is updated when a node is removed from the temporal graph, the divested difference count including a number of hash differences referenced by the removed node but not by either a parent node of the removed node or any child nodes of the removed node. The outcome of the garbage collection is predicted based on at least one of the accumulated difference count and the divested difference count.

First claim

Opening claim text (preview).

What is claimed is: 1. A computerized method of maintaining running information of ingestion and deletion of file system data for a deduplicated data store in which to determine an expected outcome of a garbage collection operation on the deduplicated data store without performing the garbage collection operation, the method comprising: maintaining, by a computer device, a temporal graph, the temporal graph including nodes, the nodes including hash references to objects; updating, by the computer device, an accumulated difference count when a node is added to the temporal graph, the accumulated difference count including a number of hash differences between a parent node and its children nodes in the temporal graph; updating, by the computer device, a divested difference count when a node is removed from the temporal graph, the divested difference count including a number of hash differences referenced by the removed node but not referenced by either a parent node of the removed node or any child nodes of the removed node; and determining, by the computer device, an expected amount of data being reclaimable during a garbage collection operation based on both of the accumulated difference count and the divested difference count, thereby estimating an expected outcome of the garbage collection operation without performing the garbage collection operation, wherein determining the expected amount of data being reclaimable during the garbage collection includes determining an amount of data storage used by the deduplicated data store but not referenced by an object. 2. The method of claim 1 , wherein determining an amount of data storage used by the deduplicated data store but not referenced by an object includes determining a ratio of the accumulated difference count to the divested difference count. 3. The method of claim 1 , further comprising performing the garbage collection, by the computer device, when the amount of data storage used by the deduplicated data store but not referenced by an object exceeds a threshold value. 4. A computerized system for maintaining running information of ingestion and deletion of file system data for a deduplicated data store in which to determine an expected outcome of a garbage collection operation on the deduplicated data store without performing the garbage collection operation, comprising a processor configured to run a module stored in memory that is configured to cause the processor to: maintain a temporal graph, the temporal graph including nodes, the nodes including hash references to objects; update an accumulated difference count when a node is added to the temporal graph, the accumulated difference count including a number of hash differences between a parent node and its children nodes in the temporal graph; update a divested difference count when a node is removed from the temporal graph, the divested difference count including a number of hash differences referenced by the removed node but not by either a parent node of the removed node or any child nodes of the removed node; and determine an expected amount of data being reclaimable during a garbage collection operation based on both of the accumulated difference count and the divested difference count, thereby estimating an expected outcome of the garbage collection operation without performing the garbage collection operation, wherein determining the expected amount of data being reclaimable during the garbage collection includes determining an amount of data storage used by the deduplicated data store but not referenced by an object. 5. The computerized system of claim 4 , wherein determining an amount of data storage used by the deduplicated data store but not referenced by an object includes determining a ratio of the accumulated difference count to the divested difference count. 6. The method of claim 4 , wherein the module stored in memory is further configured to cause the processor to perform the garbage collection when the amount of data storage used by the deduplicated data store but not referenced by an object exceeds a threshold value. 7. A non-transitory computer readable medium having executable instructions operable to cause an apparatus to: maintain a temporal graph, the temporal graph including nodes, the nodes including hash references to objects; update an accumulated difference count when a node is added to the temporal graph, the accumulated difference count including a number of hash differences between a parent node and its children nodes in the temporal graph; update a divested difference count when a node is removed from the temporal graph, the divested difference count including a number of hash differences referenced by the removed node but not by either a parent node of the removed node or any child nodes of the removed node; and determine an expected amount of data being reclaimable during a garbage collection operation based on both of the accumulated difference count and the divested difference count, thereby estimating an expected outcome of the garbage collection operation without performing the garbage collection operation, wherein determining the expected amount of data being reclaimable during the garbage collection includes determining an amount of data storage used by the deduplicated data store but not referenced by an object. 8. The non-transitory computer readable medium of claim 7 , wherein determining an amount of data storage used by the deduplicated data store but not referenced by an object includes determining a ratio of the accumulated difference count to the divested difference count. 9. The non-transitory computer readable medium of claim 7 , wherein the executable instructions are further operable to cause an apparatus to perform the garbage collection when the amount of data storage used by the deduplicated data store but not referenced by an object exceeds a threshold value.

Assignees

Inventors

Classifications

  • Management of the data involved in backup or backup restore · CPC title

  • Details of free space management performed by the file system (saving storage space on storage systems G06F3/0608; management of blocks in storage devices G06F3/064) · CPC title

  • Garbage collection, i.e. reclamation of unreferenced memory · CPC title

  • based on file chunks · CPC title

  • Hash-based (content-based indexing of textual data G06F16/31) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9646067B2 cover?
Described herein are systems and methods for garbage collection prediction. A temporal graph is received, the temporal graph including nodes, the nodes including hash references to objects. An accumulated difference count is updated when a node is added to the temporal graph, the accumulated difference count including a number of hash differences between a parent node and its children nodes in …
Who is the assignee on this patent?
Actifio Inc
What technology area does this patent fall under?
Primary CPC classification G06F12/0253. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue May 09 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).