System and method for data deduplication of backup images
US-9098432-B1 · Aug 4, 2015 · US
US9646067B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9646067-B2 |
| Application number | US-201414120368-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 14, 2014 |
| Priority date | May 14, 2013 |
| Publication date | May 9, 2017 |
| Grant date | May 9, 2017 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Described herein are systems and methods for garbage collection prediction. A temporal graph is received, the temporal graph including nodes, the nodes including hash references to objects. An accumulated difference count is updated when a node is added to the temporal graph, the accumulated difference count including a number of hash differences between a parent node and its children nodes in the temporal graph. A divested difference count is updated when a node is removed from the temporal graph, the divested difference count including a number of hash differences referenced by the removed node but not by either a parent node of the removed node or any child nodes of the removed node. The outcome of the garbage collection is predicted based on at least one of the accumulated difference count and the divested difference count.
Opening claim text (preview).
What is claimed is: 1. A computerized method of maintaining running information of ingestion and deletion of file system data for a deduplicated data store in which to determine an expected outcome of a garbage collection operation on the deduplicated data store without performing the garbage collection operation, the method comprising: maintaining, by a computer device, a temporal graph, the temporal graph including nodes, the nodes including hash references to objects; updating, by the computer device, an accumulated difference count when a node is added to the temporal graph, the accumulated difference count including a number of hash differences between a parent node and its children nodes in the temporal graph; updating, by the computer device, a divested difference count when a node is removed from the temporal graph, the divested difference count including a number of hash differences referenced by the removed node but not referenced by either a parent node of the removed node or any child nodes of the removed node; and determining, by the computer device, an expected amount of data being reclaimable during a garbage collection operation based on both of the accumulated difference count and the divested difference count, thereby estimating an expected outcome of the garbage collection operation without performing the garbage collection operation, wherein determining the expected amount of data being reclaimable during the garbage collection includes determining an amount of data storage used by the deduplicated data store but not referenced by an object. 2. The method of claim 1 , wherein determining an amount of data storage used by the deduplicated data store but not referenced by an object includes determining a ratio of the accumulated difference count to the divested difference count. 3. The method of claim 1 , further comprising performing the garbage collection, by the computer device, when the amount of data storage used by the deduplicated data store but not referenced by an object exceeds a threshold value. 4. A computerized system for maintaining running information of ingestion and deletion of file system data for a deduplicated data store in which to determine an expected outcome of a garbage collection operation on the deduplicated data store without performing the garbage collection operation, comprising a processor configured to run a module stored in memory that is configured to cause the processor to: maintain a temporal graph, the temporal graph including nodes, the nodes including hash references to objects; update an accumulated difference count when a node is added to the temporal graph, the accumulated difference count including a number of hash differences between a parent node and its children nodes in the temporal graph; update a divested difference count when a node is removed from the temporal graph, the divested difference count including a number of hash differences referenced by the removed node but not by either a parent node of the removed node or any child nodes of the removed node; and determine an expected amount of data being reclaimable during a garbage collection operation based on both of the accumulated difference count and the divested difference count, thereby estimating an expected outcome of the garbage collection operation without performing the garbage collection operation, wherein determining the expected amount of data being reclaimable during the garbage collection includes determining an amount of data storage used by the deduplicated data store but not referenced by an object. 5. The computerized system of claim 4 , wherein determining an amount of data storage used by the deduplicated data store but not referenced by an object includes determining a ratio of the accumulated difference count to the divested difference count. 6. The method of claim 4 , wherein the module stored in memory is further configured to cause the processor to perform the garbage collection when the amount of data storage used by the deduplicated data store but not referenced by an object exceeds a threshold value. 7. A non-transitory computer readable medium having executable instructions operable to cause an apparatus to: maintain a temporal graph, the temporal graph including nodes, the nodes including hash references to objects; update an accumulated difference count when a node is added to the temporal graph, the accumulated difference count including a number of hash differences between a parent node and its children nodes in the temporal graph; update a divested difference count when a node is removed from the temporal graph, the divested difference count including a number of hash differences referenced by the removed node but not by either a parent node of the removed node or any child nodes of the removed node; and determine an expected amount of data being reclaimable during a garbage collection operation based on both of the accumulated difference count and the divested difference count, thereby estimating an expected outcome of the garbage collection operation without performing the garbage collection operation, wherein determining the expected amount of data being reclaimable during the garbage collection includes determining an amount of data storage used by the deduplicated data store but not referenced by an object. 8. The non-transitory computer readable medium of claim 7 , wherein determining an amount of data storage used by the deduplicated data store but not referenced by an object includes determining a ratio of the accumulated difference count to the divested difference count. 9. The non-transitory computer readable medium of claim 7 , wherein the executable instructions are further operable to cause an apparatus to perform the garbage collection when the amount of data storage used by the deduplicated data store but not referenced by an object exceeds a threshold value.
Management of the data involved in backup or backup restore · CPC title
Details of free space management performed by the file system (saving storage space on storage systems G06F3/0608; management of blocks in storage devices G06F3/064) · CPC title
Garbage collection, i.e. reclamation of unreferenced memory · CPC title
based on file chunks · CPC title
Hash-based (content-based indexing of textual data G06F16/31) · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.