Reference tracking garbage collection for geographically distributed storage system
US-10346299-B1 · Jul 9, 2019 · US
US11513953B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11513953-B2 |
| Application number | US-202017014027-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 8, 2020 |
| Priority date | Sep 8, 2020 |
| Publication date | Nov 29, 2022 |
| Grant date | Nov 29, 2022 |
Official abstract text for this publication.
The technology describes performing garbage collection while data writes are occurring, which can lead to a conflict in that a new reference to an otherwise non-referenced candidate object for garbage collection is written after the non-referenced candidate object is detected. In one example implementation, orphaned binary large objects (BLOBs) that are not referenced by a descriptor file and are beyond a certain age are detected and deleted via an object references table traversal as part of garbage collection. Before reclaiming a deleted BLOB's capacity, a background process operates to restore the deleted BLOB if a new descriptor file reference to the BLOB was written during the object references table traversal. Capacity is only reclaimed after the object references table traversal and the background processing completes, for those BLOBs that were deleted and had not been restored.
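The flow in the abstract can be pictured with a minimal in-memory sketch. It is an illustration only, not the patented implementation: the `Blob` and `Store` classes, the `deleted` and `reclaimed` flags, the `pending_restore` queue, and the 14-day threshold are all assumptions made for the example. It shows the three stages the abstract names: logical deletion of aged orphans during traversal, background restoration of any BLOB that a newly written descriptor references, and capacity reclamation only for BLOBs still deleted afterwards.

```python
from dataclasses import dataclass, field

DAY = 86400.0  # seconds; the clock values below are hypothetical


@dataclass
class Blob:
    blob_id: str
    created_at: float
    deleted: bool = False    # logically deleted, capacity still held
    reclaimed: bool = False  # capacity actually freed


@dataclass
class Store:
    blobs: dict = field(default_factory=dict)        # blob_id -> Blob
    descriptors: dict = field(default_factory=dict)  # name -> set of blob_ids
    pending_restore: set = field(default_factory=set)

    def write_descriptor(self, name, blob_ids):
        """Foreground write: record references, queue restores for deleted BLOBs."""
        self.descriptors[name] = set(blob_ids)
        for blob_id in blob_ids:
            blob = self.blobs.get(blob_id)
            if blob is not None and blob.deleted and not blob.reclaimed:
                self.pending_restore.add(blob_id)

    def delete_orphans(self, now, min_age):
        """Traversal: logically delete unreferenced BLOBs older than min_age."""
        referenced = set()
        for ids in self.descriptors.values():
            referenced |= ids
        deleted = []
        for blob in self.blobs.values():
            is_orphan = blob.blob_id not in referenced
            if not blob.deleted and is_orphan and now - blob.created_at > min_age:
                blob.deleted = True
                deleted.append(blob.blob_id)
        return deleted

    def run_background_restore(self):
        """Background process: undo deletion for BLOBs that gained a reference."""
        for blob_id in self.pending_restore:
            self.blobs[blob_id].deleted = False
        self.pending_restore.clear()

    def reclaim(self, deleted_ids):
        """Free capacity only for BLOBs that are still deleted."""
        reclaimed = []
        for blob_id in deleted_ids:
            blob = self.blobs[blob_id]
            if blob.deleted:
                blob.reclaimed = True
                reclaimed.append(blob_id)
        return reclaimed


store = Store()
store.blobs = {
    "a": Blob("a", created_at=0.0),
    "b": Blob("b", created_at=0.0),
    "c": Blob("c", created_at=29 * DAY),  # too young to be collected
}
deleted = store.delete_orphans(now=30 * DAY, min_age=14 * DAY)
store.write_descriptor("clip-1", ["b"])  # new reference written during GC
store.run_background_restore()
reclaimed = store.reclaim(deleted)
print(deleted, reclaimed)
```

In this run `a` and `b` are both logically deleted, the late descriptor write brings `b` back, and only `a` has its capacity reclaimed. Separating deletion from reclamation is what leaves room for the restore step in between.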
Opening claim text (preview).
What is claimed is:

1. A system, comprising: a processor; and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations, the operations comprising: traversing a tree data structure to detect an orphaned content addressable storage object comprising a content addressable storage object not referenced by a descriptor file; determining whether the orphaned content addressable storage object satisfies an age criterion; in response to determining that the detected orphaned content addressable storage object satisfies the age criterion, deleting the content addressable storage object resulting in a deleted content addressable storage object; creating a snapshot of a bucket containing the content addressable storage object; determining whether any new descriptor file written during the traversing has restored the deleted content addressable storage object and deleting the snapshot thereafter; and in response to determining that the deleted content addressable storage object has not been restored during the traversing, reclaiming capacity occupied by the deleted content addressable storage object.
2. The system of claim 1, wherein the orphaned content addressable storage object is a first orphaned content addressable storage object, and wherein the operations further comprise creating a new descriptor file, determining that a second orphaned content addressable storage object referenced by the new descriptor file does not exist, and restoring the second orphaned content addressable storage object from the snapshot.
3. The system of claim 1, wherein the age criterion is based on a maximum duration of a descriptor file write transaction.
4. The system of claim 3, wherein the maximum duration is two weeks.
5. The system of claim 1, wherein the deleting the content addressable storage object is performed by a first garbage collection process, and wherein the reclaiming the capacity occupied by the deleted content addressable storage object is performed by a second garbage collection engine.
6. The system of claim 1, wherein the detected content addressable storage object is a binary large object.
7. The system of claim 1, wherein the descriptor file is a C-Clip descriptor file.
8. A method, comprising: creating, via a processor of a data storage system, a snapshot of a container containing content addressable storage objects; traversing an object references data structure corresponding to the container to determine orphaned content addressable storage objects comprising content addressable storage objects that are not referenced by at least one descriptor file; deleting, from the container, the orphaned content addressable storage objects that are older than a predetermined descriptor file write duration, the deleting resulting in deleted content addressable storage objects; and reclaiming capacity occupied by the deleted content addressable storage objects that have not been restored from the snapshot via any new descriptor file created during the traversing.
9. The method of claim 8, wherein the traversing the object references data structure comprises running a first garbage collection operation, and wherein the reclaiming the capacity comprises running a second garbage collection operation.
10. The method of claim 8, further comprising writing a new descriptor file, during the traversing and before the reclaiming the capacity, that references a deleted content addressable storage object, and restoring the content addressable storage object from the snapshot to an undeleted state.
11. The method of claim 10, wherein the writing the new descriptor file comprises running a foreground process, and wherein the restoring the content addressable storage object from the snapshot to the undeleted state comprises running a background process.
12. The method of claim 11, wherein the reclaiming the capacity is performed after the background process completes.
13. The method of claim 8, wherein the traversing the object references data structure comprises traversing a search tree.
14. The method of claim 8, wherein the creating the snapshot of the container comprises creating a first snapshot of a first container, and further comprising creating respective one or more snapshots of one or more respective containers.
15. The method of claim 8, wherein the deleting the orphaned content addressable storage objects employs the predetermined descriptor file write duration that is based on a maximum duration of a descriptor file write transaction.
16. A non-transitory machine-readable medium, comprising executable instructions that, when executed by a processor of a data storage system, facilitate performance of operations, the operations comprising: creating a snapshot of a container containing content addressable storage objects; deleting, as part of a first garbage collection process, a content addressable storage object, maintained in the container, that is not referenced by a descriptor file and is older than an age that is based on a defined limit on duration of a descriptor file write transaction; and reclaiming capacity, as part of a second garbage collection process, the content addressable storage object in response to determining that the content addressable storage object was unable to be restored from the snapshot during the first garbage collection process.
17. The non-transitory machine-readable medium of claim 16, wherein the content addressable storage object is a first content addressable storage object, and wherein the operations further comprise, deleting, as part of the first garbage collection process, a second content addressable storage object, maintained in the container, that is not referenced by any descriptor file and is older than the age that is based on the defined limit on the duration of a descriptor file write transaction, writing a new descriptor file with a reference to the second content addressable storage object, restoring the second content addressable storage object from the snapshot, and avoiding reclamation of capacity of the second content addressable storage object during the second garbage collection process.
18. The non-transitory machine-readable medium of claim 17, wherein the writing the new descriptor file comprises running a foreground process, wherein the restoring the second content addressable storage object from the snapshot comprises running a background process that starts after the foreground process completes, and wherein the reclaiming the capacity in the second garbage collection process starts after the background process completes.
19. The non-transitory machine-readable medium of claim 16, wherein the operations comprise traversing an object references data structure as part of the first garbage collection process.
20. The method of claim 15, wherein the maximum duration is two weeks.
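The snapshot-based method of claims 8 and 10, together with the two-week age criterion of claims 4 and 20, can be sketched as follows. This is a hedged illustration under simplifying assumptions: the container is a plain dictionary, the snapshot is a shallow copy of it, and the function names `satisfies_age_criterion` and `collect` and all parameters are hypothetical. Descriptor writes that race with the traversal are passed in as a list rather than arriving concurrently.

```python
from datetime import datetime, timedelta

# Claims 4 and 20 give two weeks as the maximum descriptor write duration.
MAX_DESCRIPTOR_WRITE = timedelta(weeks=2)


def satisfies_age_criterion(created_at, now, limit=MAX_DESCRIPTOR_WRITE):
    """An object is old enough to collect once it outlives any possible write."""
    return now - created_at > limit


def collect(container, created, descriptors, now, writes_during_traversal):
    """Run one collection pass over `container` (object id -> bytes).

    `created` maps object id -> datetime, `descriptors` maps descriptor
    name -> set of object ids, and `writes_during_traversal` is a list of
    (name, ids) pairs standing in for descriptors written mid-traversal.
    Returns (reclaimed ids, restored ids).
    """
    snapshot = dict(container)  # snapshot taken before any deletion

    referenced = set().union(*descriptors.values())
    deleted = [
        oid for oid in list(container)
        if oid not in referenced and satisfies_age_criterion(created[oid], now)
    ]
    for oid in deleted:
        del container[oid]

    # A new descriptor that references a deleted object restores it
    # from the snapshot to an undeleted state.
    restored = set()
    for name, ids in writes_during_traversal:
        descriptors[name] = set(ids)
        for oid in ids:
            if oid not in container and oid in snapshot:
                container[oid] = snapshot[oid]
                restored.add(oid)

    # Capacity is reclaimed only for deleted objects that were not restored.
    reclaimed = [oid for oid in deleted if oid not in restored]
    snapshot.clear()  # the snapshot is no longer needed after this check
    return reclaimed, restored


now = datetime(2020, 9, 8)
container = {"x": b"old-orphan", "y": b"old-orphan-2", "z": b"fresh"}
created = {
    "x": now - timedelta(weeks=3),
    "y": now - timedelta(weeks=3),
    "z": now - timedelta(days=1),
}
descriptors = {}
reclaimed, restored = collect(
    container, created, descriptors, now, [("clip-2", ["y"])]
)
print(reclaimed, restored, sorted(container))
```

Tying the age threshold to the longest possible descriptor write means an object younger than the threshold may still belong to a write in progress, so it is never treated as garbage.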
Server or database system · CPC title
Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs · CPC title
Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion (error detection or correction of the data by redundancy in operations or in hardware G06F11/14, G06F11/16) · CPC title
Structured object, e.g. database record · CPC title
Garbage collection, i.e. reclamation of unreferenced memory · CPC title