Synchronizing garbage collection and incoming data traffic

US11513953B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11513953-B2
Application number  US-202017014027-A
Country             US
Kind code           B2
Filing date         Sep 8, 2020
Priority date       Sep 8, 2020
Publication date    Nov 29, 2022
Grant date          Nov 29, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technology describes performing garbage collection while data writes are occurring, which can lead to a conflict in that a new reference to an otherwise non-referenced candidate object for garbage collection is written after the non-referenced candidate object is detected. In one example implementation, orphaned binary large objects (BLOBs) that are not referenced by a descriptor file and are beyond a certain age are detected and deleted via an object references table traversal as part of garbage collection. Before reclaiming a deleted BLOB's capacity, a background process operates to restore the deleted BLOB if a new descriptor file reference to the BLOB was written during the object references table traversal. Capacity is only reclaimed after the object references table traversal and the background processing completes, for those BLOBs that were deleted and had not been restored.
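The flow in the abstract can be illustrated with a minimal in-memory sketch. It is an illustration under stated assumptions, not the patented implementation: the names `Store`, `Blob`, `traverse_and_delete`, `restore_referenced`, and `reclaim` are hypothetical, ages are plain day counts, the 14-day threshold is an example value, and the "concurrent" descriptor write is simulated by a callback that fires after the traversal has taken its view of the references.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Blob:
    age_days: int
    deleted: bool = False  # soft-delete flag; capacity is still held


@dataclass
class Store:
    blobs: Dict[str, Blob] = field(default_factory=dict)
    # descriptor name -> ids of the BLOBs it references
    descriptors: Dict[str, Set[str]] = field(default_factory=dict)
    pending_restore: List[str] = field(default_factory=list)

    def write_descriptor(self, name: str, blob_ids: List[str]) -> None:
        """Foreground write; may race with an in-progress traversal."""
        self.descriptors[name] = set(blob_ids)
        self.pending_restore.extend(blob_ids)

    def traverse_and_delete(
        self,
        min_age_days: int,
        concurrent_write: Optional[Callable[["Store"], None]] = None,
    ) -> None:
        """First pass: soft-delete orphaned BLOBs that are old enough."""
        referenced = set().union(*self.descriptors.values())
        if concurrent_write is not None:
            concurrent_write(self)  # simulated write during the traversal
        for blob_id, blob in self.blobs.items():
            if blob_id not in referenced and blob.age_days >= min_age_days:
                blob.deleted = True

    def restore_referenced(self) -> None:
        """Background pass: undelete BLOBs that a new descriptor references."""
        for blob_id in self.pending_restore:
            blob = self.blobs.get(blob_id)
            if blob is not None and blob.deleted:
                blob.deleted = False
        self.pending_restore.clear()

    def reclaim(self) -> List[str]:
        """Second pass: free capacity only for BLOBs still marked deleted."""
        reclaimed = sorted(i for i, b in self.blobs.items() if b.deleted)
        for blob_id in reclaimed:
            del self.blobs[blob_id]
        return reclaimed


def run_example():
    store = Store(
        blobs={"a": Blob(30), "b": Blob(30), "c": Blob(1), "d": Blob(30)},
        descriptors={"clip1": {"d"}},
    )
    # A new descriptor referencing "b" lands after the traversal took its view.
    store.traverse_and_delete(
        min_age_days=14,
        concurrent_write=lambda s: s.write_descriptor("clip2", ["b"]),
    )
    store.restore_referenced()
    reclaimed = store.reclaim()
    return reclaimed, sorted(store.blobs)
```

In this sketch, `run_example()` soft-deletes the orphans "a" and "b", restores "b" because the late descriptor references it, and reclaims only "a". The young orphan "c" and the referenced BLOB "d" are never touched. Deferring `reclaim` until after the restore pass is what keeps the late reference from dangling.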

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a processor; and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations, the operations comprising: traversing a tree data structure to detect an orphaned content addressable storage object comprising a content addressable storage object not referenced by a descriptor file; determining whether the orphaned content addressable storage object satisfies an age criterion; in response to determining that the detected orphaned content addressable storage object satisfies the age criterion, deleting the content addressable storage object resulting in a deleted content addressable storage object; creating a snapshot of a bucket containing the content addressable storage object; determining whether any new descriptor file written during the traversing has restored the deleted content addressable storage object and deleting the snapshot thereafter; and in response to determining that the deleted content addressable storage object has not been restored during the traversing, reclaiming capacity occupied by the deleted content addressable storage object.

2. The system of claim 1, wherein the orphaned content addressable storage object is a first orphaned content addressable storage object, and wherein the operations further comprise creating a new descriptor file, determining that a second orphaned content addressable storage object referenced by the new descriptor file does not exist, and restoring the second orphaned content addressable storage object from the snapshot.

3. The system of claim 1, wherein the age criterion is based on a maximum duration of a descriptor file write transaction.

4. The system of claim 3, wherein the maximum duration is two weeks.

5. The system of claim 1, wherein the deleting the content addressable storage object is performed by a first garbage collection process, and wherein the reclaiming the capacity occupied by the deleted content addressable storage object is performed by a second garbage collection engine.

6. The system of claim 1, wherein the detected content addressable storage object is a binary large object.

7. The system of claim 1, wherein the descriptor file is a C-Clip descriptor file.

8. A method, comprising: creating, via a processor of a data storage system, a snapshot of a container containing content addressable storage objects; traversing an object references data structure corresponding to the container to determine orphaned content addressable storage objects comprising content addressable storage objects that are not referenced by at least one descriptor file; deleting, from the container, the orphaned content addressable storage objects that are older than a predetermined descriptor file write duration, the deleting resulting in deleted content addressable storage objects; and reclaiming capacity occupied by the deleted content addressable storage objects that have not been restored from the snapshot via any new descriptor file created during the traversing.

9. The method of claim 8, wherein the traversing the object references data structure comprises running a first garbage collection operation, and wherein the reclaiming the capacity comprises running a second garbage collection operation.

10. The method of claim 8, further comprising writing a new descriptor file, during the traversing and before the reclaiming the capacity, that references a deleted content addressable storage object, and restoring the content addressable storage object from the snapshot to an undeleted state.

11. The method of claim 10, wherein the writing the new descriptor file comprises running a foreground process, and wherein the restoring the content addressable storage object from the snapshot to the undeleted state comprises running a background process.

12. The method of claim 11, wherein the reclaiming the capacity is performed after the background process completes.

13. The method of claim 8, wherein the traversing the object references data structure comprises traversing a search tree.

14. The method of claim 8, wherein the creating the snapshot of the container comprises creating a first snapshot of a first container, and further comprising creating respective one or more snapshots of one or more respective containers.

15. The method of claim 8, wherein the deleting the orphaned content addressable storage objects employs the predetermined descriptor file write duration that is based on a maximum duration of a descriptor file write transaction.

16. A non-transitory machine-readable medium, comprising executable instructions that, when executed by a processor of a data storage system, facilitate performance of operations, the operations comprising: creating a snapshot of a container containing content addressable storage objects; deleting, as part of a first garbage collection process, a content addressable storage object, maintained in the container, that is not referenced by a descriptor file and is older than an age that is based on a defined limit on duration of a descriptor file write transaction; and reclaiming capacity, as part of a second garbage collection process, the content addressable storage object in response to determining that the content addressable storage object was unable to be restored from the snapshot during the first garbage collection process.

17. The non-transitory machine-readable medium of claim 16, wherein the content addressable storage object is a first content addressable storage object, and wherein the operations further comprise, deleting, as part of the first garbage collection process, a second content addressable storage object, maintained in the container, that is not referenced by any descriptor file and is older than the age that is based on the defined limit on the duration of a descriptor file write transaction, writing a new descriptor file with a reference to the second content addressable storage object, restoring the second content addressable storage object from the snapshot, and avoiding reclamation of capacity of the second content addressable storage object during the second garbage collection process.

18. The non-transitory machine-readable medium of claim 17, wherein the writing the new descriptor file comprises running a foreground process, wherein the restoring the second content addressable storage object from the snapshot comprises running a background process that starts after the foreground process completes, and wherein the reclaiming the capacity in the second garbage collection process starts after the background process completes.

19. The non-transitory machine-readable medium of claim 16, wherein the operations comprise traversing an object references data structure as part of the first garbage collection process.

20. The method of claim 15, wherein the maximum duration is two weeks.
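The snapshot-based ordering of claims 8 and 10, together with the age criterion of claims 3, 4, 15, and 20, might be sketched as follows. This is a hedged illustration only: `Container`, `satisfies_age_criterion`, and `run_claim8_example` are hypothetical names, the snapshot is modelled as a plain dictionary copy, and the two-week limit is taken from claims 4 and 20 as an example value.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set

# Example limit from claims 4 and 20: the maximum duration of a
# descriptor file write transaction.
MAX_DESCRIPTOR_WRITE_DURATION = timedelta(weeks=2)


def satisfies_age_criterion(
    created: datetime,
    now: datetime,
    limit: timedelta = MAX_DESCRIPTOR_WRITE_DURATION,
) -> bool:
    """True when the object is older than any write transaction that
    could still be in flight, so no pending write can be waiting on it."""
    return now - created > limit


class Container:
    def __init__(self, objects: Dict[str, datetime]) -> None:
        self.objects = dict(objects)  # object id -> creation time
        self.snapshot: Dict[str, datetime] = {}
        self.deleted: Set[str] = set()

    def create_snapshot(self) -> None:
        self.snapshot = dict(self.objects)

    def delete_old_orphans(self, referenced: Set[str], now: datetime) -> None:
        """First garbage collection operation: delete old orphans."""
        for oid, created in list(self.objects.items()):
            if oid not in referenced and satisfies_age_criterion(created, now):
                del self.objects[oid]
                self.deleted.add(oid)

    def restore_from_snapshot(self, oid: str) -> bool:
        """Undelete an object that a new descriptor file references."""
        if oid in self.deleted and oid in self.snapshot:
            self.objects[oid] = self.snapshot[oid]
            self.deleted.discard(oid)
            return True
        return False

    def reclaim(self) -> List[str]:
        """Second garbage collection operation: free what stayed deleted."""
        reclaimed = sorted(self.deleted)
        self.deleted.clear()
        self.snapshot = {}  # the snapshot is no longer needed
        return reclaimed


def run_claim8_example():
    now = datetime(2022, 11, 29)
    container = Container({
        "old1": now - timedelta(days=30),
        "old2": now - timedelta(days=20),
        "fresh": now - timedelta(days=3),
    })
    container.create_snapshot()
    container.delete_old_orphans(referenced=set(), now=now)
    # A new descriptor file written during the traversal references "old2".
    container.restore_from_snapshot("old2")
    return container.reclaim(), sorted(container.objects)
```

Here `run_claim8_example()` reclaims only "old1": "fresh" fails the age criterion and is never deleted, while "old2" is deleted and then restored from the snapshot before reclamation. The sketch follows the ordering of claim 8, which creates the snapshot before the traversal. Claim 1 recites the snapshot step after the deletion step, so the ordering shown is one reading rather than the only one.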

Assignees

Inventors

Classifications

  • Server or database system · CPC title

  • Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs · CPC title

  • Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion (error detection or correction of the data by redundancy in operations or in hardware G06F11/14, G06F11/16) · CPC title

  • Structured object, e.g. database record · CPC title

  • Garbage collection, i.e. reclamation of unreferenced memory · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11513953B2 cover?
The technology describes performing garbage collection while data writes are occurring, which can lead to a conflict in that a new reference to an otherwise non-referenced candidate object for garbage collection is written after the non-referenced candidate object is detected. In one example implementation, orphaned binary large objects (BLOBs) that are not referenced by a descriptor file and a…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F12/0253. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 29, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).