Garbage collection method for data storage device
US-10657048-B2 · May 19, 2020 · US
US11249901B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11249901-B2 |
| Application number | US-201816213175-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 7, 2018 |
| Priority date | Dec 7, 2018 |
| Publication date | Feb 15, 2022 |
| Grant date | Feb 15, 2022 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
The described technology is generally directed towards data storage using a node cluster, and garbage collecting unused chunks (data storage units) in the cluster based on which node owns the particular unused chunks. A node determines which chunks are in use, and exchanges datasets identifying those chunks with other nodes such that the other nodes know which of the chunks that they own are in use. When a node obtains the dataset identifying the chunks in use, the node determines the chunks not in use by a difference of those owned and those in use. This difference dataset is used to garbage collect owned, unused chunks. Garbage collection via this technology is able to be performed in a single cycle.
Opening claim text (preview).
What is claimed is: 1. A method comprising, traversing data structures to obtain metadata from a chunk manager, by an owning node of a node cluster and comprising a processor, to determine a first dataset representing owned chunks corresponding to data structures owned by the owning node of the node cluster; traversing data structures to obtain data from storage devices, by the owning node of the node cluster, to determine a second dataset representing used owned chunks of the owned chunks that are in use in the node cluster; based on a difference between the first dataset and the second dataset, determining, by the owning node of the node cluster, a third dataset representing unused owned chunks of the owned chunks that are not in use in the node cluster; based on the third dataset, deleting, by the owning node of the node cluster, unused chunks from storage devices, wherein the unused owned chunks are not in use in the node cluster by the owning node of the node cluster, wherein obtaining the data to determine the second dataset representing the owned chunks that are in use in the node cluster comprises determining a first group of identifiers corresponding to the used owned chunks that are in use by the owning node, obtaining a second group of identifiers corresponding to other used owned chunks that are in use by one or more non-owning nodes of the node cluster, and combining the first group and the second group into the second dataset; and performing, by the owning node of the node cluster, reference counting in the node cluster to determine whether a sufficient number of chunks are potentially reclaimable via deletion of chunks from storage devices according to a defined sufficiency criterion, and, in response to the sufficient number of chunks being determined to be potentially reclaimable and based on the third dataset, scheduling, by the owning node of the node cluster, a chunk deletion operation that comprises the deleting of the unused owned chunks that are not in use in the node cluster. 2. The method of claim 1 , wherein the data structures comprise trees, and wherein the determining the first group of identifiers corresponding to the used owned chunks that are in use by the owning node comprises traversing the trees owned by the owning node of the node cluster in a depth first traversal to locate chunk identifiers corresponding to the used owned chunks in use. 3. The method of claim 2 , further comprising, maintaining, by the owning node of the node cluster, a cache of chunk identifiers representing recently added chunks of the used owned chunks in use, wherein the determining the first group of identifiers corresponding to the used owned chunks that are in use by the owning node comprises accessing the cache to eliminate duplicated chunks in use located during the traversing of the trees. 4. The method of claim 3 , wherein the trees comprise B+ trees. 5. The method of claim 1 , wherein the combining the first group and the second group into the second dataset comprises removing duplicated identifiers. 6. The method of claim 1 , wherein the obtaining the second group of identifiers corresponding to the other used owned chunks that are in use by the one or more non-owning nodes of the node cluster comprises receiving, from each of the one or more non-owning nodes of the node cluster, a respective data structure containing the identifiers corresponding to the other used owned chunks that are in use by the one or more non-owning nodes. 7. The method of claim 6 , further comprising, persisting, by the owning node of the node cluster, the respective data structure from each of the one or more non-owning nodes in the node cluster. 8. The method of claim 1 , further comprising, determining, by the owning node of the node cluster, a fourth dataset representing non-owned chunks that are in use in by the owning node and are owned by other nodes of the node cluster other than the owning node. 9. The method of claim 8 , further comprising, partitioning, by the owning node of the node cluster, the fourth dataset into respective lists of chunk identifiers, each respective list of the respective lists corresponding to a respective other node in the node cluster that owns respective chunks identified in the respective list, and, for each respective list, providing, by the owning node of the node cluster, the respective list to the respective other node in the node cluster. 10. The method of claim 1 , wherein the data structures are first data structures, the method further comprising: providing second data structures to other nodes, other than the owning node, the second data structures containing second identifiers of chunks that are in use by the owning node and owned by the other nodes. 11. A system, comprising: a garbage collector, comprising a processor and associated with an owning node, in a node cluster, that owns at least one data structure corresponding to owned chunks, the garbage collector configured to: obtain first identifiers of the owned chunks; traverse data structures to obtain data from storage services to determine second identifiers corresponding to used owned chunks that are in use by the owning node; obtain third identifiers from storage corresponding to other used owned chunks that are in use by other nodes of the node cluster other than the owning node; determine, based on the first identifiers, the second identifiers and the third identifiers, fourth identifiers corresponding to unused owned chunks that are not in use by the owning node and are not in use by the other nodes; delete from the storage the unused owned chunks identified by the fourth identifiers; combine the second identifiers and the third identifiers, wherein the combining results in a combined dataset; and perform, by the owning node of the node cluster, reference counting in the node cluster to determine whether a sufficient number of chunks are potentially reclaimable via deletion of chunks from storage devices according to a defined sufficiency criterion, and, in response to the sufficient number of chunks being determined to be potentially reclaimable and based on the third dataset, schedule, by the owning node of the node cluster, a chunk deletion operation that comprises the deleting of the unused owned chunks that are not in use in the node cluster. 12. The system of claim 11 , wherein the garbage collector obtains the third identifiers from the storage as data structures received from the other nodes of the node cluster. 13. The system of claim 11 , wherein the garbage collector determines the fourth identifiers based on a difference of the combined dataset from the first identifiers. 14. The system of claim 11 , wherein the data structure comprises at least one tree, and wherein the garbage collector traverses data structures to obtain data from storage devices to determine the second identifiers corresponding to the used owned chunks that are in use by the owning node by a traversal of the at least one tree owned by the owning node. 15. The system of claim 14 , wherein the garbage collector eliminates duplicates of the second identifiers obtained during the traversal of the at least one tree owned by the owning node. 16. The system of claim 11 , wherein the garbage collector is further configured to: reclaim storage space occupied by the unused owned chunks identified by the fourth identifiers. 17. A non-transitory machine-readable medium, comprising executable instructions that, when executed by a processor of an owning node of a node cluster, facilitate performance of op
Garbage collection, i.e. reclamation of unreferenced memory · CPC title
for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title
Protocols · CPC title
Trees, e.g. B+trees · CPC title
Space efficiency improvement · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.