Light-weight index deduplication and hierarchical snapshot replication
US-2021342297-A1 · Nov 4, 2021 · US
US11436102B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11436102-B2 |
| Application number | US-202016998060-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 20, 2020 |
| Priority date | Aug 20, 2020 |
| Publication date | Sep 6, 2022 |
| Grant date | Sep 6, 2022 |
Official abstract text for this publication.
Solutions for managing archived storage include receiving, at a first node, a snapshot comprising object data (e.g., a virtual machine disk snapshot) from a second node (e.g., a software defined data center), and storing the snapshot in a tiered structure that includes a data tier and a metadata tier. Snapshots may be used for fail-over operations and/or backups, to support disaster recovery. The data tier comprises a log-structured file system (LFS), and the metadata tier comprises a content addressable storage (CAS) identifying addresses within the LFS. The metadata tier also comprises a logical layer indicating content in the CAS. Segment cleaning of the data tier is performed using a segment usage table (SUT). Some examples include performing a fail-over operation from the second node to a third node using at least the stored snapshot for workload recovery. In some examples, the CAS comprises a log-structured merge-tree (LSM-tree).
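The tiered layout in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the class and method names are hypothetical, SHA-256 is an assumed content hash, a Python list stands in for the LFS, and a plain dictionary stands in for the CAS (which the abstract says may be an LSM-tree in some examples).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class TieredStore:
    """Toy model of the data tier / metadata tier split (all names hypothetical)."""
    log: list = field(default_factory=list)      # data tier: append-only log standing in for the LFS
    cas: dict = field(default_factory=dict)      # metadata tier: content hash -> address in the log
    logical: dict = field(default_factory=dict)  # logical layer: (snapshot id, block no) -> content hash

    def put_block(self, snapshot_id: str, block_no: int, data: bytes) -> bool:
        """Store one block; return True only if new data was appended to the log."""
        digest = hashlib.sha256(data).hexdigest()  # assumed hash function
        is_new = digest not in self.cas
        if is_new:
            self.cas[digest] = len(self.log)       # address = position in the log
            self.log.append(data)
        self.logical[(snapshot_id, block_no)] = digest
        return is_new

    def get_block(self, snapshot_id: str, block_no: int) -> bytes:
        """Resolve logical layer -> CAS -> log address."""
        digest = self.logical[(snapshot_id, block_no)]
        return self.log[self.cas[digest]]


store = TieredStore()
store.put_block("snap-1", 0, b"boot sector")
store.put_block("snap-1", 1, b"user data")
store.put_block("snap-2", 0, b"boot sector")   # duplicate content: not appended again
store.put_block("snap-2", 1, b"user data v2")
print(len(store.log))  # 3 blocks stored for 4 logical blocks
```

Because the logical layer refers to content by hash and only the CAS holds log addresses, a second snapshot that repeats a block adds a logical entry but no new data, which is the deduplication effect described in claim 5.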
Opening claim text (preview).
What is claimed is:
1. A method of managing archived storage, the method comprising: receiving, at a first node, from an upload agent at a second node, a snapshot comprising object data; storing the snapshot in a primary storage in a tiered structure, wherein the tiered structure comprises a data tier and a metadata tier, wherein the data tier comprises a log-structured file system (LFS) for storing the snapshot, wherein the metadata tier comprises a content addressable storage (CAS) identifying addresses within the LFS, and wherein the metadata tier further comprises a logical layer indicating content in the CAS; and performing segment cleaning of the data tier using a segment usage table (SUT).
2. The method of claim 1, further comprising: performing a backup using at least the stored snapshot; or performing a fail-over operation from the second node to a third node using at least the stored snapshot for workload recovery.
3. The method of claim 1, wherein the segment cleaning comprises: determining, based at least on numbers of live blocks indicated in the SUT, a plurality of segment cleaning candidates; and for each segment cleaning candidate of the plurality of segment cleaning candidates: determining whether a block in the segment cleaning candidate is live; based at least on the block not being live, marking the block as free; and based at least on the block being live, including the block in a coalescing operation.
4. The method of claim 1, further comprising: based at least on access costs, calculating an expected cost of a segment cleaning operation; based at least on storage costs, calculating an expected cost savings from the segment cleaning; based at least on the expected cost of the segment cleaning and the expected cost savings from the segment cleaning, determining whether to perform the segment cleaning; and based at least on making a determination to perform the segment cleaning, performing the segment cleaning.
5. The method of claim 1, further comprising: performing deduplication of the snapshot using at least the CAS; and based at least on a retention schedule, deleting at least a portion of the snapshot or moving at least a portion of the snapshot from the primary storage to a long-term storage.
6. The method of claim 1, wherein the CAS comprises a log-structured merge-tree (LSM-tree).
7. The method of claim 1, wherein the second node comprises a software defined data center (SDDC), and wherein the snapshot comprises a versioned object difference.
8. A computer system for managing archived storage, the computer system comprising: a processor; and a non-transitory computer readable medium having stored thereon program code for transferring data to another computer system, the program code causing the processor to: receive, at a first node, from an upload agent at a second node, a snapshot comprising object data; store the snapshot in a primary storage in a tiered structure, wherein the tiered structure comprises a data tier and a metadata tier, wherein the data tier comprises a log-structured file system (LFS) for storing the snapshot, wherein the metadata tier comprises a content addressable storage (CAS) identifying addresses within the LFS, and wherein the metadata tier further comprises a logical layer indicating content in the CAS; and perform segment cleaning of the data tier using a segment usage table (SUT).
9. The computer system of claim 8, wherein the program code is further operative to: perform a backup using at least the stored snapshot; or perform a fail-over operation from the second node to a third node using at least the stored snapshot for workload recovery.
10. The computer system of claim 8, wherein the program code is further operative to: determine, based at least on numbers of live blocks indicated in the SUT, a plurality of segment cleaning candidates; and for each segment cleaning candidate of the plurality of segment cleaning candidates: determine whether a block in the segment cleaning candidate is live; based at least on the block not being live, mark the block as free; and based at least on the block being live, include the block in a coalescing operation.
11. The computer system of claim 8, wherein the program code is further operative to: based at least on access costs, calculate an expected cost of a segment cleaning operation; based at least on storage costs, calculate an expected cost savings from the segment cleaning; based at least on the expected cost of the segment cleaning and the expected cost savings from the segment cleaning, determine whether to perform the segment cleaning; and based at least on making a determination to perform the segment cleaning, perform the segment cleaning.
12. The computer system of claim 8, wherein the program code is further operative to: perform deduplication of the snapshot using at least the CAS; and based at least on a retention schedule, delete at least a portion of the snapshot or move at least a portion of the snapshot from the primary storage to a long-term storage.
13. The computer system of claim 8, wherein the CAS comprises a log-structured merge-tree (LSM-tree).
14. The computer system of claim 8, wherein the second node comprises a software defined data center (SDDC), and wherein the snapshot comprises a versioned object difference.
15. A non-transitory computer readable storage medium having stored thereon program code executable by a first computer system at a first site, the program code embodying a method comprising: receiving, at a first node, from an upload agent at a second node, a snapshot comprising object data; storing the snapshot in a primary storage in a tiered structure, wherein the tiered structure comprises a data tier and a metadata tier, wherein the data tier comprises a log-structured file system (LFS) for storing the snapshot, wherein the metadata tier comprises a content addressable storage (CAS) identifying addresses within the LFS, and wherein the metadata tier further comprises a logical layer indicating content in the CAS; and performing segment cleaning of the data tier using a segment usage table (SUT).
16. The non-transitory computer readable storage medium of claim 15, wherein the program code further comprises: performing a backup using at least the stored snapshot; or performing a fail-over operation from the second node to a third node using at least the stored snapshot for workload recovery.
17. The non-transitory computer readable storage medium of claim 15, wherein the program code further comprises: determining, based at least on numbers of live blocks indicated in the SUT, a plurality of segment cleaning candidates; and for each segment cleaning candidate of the plurality of segment cleaning candidates: determining whether a block in the segment cleaning candidate is live; based at least on the block not being live, marking the block as free; and based at least on the block being live, including the block in a coalescing operation.
18. The non-transitory computer readable storage medium of claim 15, wherein the program code further comprises: based at least on access costs, calculating an expected cost of a segment cleaning operation; based at least on storage costs, calculating an expected cost savings from the segment cleaning; based at least on the expected cost of the segment cleaning and the expected cost savings from the segment cleaning, determining whether to perform the segment cleaning;
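The segment cleaning steps recited in claims 3, 10, and 17, together with the cost check in claims 4, 11, and 18, can be sketched roughly as follows. Every name here is hypothetical, and the cost model is an assumption: the claims state only that the expected cost is based at least on access costs and the expected savings at least on storage costs, so the linear formulas below are illustrative. A set of live block identifiers stands in for whatever liveness lookup the metadata tier would provide.

```python
def pick_candidates(sut: dict, max_live: int) -> list:
    """Return ids of segments whose SUT live-block count is at most max_live, emptiest first."""
    return sorted((s for s, n in sut.items() if n <= max_live), key=lambda s: sut[s])


def should_clean(n_read: int, n_rewritten: int, n_freed: int,
                 access_cost: float, storage_cost: float) -> bool:
    """Compare the expected cleaning cost (access) with the expected savings (storage)."""
    expected_cost = (n_read + n_rewritten) * access_cost   # assumed linear model
    expected_savings = n_freed * storage_cost              # assumed linear model
    return expected_savings > expected_cost


def clean(segments: dict, sut: dict, live: set, max_live: int, new_seg_id: int):
    """Free dead blocks in candidate segments and coalesce live ones into a new segment."""
    coalesced, freed = [], []
    for seg_id in pick_candidates(sut, max_live):
        for block in segments.pop(seg_id):
            # Live blocks join the coalescing operation; dead blocks are marked free.
            (coalesced if block in live else freed).append(block)
        del sut[seg_id]
    if coalesced:
        segments[new_seg_id] = coalesced
        sut[new_seg_id] = len(coalesced)
    return coalesced, freed


# Three 4-block segments; the SUT records how many blocks in each are still live.
segments = {0: ["a", "b", "c", "d"], 1: ["e", "f", "g", "h"], 2: ["i", "j", "k", "l"]}
live = {"a", "e", "f", "g", "h", "i", "j"}
sut = {0: 1, 1: 4, 2: 2}

coalesced, freed = [], []
if should_clean(n_read=8, n_rewritten=3, n_freed=5, access_cost=1.0, storage_cost=4.0):
    coalesced, freed = clean(segments, sut, live, max_live=2, new_seg_id=3)
print(coalesced, freed, sut)
```

In this example segments 0 and 2 are candidates because the SUT shows at most two live blocks in each. Their three live blocks are rewritten into a new segment 3, and five blocks are freed. Segment 1 is fully live and is left alone.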
involving logging of persistent data for recovery · CPC title
Backup scheduling policy · CPC title
Trees, e.g. B+trees · CPC title
for networked environments · CPC title
Saving storage space on storage systems · CPC title