Push-based piggyback system for source-driven logical replication in a storage environment

US10620852B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10620852-B2
Application number: US-201715851895-A
Country: US
Kind code: B2
Filing date: Dec 22, 2017
Priority date: Dec 14, 2012
Publication date: Apr 14, 2020
Grant date: Apr 14, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosed techniques enable push-based piggybacking of a source-driven logical replication system. Logical replication of a data set (e.g., a snapshot) from a source node to a destination node can be achieved from a source-driven system while preserving the effects of storage efficiency operations (deduplication) applied at the source node. However, if missing data extents are detected at the destination, the destination has an extent pulling problem as the destination may not have knowledge of the physical layout on the source-side and/or mechanisms for requesting extents. The techniques overcome the extent pulling problem in a source-driven replication system by introducing specific protocols for obtaining missing extents within an existing replication environment by piggybacking data pushes from the source.
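The source-side half of this idea can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the names `SourceNode`, `ExtentRecord`, `on_push_inquiry_response`, and `next_push` are hypothetical, and the sketch assumes extents are addressed by a simple integer identifier. It shows the source remembering which extents the destination reported as missing and attaching them, flagged, to its next regular push.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExtentRecord:
    """One extent in a pushed data stream (hypothetical layout)."""
    extent_id: int
    data: bytes
    piggyback: bool = False  # True when sent because the destination reported it missing


@dataclass
class SourceNode:
    """Hypothetical source node that only ever pushes data."""
    extents: Dict[int, bytes]                      # extent_id -> data held at the source
    pending_missing: List[int] = field(default_factory=list)

    def on_push_inquiry_response(self, missing_ids: List[int]) -> None:
        """Remember extents the destination reported as missing."""
        for eid in missing_ids:
            # Ignore unknown extents and duplicates within the pending list.
            if eid in self.extents and eid not in self.pending_missing:
                self.pending_missing.append(eid)

    def next_push(self, scheduled_ids: List[int]) -> List[ExtentRecord]:
        """Build the next stream: scheduled extents plus piggybacked ones."""
        stream = [ExtentRecord(eid, self.extents[eid]) for eid in scheduled_ids]
        stream += [ExtentRecord(eid, self.extents[eid], piggyback=True)
                   for eid in self.pending_missing]
        self.pending_missing.clear()
        return stream
```

The destination never pulls in this sketch: it only answers the source's inquiry, and the missing data arrives on a push the source was going to make anyway.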

First claim

Opening claim text (preview).

The invention claimed is:

1. A non-transitory computer readable storage medium having instructions stored thereon, which when executed by one or more processors of a machine, causes the machine to: identify, at a first node, a missing extent associated with a replicated data set, of an original data set at a second node, to reconstructed at the first node; send, from the first node to the second node, a push inquiry response to cause the second node to send data of the missing extent based upon a determination that an extent dictionary, maintained by the first node to track previously requested missing extents such that the first node refrains from re-requesting missing extents tracked within the extent dictionary, lacks an entry for the missing extent; add a new entry into the extent dictionary to indicate that the missing extent has been requested through the push inquiry response; and receive, from the second node at the first node, a data stream comprising data of the missing extent and a piggyback flag indicating that the first node requested the missing extent through the push inquiry response, wherein the new entry marked as requested in the extent dictionary based upon receipt of the piggyback flag.

2. The non-transitory computer readable storage medium of claim 1 wherein the instructions, when executed by the one or more processors, further cause the machine to: reconstruct, at the first node, the replicated data set from a plurality of extents in response to reception of the missing extent.

3. The non-transitory computer readable storage medium of claim 1, wherein the data stream further includes the piggyback flag indicating that the missing extent was indicated as missing in the push inquiry response.

4. The non-transitory computer readable storage medium of claim 1 wherein the instructions, when executed by the one or more processors, further cause the machine to: determine, at the first node, if the missing extent has already been requested via the push inquiry response by looking up the missing extent in the extent dictionary.

5. The non-transitory computer readable storage medium of claim 4 wherein the instructions, when executed by the one or more processors, further cause the machine to: refrain from subsequently re-requesting the missing extent based upon the extent dictionary comprising the new entry for the missing extent.

6. The non-transitory computer readable storage medium of claim 4, wherein the first node searches the extent dictionary in O(log N).

7. The non-transitory computer readable storage medium of claim 1, wherein the replicated data set is deduplicated at the second node, and when the replicated data set is reconstructed at the first node, the replicated data set maintains the deduplication.

8. The non-transitory computer readable storage medium of claim 1 wherein the instructions, when executed by the one or more processors, further cause the machine to: receive, at the first node, a request for the missing extent from a destination module.

9. The non-transitory computer readable storage medium of claim 1, wherein the data set comprises a point-in-time image.

10. A method comprising: identifying, at a first node, a missing extent associated with a replicated data set, of an original data set at a second node, to reconstruct at the first node; sending, from the first node to the second node, a push inquiry response to cause the second node to send data of the missing extent based upon a determination that an extent dictionary, maintained by the first node to track previously requested missing extents such that the first node refrains from re-requesting missing extents tracked within the extent dictionary, lacks an entry for the missing extent; adding a new entry into the extent dictionary to indicate that the missing extent has been requested through the push inquiry response; and receiving, from the second node at the first node, a data stream comprising data of the missing extent and a piggyback flag indicating that the first node requested the missing extent through the push inquiry response, wherein the new entry marked as requested in from the extent dictionary based upon receipt of the piggyback flag.

11. The method of claim 10, comprising: reconstructing, at the first node, the replicated data set from a plurality of extents in response to reception of the missing extent.

12. The method of claim 10, wherein the data stream further includes the piggyback flag indicating that the missing extent was indicated as missing in the push inquiry response.

13. The method of claim 10, further comprising: determining, at the first node, if the missing extent has already been requested via the push inquiry response by determining whether the extent dictionary comprises an entry for the missing extent.

14. A first node, comprising: a memory having stored thereon instructions for performing a method; and a processor coupled to the memory, the processor configured to execute the instructions to cause the processor to: identify, at the first node, a missing extent associated with a replicated data set, of an original data set at a second node, to reconstruct at the first node; send, from the first node to the second node, a push inquiry response to cause the second node to send data of the missing extent based upon a determination that an extent dictionary, maintained by the first node to track previously requested missing extents such that the first node refrains from re-requesting missing extents tracked within the extent dictionary, lacks an entry for the missing extent; add a new entry into the extent dictionary to indicate that the missing extent has been requested through the push inquiry response; and receive, from the second node at the first node, a data stream comprising data of the missing extent and a piggyback flag indicating that the first node requested the missing extent through the push inquiry response, wherein the new entry marked as requested in from the extent dictionary based upon receipt of the piggyback flag.

15. The first node of claim 14, wherein the instructions to cause the processor to reconstruct the replicated data set from a plurality of extents on reception of the missing extent.

16. The first node of claim 14, wherein the instructions to cause the processor to evaluate the extent dictionary to determine whether missing extents have already been requested via the push inquiry responses.

17. The first node of claim 16, wherein the instructions to cause the processor to sort the extent dictionary based upon virtual volume block numbers.

18. The first node of claim 16, wherein entries within the extent dictionary for missing extent identifiers are sorted in O(N*log N) in the extent dictionary for searching in O(log N).

19. The first node of claim 16, wherein the instructions to cause the processor to receive an indication of the missing extent block from a destination customer.

20. The first node of claim 16, wherein the instructions to cause the processor to determine, at the first node, if the missing extent has already been requested via the push inquiry response by looking up the missing extent in the extent dictionary.
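The first-node behaviour recited in the independent claims, together with the sorted dictionary of claims 6 and 18, might look something like the sketch below. It is a hedged illustration only: `DestinationNode`, `build_push_inquiry_response`, and `on_data_stream` are hypothetical names, extents are assumed to be keyed by an integer identifier (claim 17 mentions virtual volume block numbers), and the `confirmed` set is one possible reading of how an entry is updated when the piggyback flag arrives.

```python
import bisect


class DestinationNode:
    """Hypothetical first node that tracks requested extents in a sorted list."""

    def __init__(self):
        self._requested = []    # extent dictionary: sorted IDs already requested
        self.store = {}         # extent_id -> data received so far
        self.confirmed = set()  # requested IDs whose piggybacked data has arrived

    def _already_requested(self, extent_id):
        # Binary search over the sorted dictionary: O(log N).
        i = bisect.bisect_left(self._requested, extent_id)
        return i < len(self._requested) and self._requested[i] == extent_id

    def build_push_inquiry_response(self, missing_ids):
        """Return only the missing extents that have not been requested before."""
        new_ids = [e for e in sorted(set(missing_ids))
                   if not self._already_requested(e)]
        # Add new entries, then re-sort so later lookups stay O(log N).
        self._requested.extend(new_ids)
        self._requested.sort()  # O(N log N)
        return new_ids

    def on_data_stream(self, records):
        """Consume (extent_id, data, piggyback_flag) tuples pushed by the source."""
        for extent_id, data, piggyback in records:
            self.store[extent_id] = data
            if piggyback and self._already_requested(extent_id):
                self.confirmed.add(extent_id)
```

Because an extent is entered in the dictionary as soon as it is named in a push inquiry response, a second detection of the same missing extent produces no second request, which is the re-request suppression the claims describe.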

Assignees

Netapp Inc

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • implemented as replicated file system · CPC title

  • G06F16/178 · Primary

    Techniques for file synchronisation in file systems · CPC title

  • G06F3/0619 · Primary

    in relation to data integrity, e.g. data losses, bit errors · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10620852B2 cover?
The disclosed techniques enable push-based piggybacking of a source-driven logical replication system. Logical replication of a data set (e.g., a snapshot) from a source node to a destination node can be achieved from a source-driven system while preserving the effects of storage efficiency operations (deduplication) applied at the source node. However, if missing data extents are detected at t…
Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/178. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 14, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).