Interconnect path failover
US-2015309892-A1 · Oct 29, 2015 · US
US11941278B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11941278-B2 |
| Application number | US-202117520537-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 5, 2021 |
| Priority date | Jun 28, 2019 |
| Publication date | Mar 26, 2024 |
| Grant date | Mar 26, 2024 |
Abstract
A data storage system includes multiple head nodes and data storage sleds. Volume data is replicated between a primary and one or more secondary head nodes for a volume partition and is further flushed to a set of mass storage devices of the data storage sleds. Volume metadata is maintained in a primary and one or more secondary head nodes for a volume partition and is updated in response to volume data being flushed to the data storage sleds. Also, the primary and secondary head nodes store check-points of volume metadata to the data storage sleds, wherein in response to a failure of a primary or secondary head node for a volume partition, a replacement secondary head node for the volume partition recreates a secondary replica for the volume partition based, at least in part, on a stored volume metadata checkpoint.
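The abstract and claims 3, 8 and 9 describe two separate background operations on a head node's log-structured storage. A flush moves buffered volume data to mass storage and records the new locations in the metadata portion. A checkpoint copies the metadata portion itself to mass storage, and each operation is triggered by its own threshold. The sketch below illustrates that split under stated assumptions: `MassStorage`, `LogStructuredStorage`, `run_background_checks` and both threshold values are hypothetical names chosen for illustration. It models a single replica held in memory and leaves out replication to secondary head nodes.

```python
from dataclasses import dataclass, field


@dataclass
class MassStorage:
    """Hypothetical in-memory stand-in for the mass storage devices on the data storage sleds."""
    extents: dict = field(default_factory=dict)      # location -> flushed volume data
    checkpoints: dict = field(default_factory=dict)  # partition id -> metadata checkpoint
    next_location: int = 0

    def write_extent(self, data: bytes) -> int:
        location = self.next_location
        self.extents[location] = data
        self.next_location += 1
        return location


@dataclass
class LogStructuredStorage:
    """One head node's replica: a volume data portion plus a metadata portion (hypothetical)."""
    partition_id: str
    storage: MassStorage
    data_threshold: int = 4      # hypothetical: flush once buffered data exceeds this many bytes
    metadata_threshold: int = 2  # hypothetical: checkpoint once this many entries are uncopied
    volume_data: list = field(default_factory=list)  # (offset, bytes) written but not yet flushed
    metadata: dict = field(default_factory=dict)     # volume offset -> location on mass storage
    uncopied_metadata: int = 0

    def write(self, offset: int, data: bytes) -> None:
        self.volume_data.append((offset, data))
        self.run_background_checks()

    def run_background_checks(self) -> None:
        # The two triggers are evaluated separately: flushing does not by itself force a
        # checkpoint; the checkpoint fires only when its own threshold is exceeded.
        if sum(len(d) for _, d in self.volume_data) > self.data_threshold:
            self.flush()
        if self.uncopied_metadata > self.metadata_threshold:
            self.checkpoint()

    def flush(self) -> None:
        # Read buffered volume data, write it to mass storage, record each location in metadata.
        for offset, data in self.volume_data:
            self.metadata[offset] = self.storage.write_extent(data)
            self.uncopied_metadata += 1
        self.volume_data.clear()

    def checkpoint(self) -> None:
        # Store a copy of the metadata portion on mass storage.
        self.storage.checkpoints[self.partition_id] = dict(self.metadata)
        self.uncopied_metadata = 0


storage = MassStorage()
replica = LogStructuredStorage("vol-1/p0", storage)
for offset, chunk in [(0, b"ab"), (2, b"cd"), (4, b"ef")]:
    replica.write(offset, chunk)
print(storage.checkpoints["vol-1/p0"])  # {0: 0, 2: 1, 4: 2}
```

In this run the third write pushes buffered data past the flush threshold. The resulting three metadata entries then exceed the checkpoint threshold, so both operations run. With a larger `metadata_threshold`, the same writes would flush without checkpointing.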
Claims (preview)
What is claimed is:
1. A data storage system, comprising: a plurality of head nodes; a plurality of mass storage devices, wherein for a volume partition stored in the data storage system, a first and second head node of the plurality of head nodes are configured to: store data for a replica of the volume partition in a log-structured storage of the respective first or second head node, wherein the log-structured storage comprises a volume data portion and a metadata portion; and wherein the first head node is configured to store, to one or more of the plurality of mass storage devices, a copy of the metadata portion of the volume partition; a failure detection agent configured to: detect a failed one of the plurality of head nodes based on a failure of the failed head node to respond to a ping from the failure detection agent; and indicate to a plurality of remaining ones of the plurality of head nodes that the failed head node has failed, wherein the plurality of remaining ones of the plurality of head nodes are each configured to: identify volume partitions for which replicas are stored on the failed head node; and initiate, for the identified volume partitions, the designation of a replacement replica for the identified volume partitions on respective ones of the remaining head nodes.
2. The data storage system of claim 1, wherein the plurality of remaining ones of the plurality of head nodes are further configured to: generate a log-structured storage for the replacement replica based on one or more copies of the metadata portions stored on the one or more mass storage devices.
3. The data storage system of claim 1, wherein the first head node is configured to perform a metadata checkpoint operation, wherein the storing of the copy of the metadata portion of the log-structured storage to the one or more mass storage devices is part of the metadata checkpoint operation performed by the first head node, and wherein the first head node is configured to independently perform the metadata checkpoint operation, independent from performing a flush operation.
4. The data storage system of claim 1, wherein the ping comprises: a verification that an active network connection exists to a respective head node being pinged.
5. The data storage system of claim 1, wherein the ping comprises: a query to an operating system of a respective head node being pinged.
6. The data storage system of claim 1, wherein the ping comprises: a set of queries directed to individual replicas stored on a respective head node being pinged.
7. The data storage system of claim 1, wherein the ping comprises: a request for performance information directed to a respective head node being pinged, wherein a failure to provide the requested performance information is interpreted as an indication of a failure at the respective head node being pinged.
8. The data storage system of claim 1, wherein the first head node is configured to: perform said store a copy of the metadata portion of the log-structured storage for the primary replica based on an amount of metadata stored in the first head node, but not yet copied to the mass storage devices, exceeding a threshold amount of stored but not yet copied metadata, and perform a flush operation based on an amount of volume data stored in the log-structured storage for the primary replica exceeding a threshold amount of stored volume data.
9. The data storage system of claim 8, wherein the first head node is further configured to perform a flush operation, wherein to perform the flush operation, the first head node is configured to: read data stored for the volume partition from the volume data portion of the log-structured storage of the first head node; cause the data read from the volume data portion of the log-structured storage of the first head node to be written to a set of the mass storage devices; and update the metadata portion of the log-structured storage of the first head node to indicate one or more locations at which the data read from the volume data portion is stored on the set of mass storage devices.
10. The data storage system of claim 9, wherein the first head node is configured to perform said storing the copy of the metadata portion, for the replica of the volume partition, to the mass storage devices independently of performing the flush operation for the replica of the volume partition.
11. A method, comprising: storing data for respective replicas of respective volume partitions in log-structured storages of respective head nodes of a data storage system, wherein the log-structured storages of the head nodes comprise a volume data portion and a metadata portion; storing, to one or more mass storage devices of the data storage system, respective copies of the metadata portions of the volume partitions; detecting a failed one of the plurality of head nodes based on a failure of the failed head node to respond to a ping; and initiating, for one or more identified volume partitions having a replica stored on the failed head node, one or more replacement replicas for the one or more identified volume partitions, wherein the one or more replacement replicas are implemented on one or more respective remaining head nodes of the data storage system that were not detected to be failed, and wherein volume metadata for the one or more replacement replicas is re-mirrored from one or more of the respective copies of the metadata portions stored on the one or more mass storage devices of the data storage system.
12. The method of claim 11, further comprising: generating a log-structured storage for the one or more replacement replicas based on the one or more copies of the metadata portions stored on the one or more mass storage devices of the data storage system.
13. The method of claim 11, wherein the ping comprises: a verification that an active network connection exists to a respective head node being pinged.
14. The method of claim 11, wherein the ping comprises: a query to an operating system of a respective head node being pinged.
15. The method of claim 11, wherein the ping comprises: a set of queries directed to individual replicas stored on a respective head node being pinged.
16. The method of claim 11, wherein the ping comprises: a request for performance information directed to a respective head node being pinged, wherein a failure to provide the requested performance information is interpreted as an indication of a failure at the respective head node being pinged.
17. A non-transitory, computer-readable medium storing program instructions that, when executed on or across one or more processors, cause the one or more processors to: detect a failed one of a plurality of head nodes of a data storage system based on a failure of the failed head node to respond to a ping; and initiate, for one or more identified volume partitions having a replica stored on the failed head node, generation of one or more replacement replicas for the one or more identified volume partitions, wherein the one or more replacement replicas are implemented on one or more respective remaining head nodes of the data storage system that were not detected to be failed, and wherein volume metadata for the replacement replicas is re-mirrored from one or more of respective copies of metadata portions stored on one or more mass storage devices of the data storage system.
18. The non-transitory computer-readable media of claim 17, wherein the program instructions, when exec
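Claims 1, 2 and 11 describe the failover path. A failure detection agent pings head nodes and tells the remaining nodes which one stopped responding. Those nodes identify the volume partitions that had a replica on the failed node and designate a replacement replica elsewhere, rebuilding its metadata from the checkpoint copy on mass storage. The sketch below walks that path under several assumptions. `HeadNode`, `FailureDetectionAgent`, `on_peer_failed`, `sweep`, the least-loaded placement policy and the replication factor of 2 are hypothetical choices, and the ping is reduced to a boolean callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class HeadNode:
    node_id: str
    # partition id -> metadata portion of that replica (volume offset -> mass storage location)
    replicas: Dict[str, dict] = field(default_factory=dict)

    def on_peer_failed(self, failed: "HeadNode", remaining: List["HeadNode"],
                       checkpoints: Dict[str, dict], replication_factor: int) -> List[Tuple[str, str]]:
        """Identify partitions with a replica on the failed node and designate replacements."""
        started = []
        for partition_id in sorted(failed.replicas):
            holders = [n for n in remaining if partition_id in n.replicas]
            # Assumption: a surviving holder acts only while the partition is below its
            # replication factor, so once a replacement exists the other nodes skip it.
            if partition_id not in self.replicas or len(holders) >= replication_factor:
                continue
            candidates = [n for n in remaining if partition_id not in n.replicas]
            if not candidates:
                continue
            # Hypothetical placement policy: the candidate holding the fewest replicas.
            target = min(candidates, key=lambda n: (len(n.replicas), n.node_id))
            # Rebuild the replacement's metadata portion from the checkpoint copy on mass storage.
            target.replicas[partition_id] = dict(checkpoints.get(partition_id, {}))
            started.append((partition_id, target.node_id))
        return started


class FailureDetectionAgent:
    """Hypothetical agent that pings head nodes and tells the remaining ones about failures."""

    def __init__(self, nodes: List[HeadNode], ping: Callable[[str], bool]):
        self.nodes = nodes
        # Returns True if the node answered; could wrap e.g. a network-connection check or OS query.
        self.ping = ping

    def sweep(self, checkpoints: Dict[str, dict], replication_factor: int = 2) -> List[Tuple[str, str]]:
        failed_ids = {n.node_id for n in self.nodes if not self.ping(n.node_id)}
        failed = [n for n in self.nodes if n.node_id in failed_ids]
        remaining = [n for n in self.nodes if n.node_id not in failed_ids]
        started = []
        for dead in failed:
            for node in remaining:  # indicate the failure to every remaining head node
                started += node.on_peer_failed(dead, remaining, checkpoints, replication_factor)
        return started


a = HeadNode("hn-a", {"p1": {0: 7}, "p2": {0: 9}})
b = HeadNode("hn-b", {"p1": {0: 7}})
c = HeadNode("hn-c", {"p2": {0: 9}})
d = HeadNode("hn-d")
checkpoints = {"p1": {0: 7}, "p2": {0: 9}}
agent = FailureDetectionAgent([a, b, c, d], ping=lambda node_id: node_id != "hn-b")
print(agent.sweep(checkpoints))  # [('p1', 'hn-d')]
```

Here `hn-b` fails to answer its ping. The surviving holder of `p1` (`hn-a`) designates `hn-d` as the replacement and seeds it from the `p1` checkpoint. When `hn-d` is notified, it sees `p1` already back at two replicas and does nothing. The sketch copies only the metadata checkpoint. It does not model volume data still buffered in a surviving replica's log.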
Management of space entities, e.g. partitions, extents, pools · CPC title
in relation to availability · CPC title
by allocating resources to storage systems · CPC title
Replication mechanisms · CPC title
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title