Data storage system with metadata check-pointing

US11941278B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11941278-B2
Application number	US-202117520537-A
Country	US
Kind code	B2
Filing date	Nov 5, 2021
Priority date	Jun 28, 2019
Publication date	Mar 26, 2024
Grant date	Mar 26, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data storage system includes multiple head nodes and data storage sleds. Volume data is replicated between a primary and one or more secondary head nodes for a volume partition and is further flushed to a set of mass storage devices of the data storage sleds. Volume metadata is maintained in a primary and one or more secondary head nodes for a volume partition and is updated in response to volume data being flushed to the data storage sleds. Also, the primary and secondary head nodes store check-points of volume metadata to the data storage sleds, wherein in response to a failure of a primary or secondary head node for a volume partition, a replacement secondary head node for the volume partition recreates a secondary replica for the volume partition based, at least in part, on a stored volume metadata checkpoint.
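
The abstract's data flow (replicate writes, flush volume data to the sleds, update metadata, checkpoint metadata, rebuild a replica from the checkpoint) can be pictured with a small Python sketch. This is an illustration only, not the patented implementation: the class names `Sled` and `HeadNode`, the function `rebuild_replica`, and the dictionary layout are all hypothetical, and the choice to update secondary metadata inside `flush` is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Sled:
    """Hypothetical stand-in for the mass storage devices of a data storage sled."""
    blocks: dict = field(default_factory=dict)       # location -> flushed volume data
    checkpoints: dict = field(default_factory=dict)  # partition -> metadata snapshot

    def write(self, data: bytes) -> int:
        # Store the data at the next free location and return that location.
        location = len(self.blocks)
        self.blocks[location] = data
        return location


@dataclass
class HeadNode:
    """Hypothetical head node holding a volume data portion and a metadata portion."""
    name: str
    volume_data: dict = field(default_factory=dict)  # partition -> {offset: bytes}
    metadata: dict = field(default_factory=dict)     # partition -> {offset: location}

    def write(self, partition, offset, data, secondaries=()):
        # Record the write locally, then replicate it to each secondary head node.
        self.volume_data.setdefault(partition, {})[offset] = data
        self.metadata.setdefault(partition, {})
        for node in secondaries:
            node.write(partition, offset, data)

    def flush(self, partition, sled, secondaries=()):
        # Move pending volume data to the sled and record where each offset landed.
        pending = self.volume_data.get(partition, {})
        for offset in sorted(pending):
            location = sled.write(pending[offset])
            for node in (self, *secondaries):
                node.metadata.setdefault(partition, {})[offset] = location
        # Assumption: once flushed, head nodes drop their in-memory copies.
        for node in (self, *secondaries):
            node.volume_data.get(partition, {}).clear()

    def checkpoint(self, partition, sled):
        # Store a copy of the metadata portion on the sled, separately from flushing.
        sled.checkpoints[partition] = dict(self.metadata.get(partition, {}))


def rebuild_replica(replacement, partition, sled):
    # Recreate a replica's metadata on a replacement head node from the checkpoint.
    replacement.metadata[partition] = dict(sled.checkpoints[partition])
    replacement.volume_data[partition] = {}
```

In this sketch `checkpoint` and `flush` are separate methods, so a caller can trigger either one on its own schedule or threshold.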

First claim

Opening claim text (preview).

What is claimed is:
1. A data storage system, comprising: a plurality of head nodes; a plurality of mass storage devices, wherein for a volume partition stored in the data storage system, a first and second head node of the plurality of head nodes are configured to: store data for a replica of the volume partition in a log-structured storage of the respective first or second head node, wherein the log-structured storage comprises a volume data portion and a metadata portion; and wherein the first head node is configured to store, to one or more of the plurality of mass storage devices, a copy of the metadata portion of the volume partition; a failure detection agent configured to: detect a failed one of the plurality of head nodes based on a failure of the failed head node to respond to a ping from the failure detection agent; and indicate to a plurality of remaining ones of the plurality of head nodes that the failed head node has failed, wherein the plurality of remaining ones of the plurality of head nodes are each configured to: identify volume partitions for which replicas are stored on the failed head node; and initiate, for the identified volume partitions, the designation of a replacement replica for the identified volume partitions on respective ones of the remaining head nodes.
2. The data storage system of claim 1, wherein the plurality of remaining ones of the plurality of head nodes are further configured to: generate a log-structured storage for the replacement replica based on one or more copies of the metadata portions stored on the one or more mass storage devices.
3. The data storage system of claim 1, wherein the first head node is configured to perform a metadata checkpoint operation, wherein the storing of the copy of the metadata portion of the log-structured storage to the one or more mass storage devices is part of the metadata checkpoint operation performed by the first head node, and wherein the first head node is configured to independently perform the metadata checkpoint operation, independent from performing a flush operation.
4. The data storage system of claim 1, wherein the ping comprises: a verification that an active network connection exists to a respective head node being pinged.
5. The data storage system of claim 1, wherein the ping comprises: a query to an operating system of a respective head node being pinged.
6. The data storage system of claim 1, wherein the ping comprises: a set of queries directed to individual replicas stored on a respective head node being pinged.
7. The data storage system of claim 1, wherein the ping comprises: a request for performance information directed to a respective head node being pinged, wherein a failure to provide the requested performance information is interpreted as an indication of a failure at the respective head node being pinged.
8. The data storage system of claim 1, wherein the first head node is configured to: perform said store a copy of the metadata portion of the log-structured storage for the primary replica based on an amount of metadata stored in the first head node, but not yet copied to the mass storage devices, exceeding a threshold amount of stored but not yet copied metadata, and perform a flush operation based on an amount of volume data stored in the log-structured storage for the primary replica exceeding a threshold amount of stored volume data.
9. The data storage system of claim 8, wherein the first head node is further configured to perform a flush operation, wherein to perform the flush operation, the first head node is configured to: read data stored for the volume partition from the volume data portion of the log-structured storage of the first head node; cause the data read from the volume data portion of the log-structured storage of the first head node to be written to a set of the mass storage devices; and update the metadata portion of the log-structured storage of the first head node to indicate one or more locations at which the data read from the volume data portion is stored on the set of mass storage devices.
10. The data storage system of claim 9, wherein the first head node is configured to perform said storing the copy of the metadata portion, for the replica of the volume partition, to the mass storage devices independently of performing the flush operation for the replica of the volume partition.
11. A method, comprising: storing data for respective replicas of respective volume partitions in log-structured storages of respective head nodes of a data storage system, wherein the log-structured storages of the head nodes comprise a volume data portion and a metadata portion; storing, to one or more mass storage devices of the data storage system, respective copies of the metadata portions of the volume partitions; detecting a failed one of the plurality of head nodes based on a failure of the failed head node to respond to a ping; and initiating, for one or more identified volume partitions having a replica stored on the failed head node, one or more replacement replicas for the one or more identified volume partitions, wherein the one or more replacement replicas are implemented on one or more respective remaining head nodes of the data storage system that were not detected to be failed, and wherein volume metadata for the one or more replacement replicas is re-mirrored from one or more of the respective copies of the metadata portions stored on the one or more mass storage devices of the data storage system.
12. The method of claim 11, further comprising: generating a log-structured storage for the one or more replacement replicas based on the one or more copies of the metadata portions stored on the one or more mass storage devices of the data storage system.
13. The method of claim 11, wherein the ping comprises: a verification that an active network connection exists to a respective head node being pinged.
14. The method of claim 11, wherein the ping comprises: a query to an operating system of a respective head node being pinged.
15. The method of claim 11, wherein the ping comprises: a set of queries directed to individual replicas stored on a respective head node being pinged.
16. The method of claim 11, wherein the ping comprises: a request for performance information directed to a respective head node being pinged, wherein a failure to provide the requested performance information is interpreted as an indication of a failure at the respective head node being pinged.
17. A non-transitory, computer-readable medium storing program instructions that, when executed on or across one or more processors, cause the one or more processors to: detect a failed one of a plurality of head nodes of a data storage system based on a failure of the failed head node to respond to a ping; and initiate, for one or more identified volume partitions having a replica stored on the failed head node, generation of one or more replacement replicas for the one or more identified volume partitions, wherein the one or more replacement replicas are implemented on one or more respective remaining head nodes of the data storage system that were not detected to be failed, and wherein volume metadata for the replacement replicas is re-mirrored from one or more of respective copies of metadata portions stored on one or more mass storage devices of the data storage system.
18. The non-transitory computer-readable media of claim 17, wherein the program instructions, when exec
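
The failure-handling steps recited in the claims (ping each head node, identify partitions with a replica on the failed node, designate a replacement on a remaining node) can be sketched in Python as follows. This is a rough illustration under stated assumptions rather than the claimed system: the function names `detect_failed` and `plan_replacements` are hypothetical, the `ping` callable stands in for whichever ping variant is used, and picking the first eligible node in sorted order is an assumed placement policy that the claims do not specify.

```python
from typing import Callable, Dict, List, Set


def detect_failed(head_nodes: List[str], ping: Callable[[str], bool]) -> List[str]:
    # A head node is treated as failed when it does not respond to the ping.
    return [node for node in head_nodes if not ping(node)]


def plan_replacements(
    failed: str,
    replicas: Dict[str, Set[str]],  # partition -> head nodes holding a replica
    head_nodes: List[str],
) -> Dict[str, str]:
    # Only nodes that were not detected as failed may host a replacement replica.
    remaining = sorted(node for node in head_nodes if node != failed)
    plan: Dict[str, str] = {}
    for partition in sorted(replicas):
        holders = replicas[partition]
        if failed not in holders:
            continue  # this partition lost no replica
        # Assumed policy: first remaining node that does not already hold a replica.
        candidates = [node for node in remaining if node not in holders]
        if candidates:
            plan[partition] = candidates[0]
    return plan
```

A caller would run `detect_failed` periodically and, for each failed node, pass the result to `plan_replacements`; the chosen node would then rebuild its replica's metadata from the copy stored on the mass storage devices.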

Assignees

Amazon Tech Inc

Inventors

Classifications

G06F3/0644 · Primary
Management of space entities, e.g. partitions, extents, pools · CPC title
G06F3/0617
in relation to availability · CPC title
G06F3/0631
by allocating resources to storage systems · CPC title
G06F3/065
Replication mechanisms · CPC title
G06F3/067
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

Patent family

Related publications grouped by family.

View patent family 74042868

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11941278B2 cover?: A data storage system includes multiple head nodes and data storage sleds. Volume data is replicated between a primary and one or more secondary head nodes for a volume partition and is further flushed to a set of mass storage devices of the data storage sleds. Volume metadata is maintained in a primary and one or more secondary head nodes for a volume partition and is updated in response to vo…
Who is the assignee on this patent?: Amazon Tech Inc
What technology area does this patent fall under?: Primary CPC classification G06F3/0644. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 26, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).