Data storage system with multiple durability levels

US11467732B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11467732-B2
Application number  US-201916723391-A
Country             US
Kind code           B2
Filing date         Dec 20, 2019
Priority date       Dec 28, 2016
Publication date    Oct 11, 2022
Grant date          Oct 11, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data storage system includes multiple head nodes and multiple data storage sleds mounted in a rack. For a particular volume or volume partition one of the head nodes is designated as a primary head node for the volume or volume partition. The primary head node is configured to store data for the volume in a data storage of the primary head node and cause the data to be replicated to a secondary head node. The primary head node is also configured to cause the data for the volume to be stored in a plurality of respective mass storage devices each in different ones of the plurality of data storage sleds of the data storage system.
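The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names (`HeadNode`, `Sled`, `StorageSystem`), the in-memory lists standing in for storage, and the choice to place one part of the data per sled are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeadNode:
    """Hypothetical head node; a Python list stands in for its data storage."""
    name: str
    log: list = field(default_factory=list)

    def append(self, volume: str, data: bytes) -> int:
        self.log.append((volume, data))
        return len(self.log) - 1  # position of the new entry


@dataclass
class Sled:
    """Hypothetical data storage sled; each inner list is one mass storage device."""
    name: str
    devices: list = field(default_factory=lambda: [[], []])


class StorageSystem:
    """Toy model: one primary and one secondary head node plus several sleds."""

    def __init__(self, primary: HeadNode, secondary: HeadNode, sleds: list):
        self.primary = primary
        self.secondary = secondary
        self.sleds = sleds

    def write(self, volume: str, data: bytes) -> str:
        # The primary stores the data locally, then has it replicated
        # to the secondary head node before acknowledging.
        self.primary.append(volume, data)
        self.secondary.append(volume, data)
        return "ack"

    def flush(self, volume: str, parts: list) -> list:
        # Assumption: each part goes to a device in a *different* sled,
        # so losing one sled affects at most one part.
        if len(parts) > len(self.sleds):
            raise ValueError("need at least one sled per part")
        placement = []
        for part, sled in zip(parts, self.sleds):
            sled.devices[0].append((volume, part))
            placement.append(sled.name)
        return placement
```

Under these assumptions, a write is held by two head nodes before it is acknowledged, and a later flush spreads the data over devices in distinct sleds.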

First claim

Opening claim text (preview).

What is claimed is:

1. One or more non-transitory computer readable media storing program instructions, wherein the program instructions when executed on or across one or more processors implement: a head node, wherein the head node is configured to: serve, when designated, as a primary head node for a volume partition; and serve, when designated, as a secondary head node for the volume partition; wherein, when designated as a secondary head node for the volume partition, the head node is configured to: store write data, included with incoming write requests directed to another head node designated as a primary node for the volume partition, as replicated write data wherein the replicated write data is stored to a log-structured storage of the head node designated as the secondary head node such that the write data is stored in respective log-structured storages of both the other head node and the head node; send an acknowledgement to the other head node designated at the primary head node that the write data has been replicated; release, subsequent to respective parts of the write data stored in the log-structured storage of the primary head node being stored in mass storage devices, storage space of the log-structured storage of the head node designated as the secondary head node; wherein, when designated as a primary head node for the volume partition, the head node is configured to: replicate, in response to a failure of one or more of the mass storage device, data corresponding to data stored on the failed one or more mass storage devices to one or more other mass storage devices; update an index of the head node designated as the primary head node to indicate new locations of the data for the volume partition; and locate, based on the updated index, in response to a request targeting the volume partition, data of the volume partition stored in the head node and the one or more other mass storage devices storing the data that has been replicated.

2. The one or more non-transitory computer readable media of claim 1, wherein: the log-structured storage of the head node designated as the secondary head node comprises an index with pointers to where data is stored; to perform said store the replicated write data, the program instructions when executed cause the one or more processors to store a replicated version of the write data to a head of a log of the head node designated as the secondary head node's log-structured storage; and the program instructions when executed cause the one or more processors cause the head node designated as the secondary head node to add, to the index, an entry that indicates where the written data is stored in the log of the log-structured storage.

3. The one or more non-transitory computer readable media of claim 1, wherein the program instructions when executed cause the head node designated as the secondary head node to: write incoming replicated write requests to a head of the log of the head node designated as the secondary head node's log-structured storage; add an index entry that indicates where the written data is stored in the head node designated as the secondary head node's log; and update the index when the storage space corresponding to the replicated write data is released.

4. The one or more non-transitory computer readable media of claim 1, comprising program instructions, that when executed: detect a failed mass storage device in a particular data storage sled; cause extents that include columns on the failed mass storage device to be replicated to other extents that include columns on other mass storage devices in other data storage sleds; and update indexes of the head node designated as the secondary head node that includes an extent in the failed mass storage device to indicate the new locations of the data for the volume partition.

5. The one or more non-transitory computer readable media of claim 1, comprising program instructions, that when executed: in response to a failure of the primary or secondary head node for the volume partition, replicate the data included in the write requests to an additional head node of the data storage system designated as a replacement secondary head node for the volume partition.

6. The one or more non-transitory computer readable media of claim 1, comprising program instructions, that when executed: store the write data across at least six mass storage devices such that at least four of the mass storage devices store portions of the data, and at least two of the mass storage devices store coded data derived from the data.

7. The one or more non-transitory computer readable media of claim 1, wherein the head node is included in a system that comprises data storage sleds each comprising a plurality of mass storage devices and a respective sled controller, and the mass storage devices that store the respective parts of the data are in different ones of the data storage sleds; and wherein the program instructions when executed: detect a failure of one or more of the mass storage devices in a particular one of the data storage sleds; and copy data stored in remaining mass storage devices of the data storage sled to mass storage devices in one or more other data storage sleds.

8. A data storage system, comprising: mass storage devices; and head nodes, wherein for a volume partition of a volume to be stored in the data storage system, a first of the head nodes is designated as a primary head node for the volume partition, and a second of the head nodes is designated as a secondary head node for the volume partition, the primary head node configured to: receive a write request from a client of the data storage system; write data included with the write request for the volume partition to a log-structured storage of the primary head node; cause the data included with the write request to be replicated to a log-structured storage of the secondary head node; provide a write acknowledgement to the client in response to the write request subsequent to the data being replicated to the log-structured storage of the secondary head node, wherein the primary head node is configured to provide the write acknowledgement to the client prior to the data being flushed to one or more of the mass storage devices; and cause, based on a trigger, respective parts of the data stored in the log-structured storage of the primary head node to be flushed to one or more of the mass storage devices.

9. The data storage system of claim 8, wherein: the log-structured storage of the primary head node and the log-structured storage of the secondary head node comprise respective indexes with pointers to where data is stored; said write the data comprises write the data included with the request to a head of a log of the respective head node's log-structured storage; and the respective head nodes are configured to add, to the respective head node's index, an entry that indicates where the written data is stored in the respective head node's log.

10. The data storage system of claim 9, comprising: an extent comprising columns of the mass storage devices to which the written data stored in the log-structured storage of the primary head node is flushed; wherein the primary head node is configured to update the primary head node's index when the written data is flushed from the primary head node's log to the extent comprising the columns of the mass storage devices.

11. The data storage system of claim 8, wherein: said cause respective parts of the data stored in the log-structured storage of the primary head node to be flushed comprises cause the respective parts of the data to be era
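The log-and-index behavior recited in the claims (append to the head of a log, add an index entry, update the index on flush, release space on the secondary) might be sketched as follows. This is an illustrative toy, not the claimed implementation: `LogStructuredStore`, its method names, the dictionary standing in for an extent, and the string keys are all hypothetical.

```python
class LogStructuredStore:
    """Toy log-structured store: an append-only log plus an index of pointers."""

    def __init__(self):
        self.log = []    # new entries are appended at the head of the log
        self.index = {}  # key -> ("log", offset) or ("extent", location)

    def write(self, key, data):
        # Append the data and record where it lives in the log.
        self.log.append(data)
        self.index[key] = ("log", len(self.log) - 1)

    def flush(self, key, extent, location):
        # Primary side: move an entry from the log to an extent
        # (a dict here) and point the index at the new location.
        where, offset = self.index[key]
        if where != "log":
            return  # already flushed
        extent[location] = self.log[offset]
        self.log[offset] = None  # space in the log can now be reclaimed
        self.index[key] = ("extent", location)

    def release(self, key):
        # Secondary side: once the primary has flushed, drop the
        # replicated copy and update the index accordingly.
        where, offset = self.index.pop(key)
        if where == "log":
            self.log[offset] = None

    def read(self, key, extent):
        # Locate data through the index, wherever it currently lives.
        where, pos = self.index[key]
        return self.log[pos] if where == "log" else extent[pos]
```

In this sketch, reads go through the index, so a caller does not need to know whether the data is still in the head node's log or has already been flushed.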

Assignees

Inventors

Classifications

  • Storing data temporarily at an intermediate stage, e.g. caching · CPC title

  • Metadata, control data · CPC title

  • G06F3/067 · Primary

    Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • Non-volatile memory · CPC title

  • Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11467732B2 cover?
A data storage system with multiple head nodes and multiple data storage sleds mounted in a rack. For each volume or volume partition, one head node is designated as the primary head node. See the Abstract section for the full text.
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/067. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 11, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).