Data storage system

US11301144B2 · US · B2

Patent metadata
Publication number: US-11301144-B2
Application number: US-201916457095-A
Country: US
Kind code: B2
Filing date: Jun 28, 2019
Priority date: Dec 28, 2016
Publication date: Apr 12, 2022
Grant date: Apr 12, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data storage system includes multiple head nodes and data storage sleds. A control plane of the data storage system designates, for a volume partition, one of the head nodes to function as a primary head node storing a primary replica of the volume partition and designates two or more other head nodes to function as reserve head nodes storing reserve replicas of the volume partition. Additionally, the primary head node causes volume data for the volume partition to be erasure encoded and stored on multiple mass storage devices in different ones of the data storage sleds.
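
The erasure-encoding step in the abstract can be illustrated with a minimal sketch. The abstract does not name a coding scheme, so the example below substitutes single XOR parity (one parity column, tolerating the loss of one column) as a stand-in. The function names `encode`, `place`, and `recover`, and the sled identifiers, are hypothetical and do not come from the patent.

```python
from functools import reduce
from typing import Dict, List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-length striped columns plus one XOR parity column.

    Assumption: single XOR parity stands in for whatever code the system uses.
    """
    stripe_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(stripe_len * k, b"\0")  # zero-pad the final stripe
    stripes = [padded[i * stripe_len:(i + 1) * stripe_len] for i in range(k)]
    parity = reduce(xor_bytes, stripes)
    return stripes + [parity]


def place(columns: List[bytes], sleds: List[str]) -> Dict[str, bytes]:
    """Assign each column to a different sled (hypothetical sled identifiers)."""
    if len(sleds) < len(columns):
        raise ValueError("need at least one sled per column")
    return dict(zip(sleds, columns))


def recover(columns: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing column (marked None) by XOR-ing the others."""
    missing = [i for i, c in enumerate(columns) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild only one lost column")
    rebuilt = list(columns)
    if missing:
        present = [c for c in columns if c is not None]
        rebuilt[missing[0]] = reduce(xor_bytes, present)
    return rebuilt
```

With three striped columns and one parity column placed on four sleds, losing any one sled leaves enough information to rebuild the missing column. A production system would more likely use a code that tolerates multiple losses; that choice is outside what the abstract states.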

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: receiving a write request for a volume partition, by a head node of a data storage system acting as a primary head node for the volume partition; writing, by the head node, data included in the write request to a storage of the head node; causing, by the head node, the data included with the write request to be replicated from the head node to a set of two or more other head nodes of the data storage system acting as reserve head nodes for the volume partition; receiving, by the head node, a plurality of additional write requests for the volume partition and performing, for the additional write requests, said writing data included in the additional write requests to the storage of the head node and said causing data included in the additional write requests to be replicated to the set of two or more head nodes; providing an acknowledgement of the write request subsequent to the data being replicated to the two or more reserve head nodes; and erasure encoding respective parts of the data included in the write request and the additional write requests that is stored in the storage of the head node and causing the erasure encoded respective parts of the data to be stored in a plurality of mass storage devices of the data storage system, wherein the acknowledgement is provided asynchronously with respect to the respective parts of the data being erasure encoded.

2. The method of claim 1, comprising: measuring write latencies of the set of two or more head nodes acting as the reserve head nodes for the volume partition; and in response to a write latency for one of the set of two or more head nodes exceeding a first write latency threshold: reducing a membership of the reserve head nodes required to acknowledge replication of write data to the head node acting as the primary head node before acknowledging the write request to a client of the data storage system.

3. The method of claim 1, comprising: in response to a write latency for the one of the set of two or more head nodes exceeding a second write latency threshold or a time threshold in a reduce membership state: designating an additional head node of the data storage system as a replacement reserve head node for the volume partition; and initiating a re-mirroring operation to re-mirror volume partition data to the replacement reserve head node.

4. The method of claim 1, wherein erasure encoding the respective parts of the data comprises: generating striped columns of the data stored in the head node acting as the primary head node for the volume partition; and generating parity columns of the data stored in the head node acting as the primary head node for the volume partition, wherein the striped columns and parity columns comprise fewer copies of the data for the volume partition than are stored in the head node acting as the primary head node for the volume partition and the set of two or more head nodes acting as the reserve head nodes for the volume partition.

5. The method of claim 4, further comprising: receiving, by the head node acting as the primary head node for the volume partition, an indication that the head node has been designated as a replacement reserve head node for another volume partition; and replicating data stored in a storage of a remaining primary head node for the other volume partition to the storage of the head node.

6. The method of claim 1, further comprising: receiving an indication of one or more durability requirements for the volume partition from a client of the data storage system; and adjusting a number of reserve head nodes included in the set of two or more reserve head nodes to which write data is replicated based at least in part on the received one or more durability requirements for the volume partition.

7. A data storage system, comprising: a head node of the data storage system; wherein, based, at least in part, on receiving a write request for a volume partition, the head node, when acting as a primary head node of the data storage system for the volume partition, is configured to: write data included in the write request to a storage of the head node; cause the data included with the write request to be replicated from the head node to a set of two or more other head nodes of the data storage system, wherein the two or more other head nodes are acting as reserve head nodes for the volume partition; wherein the head node, when acting as the primary head node of the data storage system for the volume partition, is further configured to: erasure encode respective parts of the data stored in the storage of the head node for the volume partition and cause the erasure encoded respective parts of the data to be stored in a plurality of respective mass storage devices of the data storage system; and provide an acknowledgement of the write request subsequent to the data being replicated to the two or more reserve head nodes, wherein the head node is configured to provide the acknowledgement prior to the respective parts of the data being erasure encoded.

8. The data storage system of claim 7, wherein for another volume partition stored in the data storage system, the head node is configured to: receive an indication that the head node has been designated as a replacement reserve head node for the other volume partition; and replicate data stored in a storage of a remaining primary head node for the other volume partition to the storage of the head node.

9. The data storage system of claim 7, wherein the head node is configured to implement, at least in part, a control plane for the data storage system, wherein the control plane is configured to: measure write latencies with respect to the two or more other head nodes acting as reserve head nodes for the volume partition; and in response to a write latency for one of the reserve head nodes exceeding a write latency threshold: designate an additional head node of the data storage system as a replacement head node for the head node with the write latency that exceeds the write latency threshold; and initiate a re-mirroring operation to re-mirror volume partition data to the replacement head node.

10. The data storage system of claim 7, wherein the head node is configured to implement, at least in part, a control plane for the data storage system, wherein the control plane is configured to: receive in indication of one or more durability requirements for the volume partition from a client of the data storage service; and adjust a number of reserve head nodes included in the set of two or more reserve head nodes to which write data is replicated.

11. The data storage system of claim 10, wherein in response to receiving the indication of the one or more durability requirements for the volume, the control plane is also configured to: adjust the erasure encoding such that the number of parts of the data stored in the storage of the head node is stored on more or fewer of the mass storage devices of the data storage system, based, at least in part, on the one or more durability requirements for the volume partition received from the client.

12. The data storage system of claim 11, wherein the head node is configured to store another volume partition with a lower durability requirement than the volume partition, wherein for the other volume partition, the head node is configured to: write data included in a write request for the other volume partition to a storage of the head node; and cause the data included with the write request for the other volume partition to be replicated from the head node to a set of head nodes compri
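
The write path in claims 1 and 2 can be sketched as a small in-memory simulation: the primary writes locally, replicates to its reserves, acknowledges the client once the required reserves have confirmed, and leaves erasure encoding for a later flush. This is an illustration under stated assumptions, not the patented implementation. The class names, the `latency_threshold_ms` default, the `pending` queue, and the `flush` method are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReserveNode:
    name: str
    latency_ms: float  # hypothetical measured write latency for this reserve
    log: List[bytes] = field(default_factory=list)

    def replicate(self, data: bytes) -> bool:
        """Store the replicated write and confirm it to the primary."""
        self.log.append(data)
        return True


@dataclass
class PrimaryNode:
    reserves: List[ReserveNode]
    latency_threshold_ms: float = 50.0  # hypothetical first latency threshold
    log: List[bytes] = field(default_factory=list)
    pending: List[bytes] = field(default_factory=list)  # not yet erasure encoded

    def required_ackers(self) -> List[ReserveNode]:
        """Reserves that must confirm before the client is acknowledged.

        A reserve above the threshold is dropped from the required set
        (the reduced-membership behavior described in claim 2).
        """
        return [r for r in self.reserves
                if r.latency_ms <= self.latency_threshold_ms]

    def write(self, data: bytes) -> str:
        self.log.append(data)  # 1. write to the primary's own storage
        # 2. replicate to every reserve, including slow ones
        acked = {r.name for r in self.reserves if r.replicate(data)}
        required = {r.name for r in self.required_ackers()}
        if not required <= acked:
            raise RuntimeError("replication not acknowledged")
        self.pending.append(data)  # 3. erasure encoding is deferred
        return "ack"               # 4. acknowledge before encoding happens

    def flush(self) -> List[bytes]:
        """Hand pending writes to erasure encoding (stubbed) and clear the queue."""
        batch, self.pending = self.pending, []
        return batch
```

The sketch does not enforce a minimum size for the required set, and it does not model the second threshold or the re-mirroring to a replacement reserve described in claim 3.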

Classifications

  • Replication mechanisms · CPC title

  • by allocating resources to storage systems · CPC title

  • G06F3/0619 · Primary

    in relation to data integrity, e.g. data losses, bit errors · CPC title

  • G06F3/0611 · Primary

    in relation to response time · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11301144B2 cover?
A data storage system includes multiple head nodes and data storage sleds. A control plane of the data storage system designates, for a volume partition, one of the head nodes to function as a primary head node storing a primary replica of the volume partition and designates two or more other head nodes to function as reserve head nodes storing reserve replicas of the volume partition. Addition…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0619. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 12, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).