Data storage system with enforced fencing

US11444641B2 · US · B2

Patent metadata
Publication number: US-11444641-B2
Application number: US-201916684992-A
Country: US
Kind code: B2
Filing date: Nov 15, 2019
Priority date: Dec 28, 2016
Publication date: Sep 13, 2022
Grant date: Sep 13, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data storage system includes multiple head nodes and data storage sleds. The data storage sleds include multiple mass storage devices and a sled controller. Respective ones of the head nodes are configured to obtain credentials for accessing particular portions of the mass storage devices of the data storage sleds. A sled controller of a data storage sled determines whether a head node attempting to perform a write on a mass storage device of a data storage sled that includes the sled controller is presenting with the write request a valid credential for accessing the mass storage devices of the data storage sled. If the credentials are valid, the sled controller causes the write to be performed and if the credentials are invalid, the sled controller returns a message to the head node indicating that it has been fenced off from the mass storage device.
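The write check described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class and method names (`SledController`, `grant`, `write`, `WriteResult`), the portion name `extent-0`, and the use of ordered integers as credentials are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WriteResult:
    ok: bool
    message: str


@dataclass
class SledController:
    # Hypothetical bookkeeping: the currently valid credential for each
    # portion of the sled's mass storage devices, modeled as an integer.
    valid_credential: Dict[str, int] = field(default_factory=dict)
    portions: Dict[str, bytes] = field(default_factory=dict)

    def grant(self, portion: str, credential: int) -> None:
        """Record a credential; only a higher number replaces the current one."""
        if credential > self.valid_credential.get(portion, 0):
            self.valid_credential[portion] = credential

    def write(self, portion: str, credential: int, data: bytes) -> WriteResult:
        """Perform the write only if the presented credential is the valid one."""
        if self.valid_credential.get(portion) != credential:
            # Invalid credential: tell the head node it has been fenced off.
            return WriteResult(False, "fenced off from " + portion)
        self.portions[portion] = data
        return WriteResult(True, "written")


sled = SledController()
sled.grant("extent-0", credential=1)
first = sled.write("extent-0", credential=1, data=b"a")  # accepted
sled.grant("extent-0", credential=2)                     # newer credential issued
stale = sled.write("extent-0", credential=1, data=b"b")  # rejected, data unchanged
```

In this sketch the head node holding credential 1 can write until credential 2 is granted, after which its writes are refused with a fencing message and the stored data is left untouched.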

First claim

Opening claim text (preview).

What is claimed is:

1. A data storage system, comprising: a plurality of head nodes; and a plurality of data storage sleds comprising mass storage devices, wherein respective ones of the head nodes comprise: a network interface connected to a slot of a motherboard of the respective head node, wherein the network interfaces of the motherboards of the respective ones of the head nodes cause the mass storage devices included in the plurality of data storage sleds to appear to the motherboards of the respective ones of the head nodes as a single local storage drive; wherein a first head node of the plurality of head nodes, when acting as a primary head node for a volume partition stored by the data storage system, is configured to: replicate data for the volume partition to a second head node of the data storage system, wherein the second head node acts as a secondary head node for the volume partition stored by the data storage system; and perform a flush operation that causes data for the volume partition stored by the first head node to be written to particular portions of the mass storage devices of the data storage sleds that have been allocated to store data for the volume partition, wherein the particular portions are located in a plurality of different ones of the data storage sleds and appear to the first head node as a local storage drive, and wherein, in response to a failure of the first head node, the data storage system is configured to: issue an updated credential to the second head node of the plurality of head nodes; and fence off the first head node or other head nodes of the data storage system with credentials inferior to the updated credential from being able to cause data to be written to the particular portions of the mass storage devices of the data storage sleds that have been allocated to store data for the volume partition.

2. The data storage system of claim 1, wherein the data storage system comprises: a local control plane, wherein the local control is configured to: issue the updated credential to the second head node in response to the failure of the first head node, wherein issuing the updated credential to the second head node causes the second head node to be promoted from acting as the secondary head node for the volume partition to acting as the primary head node for the volume partition.

3. The data storage system of claim 2, wherein respective ones of the plurality of data storage sleds further comprise: a sled controller for the mass storage devices of the respective data storage sled, and wherein the second head node is configured to: present the updated credential to respective sled controllers of the data storage sleds that include the mass storage devices that include the particular portions allocated for the volume partition; and receive one or more tokens from the respective sled controllers for use in accessing the particular portions of the mass storage devices allocated for the volume partition.

4. The data storage system of claim 3, wherein the respective sled controllers of the data storage sleds are configured to: issue a token to a head node presenting a credential that is superior to all previous credentials presented to the sled controller; and decline to grant access to a head node presenting a superseded token or a superseded credential.

5. The data storage system of claim 1, wherein the network interfaces of the respective head nodes are connected to a peripheral component interconnect express (PCIe) slot of a motherboard of the respective head nodes, wherein the networking interface causes the mass storage devices of the data storage sleds to appear to the motherboard of the respective head node as a local storage drive.

6. The data storage system of claim 5, wherein the respective ones of the head nodes are configured to communicate with the sled controllers of the respective data storage sleds using a non-volatile memory express (NVMe) protocol.

7. The data storage system of claim 1, wherein the different ones of the data storage sleds are located in different fault domains of the data storage system.

8. The data storage system of claim 7, wherein the data storage system is configured to: in response to a failure of a first mass storage device storing data for the volume partition in a first fault domain: replicate an at least partial copy of the data for the volume partition, stored on a remaining second mass storage device in a second fault domain, to a third mass storage device located in a third fault domain, wherein the third fault domain is a different fault domain than the first and second fault domains.

9. The data storage system of claim 8, wherein respective ones of the plurality of data storage sleds further comprise: a sled controller for the mass storage devices of the respective data storage sled, wherein respective ones of the sled controllers are configured to: store a record of a token issued to a head node in a volatile memory of the respective sled controller; and cause a latest credential associated with a volume partition to be stored in a persistent storage of the mass storage devices allocated for the volume partition.

10. The data storage system of claim 1, wherein the data storage system comprises: a local control plane configured to provide a sandbox recommendation to the first head node acting as the primary head node for the volume partition, wherein the sandbox recommendation indicates portions of mass storage devices in different fault domains that are recommended to be allocated for the volume partition, wherein the sandbox recommendation takes into account respective loads for other volume partitions stored in the data storage system, and wherein to perform the flush operation, the first head node acting as the primary head node for the volume partition is configured to select the particular portions of the mass storage devices to be allocated for the volume partition based on the sandbox recommendation.

11. The data storage system of claim 10, wherein to perform the flush operation the first head node acting as the primary head node is configured to: select one or more portions of mass storage devices outside of the sandbox recommendation if sufficient space is not available on the portions of the mass storage devices included in the sandbox recommendation.

12. A data storage device, comprising: one or more processors; and a memory storing program instructions that when executed on or across the one or more processors, cause the one or more processors to: perform, when acting as a primary head node for a volume partition, a flush operation that causes data stored on a first head node acting as the primary head node to be written from the first head node to particular portions of mass storage devices of a data storage system, wherein performing the flush operation comprises presenting a credential or tokens issued to the first head node for accessing the particular portions of the mass storage devices; receive, when acting as the primary head node for the volume partition, in response to attempting to perform an additional flush operation, an indication that the tokens or credentials issued to the first head node have been superseded by tokens or credentials issued to another head node of the data storage system; and in response to receiving the indication, assume a role of secondary head node for the volume partition, wherein data written to the volume partition is replicated to the first head node from the other head node that superseded the first head node as the primary head node.

13. The data storage dev
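The failover behavior recited in claims 1 to 4 and claim 12 can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the claimed implementation: credentials are modeled as increasing integers, a token is a hypothetical `(head name, credential)` pair, and the names `Sled`, `HeadNode`, `acquire`, and `flush` are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, int]  # hypothetical token format: (head node name, credential)


@dataclass
class Sled:
    highest_credential: int = 0
    active_token: Optional[Token] = None
    blocks: List[bytes] = field(default_factory=list)

    def present_credential(self, head: str, credential: int) -> Optional[Token]:
        # Issue a token only for a credential superior to all seen so far.
        if credential <= self.highest_credential:
            return None
        self.highest_credential = credential
        self.active_token = (head, credential)
        return self.active_token

    def write(self, token: Optional[Token], data: bytes) -> bool:
        # Decline a missing or superseded token.
        if token is None or token != self.active_token:
            return False
        self.blocks.append(data)
        return True


@dataclass
class HeadNode:
    name: str
    role: str = "secondary"
    tokens: Dict[int, Token] = field(default_factory=dict)  # keyed by sled index

    def acquire(self, sleds: List[Sled], credential: int) -> bool:
        """Present a credential to every sled; become primary if all accept."""
        for i, sled in enumerate(sleds):
            token = sled.present_credential(self.name, credential)
            if token is None:
                return False
            self.tokens[i] = token
        self.role = "primary"
        return True

    def flush(self, sleds: List[Sled], data: bytes) -> bool:
        """Write to each sled; on a superseded token, step down to secondary."""
        for i, sled in enumerate(sleds):
            if not sled.write(self.tokens.get(i), data):
                self.role = "secondary"
                return False
        return True


sleds = [Sled(), Sled()]
head_a, head_b = HeadNode("head-a"), HeadNode("head-b")
head_a.acquire(sleds, credential=1)   # head-a starts as primary
head_a.flush(sleds, b"v1")
head_b.acquire(sleds, credential=2)   # updated credential issued after a failure
fenced = head_a.flush(sleds, b"v2")   # head-a is fenced and steps down
head_b.flush(sleds, b"v2")
```

Here the assumed control-plane step is simply the call that hands credential 2 to `head_b`; once the sleds have seen it, `head_a`'s older tokens are declined and it assumes the secondary role.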

Assignees

Inventors

Classifications

  • G06F3/067 (Primary)

    Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • Redundant storage or storage space (G06F11/2056 takes precedence) · CPC title

  • Parity data used in redundant arrays of independent storages, e.g. in RAID systems · CPC title

  • H03M13/154 (Primary)

    Error and erasure correction, e.g. by using the error and erasure locator or Forney polynomial · CPC title

  • Replication mechanisms · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11444641B2 cover?
A data storage system includes multiple head nodes and data storage sleds. The data storage sleds include multiple mass storage devices and a sled controller. Respective ones of the head nodes are configured to obtain credentials for accessing particular portions of the mass storage devices of the data storage sleds. A sled controller of a data storage sled determines whether a head node attemp…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/067. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 13, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).