Methods and apparatus for optimizing resource utilization in distributed storage systems

US9990147B2 · US · B2

Patent metadata
Publication number: US-9990147-B2
Application number: US-201514697518-A
Country: US
Kind code: B2
Filing date: Apr 27, 2015
Priority date: Mar 22, 2011
Publication date: Jun 5, 2018
Grant date: Jun 5, 2018

Abstract

Official abstract text for this publication.

Methods and apparatus for optimizing resource utilization in distributed storage systems. A data migration technique is described that may operate in the background in a distributed storage data center to migrate data among a fleet of storage units to achieve a substantially even and randomized data storage distribution among all storage units in the fleet. When new storage units are added to the fleet and coupled to the data center network, the new storage units are detected. Instead of processing and storing new data to the newly added storage units, as in conventional distributed storage systems, the new units are blocked from general client I/O to allow the data migration technique to migrate data from other, previously installed storage hardware in the data center onto the new storage hardware. Once the storage load on the new storage units is balanced with the rest of the fleet, the new storage units are released for general client I/O.
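The gating behavior described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `StorageUnit` class, its fields, the function names, and the 5% `tolerance` used to decide when a new unit counts as "balanced" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StorageUnit:
    # All fields are hypothetical; the abstract does not define a data model.
    name: str
    capacity: int            # total bytes available on the unit
    used: int = 0            # bytes currently stored
    client_io: bool = True   # False while blocked from general client I/O

    @property
    def utilization(self) -> float:
        return self.used / self.capacity


def fleet_utilization(fleet):
    """Aggregate utilization: total bytes stored over total capacity."""
    return sum(u.used for u in fleet) / sum(u.capacity for u in fleet)


def add_new_unit(fleet, unit):
    """Register a newly detected unit, but keep it closed to client I/O."""
    unit.client_io = False
    fleet.append(unit)


def release_balanced_units(fleet, tolerance=0.05):
    """Open blocked units whose load has caught up with the fleet.

    'Balanced' is assumed here to mean within `tolerance` of the
    aggregate utilization; the abstract does not give a threshold.
    """
    target = fleet_utilization(fleet)
    released = []
    for unit in fleet:
        if not unit.client_io and unit.utilization >= target - tolerance:
            unit.client_io = True
            released.append(unit.name)
    return released
```

In this sketch a background migration job would fill the blocked unit from older hardware, and `release_balanced_units` would be called periodically to reopen it once its utilization approaches the fleet aggregate.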

First claim

Opening claim text (preview).

What is claimed is:

1. A distributed storage system, comprising: a plurality of storage units configured for access by a plurality of clients and each coupled to a network, wherein the plurality of storage units collectively store data for the plurality of clients; at least one hardware processor and associated memory coupled to the network that implement a distributed storage control system configured to manage data storage across the plurality of storage units, wherein to manage the data storage the distributed storage control system is configured to: track storage space utilization among the plurality of storage units, including an aggregate storage space utilization for the plurality of storage units; based at least in part on the tracked storage space utilization, select, from among the plurality of storage units, one or more source storage units and one or more destination storage units, wherein the storage space utilization of the one or more source storage units is higher than the aggregate storage space utilization, and wherein the storage space utilization of the one or more destination storage units is lower than the aggregate storage space utilization; determine previously stored data on the one or more source storage units to migrate to the one or more destination storage units according to at least the tracked storage space utilization; and migrate the determined previously stored data from the one or more selected source storage units to the one or more selected destination storage units, resulting in the storage space utilization across the plurality of storage units being more evenly balanced.

2. The distributed storage system as recited in claim 1, wherein the distributed storage control system is configured to perform said track, said select, said determine and said migrate as a background process while general client I/O traffic is performed at the plurality of storage units for the plurality of clients to read previously stored data from and store new data to the plurality of storage units via the network.

3. The distributed storage system as recited in claim 1, wherein to track storage space utilization among the plurality of storage units, the distributed storage control system is configured to track both storage space utilization on individual storage units and the aggregate storage space utilization across the plurality of storage units.

4. The distributed storage system as recited in claim 3, wherein to perform said select one or more source storage units and one or more destination storage units, the distributed storage control system is configured to compare the storage space utilization on individual storage units to an aggregate target based on the aggregate storage space utilization.

5. The distributed storage system as recited in claim 1, wherein to perform said migrate the determined previously stored data, the distributed storage control system is configured to migrate data from a plurality of selected source storage units to one destination storage unit.

6. The distributed storage system as recited in claim 1, wherein the policy further specifies one or more caps or thresholds on how much network bandwidth or processing capacity is allowed for migrating stored data from selected source storage units to destination storage units.

7. The distributed storage system as recited in claim 1, wherein the policy further specifies one or more data type criteria to avoid overloading a particular storage unit with a particular type of data based on data object size or activity level.

8. A method, comprising: performing, by one or more computing devices: managing data storage across a plurality of storage units configured for access by a plurality of clients and coupled to a same network as the one or more computing devices, wherein the plurality of storage units collectively store data for the plurality of clients, and wherein said managing data storage access comprises: tracking storage space utilization among the plurality of storage units, including an aggregate storage space utilization for the plurality of storage units; based at least in part on the tracked storage space utilization, selecting, from among the plurality of storage units, one or more source storage units and one or more destination storage units, wherein the storage space utilization of the one or more source storage units is higher than the aggregate storage space utilization, and wherein the storage space utilization of the one or more destination storage units is lower than the aggregate storage space utilization; determining previously stored data on the one or more source storage units to migrate to the one or more destination storage units according to at least the tracked storage space utilization; and migrating the determined previously stored data from the one or more selected source storage units to the one or more selected destination storage units, resulting in the storage space utilization across the plurality of storage units being more evenly balanced.

9. The method of claim 8, wherein said tracking, said selecting, said determining and said migrating are performed as part of a background process while general client I/O traffic is performed at the plurality of storage units for the plurality of clients to read previously stored data from and store new data to the plurality of storage units via the network.

10. The method of claim 8, wherein said tracking storage space utilization among the plurality of storage units, comprises tracking both storage space utilization on individual storage units and the aggregate storage space utilization across the plurality of storage units.

11. The method of claim 10, wherein said selecting one or more source storage units and one or more destination storage units comprises comparing the storage space utilization on individual storage units to an aggregate target based on the aggregate storage space utilization.

12. The method of claim 8, wherein said migrating the determined previously stored data comprises migrating data from a plurality of selected source storage units to one destination storage unit.

13. The method of claim 8, wherein the policy further specifies one or more caps or thresholds on how much network bandwidth or processing capacity is allowed for migrating stored data from selected source storage units to destination storage units.

14. The method of claim 8, wherein data items are replicated or erasure coded among the plurality of storage units for redundancy, wherein said determining previously stored data to migrate avoids migrating redundancy data for a data item from a source storage unit to a destination storage unit already storing redundancy data for that data item.

15. A non-transitory, computer-readable storage medium, storing program instructions that when executed by one or more computing devices cause the one or more computing devices to implement a distributed storage control system that implements: managing data storage across a plurality of storage units configured for access by a plurality of clients and coupled to a same network as the distributed storage control system, wherein the plurality of storage units collectively store data for the plurality of clients, and wherein the managing data storage access comprises: tracking storage space utilization among the plurality of storage units, including an aggregate storage space utilization for the plurality of storage units; based at least in part on the tracked storage space utilization, selecting, from among the plurality of storage units, one or more source stora
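The track, select, determine, and migrate steps recited in the independent claims, together with the redundancy constraint of claim 14, might be sketched as follows. This is an illustrative reading only, not the patented implementation: the function name `plan_migrations`, the dictionary-based inputs, the greedy ordering, and the rule that a destination is never filled past the aggregate utilization are all assumptions.

```python
def plan_migrations(used, capacity, placement, sizes):
    """Plan moves from over-utilized units to under-utilized units.

    Hypothetical inputs (not defined in the claims):
      used:      unit name -> bytes stored
      capacity:  unit name -> total bytes
      placement: unit name -> set of item ids with a replica or
                 erasure-coded fragment on that unit
      sizes:     item id -> bytes occupied per replica or fragment
    Returns a list of (item, source, destination) tuples.
    """
    # Work on copies so the caller's tracking data is left untouched.
    used = dict(used)
    placement = {unit: set(items) for unit, items in placement.items()}

    # Track: per-unit utilization and the fleet-wide aggregate.
    aggregate = sum(used.values()) / sum(capacity.values())

    def util(unit):
        return used[unit] / capacity[unit]

    # Select: sources sit above the aggregate, destinations below it.
    sources = sorted((u for u in used if util(u) > aggregate),
                     key=util, reverse=True)
    destinations = sorted((u for u in used if util(u) < aggregate), key=util)

    moves = []
    for src in sources:
        # Determine: walk the source's items until it reaches the aggregate.
        for item in sorted(placement[src]):
            if util(src) <= aggregate:
                break
            for dst in destinations:
                fits = (used[dst] + sizes[item]) / capacity[dst] <= aggregate
                # Claim 14: skip a destination that already stores
                # redundancy data for this item.
                if fits and item not in placement[dst]:
                    placement[src].remove(item)
                    placement[dst].add(item)
                    used[src] -= sizes[item]
                    used[dst] += sizes[item]
                    moves.append((item, src, dst))
                    break
    return moves
```

With three 100-byte units where `A` holds `k1` (10), `k2` (30), and `k3` (30), `C` holds a replica of `k1`, and `D` holds an unrelated 20-byte item, this sketch routes `k1` to `D` instead of the emptier `C`, because `C` already stores redundancy data for `k1`. A real system would also apply the bandwidth caps and data type criteria mentioned in the dependent claims, which are omitted here.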

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Migration mechanisms

  • G06F3/0619 (Primary): in relation to data integrity, e.g. data losses, bit errors

  • G06F3/0611 (Primary): in relation to response time

  • at area level, e.g. provisioning of virtual or logical volumes

  • Disk arrays, e.g. RAID, JBOD

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9990147B2 cover?
Methods and apparatus for optimizing resource utilization in distributed storage systems. A data migration technique is described that may operate in the background in a distributed storage data center to migrate data among a fleet of storage units to achieve a substantially even and randomized data storage distribution among all storage units in the fleet.
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0619. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 5, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).