Distributing data on distributed storage systems

US10678647B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10678647-B2
Application number: US-201916392904-A
Country: US
Kind code: B2
Filing date: Apr 24, 2019
Priority date: Dec 5, 2013
Publication date: Jun 9, 2020
Grant date: Jun 9, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of distributing data in a distributed storage system includes receiving a file, dividing the received file into chunks, and determining a distribution of the chunks among storage devices of the distributed storage system based on a maintenance hierarchy of the distributed storage system. The maintenance hierarchy includes maintenance levels, and each maintenance level includes one or more maintenance units. Each maintenance unit has an active state and an inactive state. Moreover, each storage device is associated with a maintenance unit. The determining of the distribution of the chunks includes identifying a random selection of the storage devices matching a number of chunks of the file and being capable of maintaining accessibility of the file when one or more maintenance units are in an inactive state. The method also includes distributing the chunks to storage devices of the distributed storage system according to the determined distribution.
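
As a rough illustration of the selection step described in the abstract, the Python sketch below draws random device sets until it finds one that keeps enough chunks reachable when any single maintenance unit is inactive. The function names, the flat one-level `unit_of` mapping, the `needed` threshold, and the retry limit are all assumptions made for this example; the patent text does not define them, and a real maintenance hierarchy has several levels.

```python
import random
from typing import Dict, List, Optional, Sequence


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Divide a file's bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def survives_any_single_unit_outage(
    devices: Sequence[str], unit_of: Dict[str, str], needed: int
) -> bool:
    """Check that at least `needed` of the chosen devices stay reachable
    when any one maintenance unit is inactive.

    `needed` is a hypothetical threshold: the number of chunks assumed
    sufficient to reconstruct the file.
    """
    for unit in {unit_of[d] for d in devices}:
        remaining = sum(1 for d in devices if unit_of[d] != unit)
        if remaining < needed:
            return False
    return True


def choose_devices(
    all_devices: Sequence[str],
    unit_of: Dict[str, str],
    num_chunks: int,
    needed: int,
    rng: random.Random,
    max_attempts: int = 1000,  # hypothetical retry limit
) -> Optional[List[str]]:
    """Randomly pick as many devices as there are chunks, retrying until
    the pick tolerates the loss of any single maintenance unit."""
    if num_chunks > len(all_devices):
        return None
    for _ in range(max_attempts):
        candidate = rng.sample(list(all_devices), num_chunks)
        if survives_any_single_unit_outage(candidate, unit_of, needed):
            return candidate
    return None
```

With six devices spread evenly over three maintenance units, three chunks, and `needed=2`, only selections that place every chunk in a different unit pass the check, so the loop keeps sampling until it finds one.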

First claim

Opening claim text (preview).

What is claimed is:

1. A method of distributing data in a distributed storage system, the method comprising: receiving a file at data processing hardware; dividing, by the data processing hardware, the received file into a number of chunks; determining, by the data processing hardware, whether the data processing hardware is capable of reconstructing the file from a set of storage devices of the distributed storage system selected as storage destinations for the number of chunks when at least one storage device in the set of storage devices is inaccessible, the at least one storage device in the set of storage devices is inaccessible when the corresponding at least one storage device is affected by a maintenance event; and when the data processing hardware is capable of reconstructing the file from the set of storage devices, distributing, by the data processing hardware, the number of chunks across the set of storage devices of the distributed system to maintain accessibility of the file when the at least one storage device in the set of storage devices is inaccessible.

2. The method of claim 1, further comprising restricting a maximum number of chunks distributed to any one storage device in the set of storage devices.

3. The method of claim 1, wherein the set of storage devices of the distributed storage system are selected as storage destinations for the number of chunks by selecting a number of storage devices matching the number of chunks of the file.

4. The method of claim 3, further comprising, when the data processing hardware is incapable of reconstructing the file from the selected number of storage devices, selecting, by the data processing hardware, another number of storage devices matching the number of chunks of the file.

5. The method of claim 1, further comprising, when the data processing hardware is incapable of reconstructing the file from the set of storage devices, modifying, by the data processing hardware, the set of storage devices by adding and removing one or more storage devices.

6. The method of claim 1, wherein the set of storage devices of the distributed storage system are selected as storage destinations for the number of chunks by using a simple sampling, a probability sampling, a stratified sampling, or a cluster sampling.

7. The method of claim 1, wherein the set of storage devices of the distributed storage system are selected as storage destinations for the number of chunks by selecting a consecutive number of storage devices equal to a number of chunks of the file from an ordered circular list of a plurality of storage devices of the distributed storage system.

8. The method of claim 7, further comprising, when the data processing hardware is incapable of reconstructing the file from the selected number of consecutive storage devices, selecting, by the data processing hardware, another consecutive number of storage devices from the ordered circular list equal to the number of chunks of the file.

9. The method of claim 1, wherein the maintenance event affecting the corresponding at least one storage device in the set of storage devices comprises a power maintenance event or a network maintenance event.

10. The method of claim 1, wherein the corresponding at least one storage device in the set of storage devices is affected by the maintenance event when the corresponding at least one storage device is undergoing maintenance.

11. The method of claim 1, wherein the corresponding at least one storage device in the set of storage devices is affected by the maintenance event when the corresponding at least one storage device depends from a component in the distributed storage system undergoing maintenance.

12. The method of claim 1, wherein dividing the received file into the number of chunks comprises: dividing the received file into stripes; and creating the number of chunks as stripe replicas by replicating each of the stripes.

13. A system for distributing data in a distributed storage system, the system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: receiving a file; dividing the received file into a number of chunks; determining whether the data processing hardware is capable of reconstructing the file from a set of storage devices of the distributed storage system selected as storage destinations for the number of chunks when at least one storage device in the set of storage devices is inaccessible, the at least one storage device in the set of storage devices is inaccessible when the corresponding at least one storage device is affected by a maintenance event; and when the data processing hardware is capable of reconstructing the file from the set of storage devices, distributing the number of chunks across the set of storage devices of the distributed system to maintain accessibility of the file when the at least one storage device in the set of storage devices is inaccessible.

14. The system of claim 13, wherein the operations further comprise restricting a maximum number of chunks distributed to any one storage device in the set of storage devices.

15. The system of claim 13, wherein the set of storage devices of the distributed storage system are selected as storage destinations for the number of chunks by selecting a number of storage devices matching the number of chunks of the file.

16. The system of claim 15, wherein the operations further comprise, when the data processing hardware is incapable of reconstructing the file from the selected number of storage devices, selecting another number of storage devices matching the number of chunks of the file.

17. The system of claim 13, wherein the operations further comprise, when the data processing hardware is incapable of reconstructing the file from the set of storage devices, modifying the set of storage devices by adding and removing one or more storage devices.

18. The system of claim 13, wherein the set of storage devices of the distributed storage system are selected as storage destinations for the number of chunks by using a simple sampling, a probability sampling, a stratified sampling, or a cluster sampling.

19. The system of claim 13, wherein the set of storage devices of the distributed storage system are selected as storage destinations for the number of chunks by selecting a consecutive number of storage devices equal to a number of chunks of the file from an ordered circular list of a plurality of storage devices of the distributed storage system.

20. The system of claim 19, wherein the operations further comprise, when the data processing hardware is incapable of reconstructing the file from the selected number of consecutive storage devices, selecting another consecutive number of storage devices from the ordered circular list equal to the number of chunks of the file.

21. The system of claim 13, wherein the maintenance event affecting the corresponding at least one storage device in the set of storage devices comprises a power maintenance event or a network maintenance event.

22. The system of claim 13, wherein the corresponding at least one storage device in the set of storage devices is affected by the maintenance event when the corresponding at least one storage device is undergoing maintenance.

23. The
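
The circular-list selection recited in claims 7, 8, 19, and 20 can be pictured with a small Python sketch: take a consecutive window of devices from an ordered ring and, if the file could not be reconstructed from that window, move on to another window. The function names, the reconstruction predicate, and the "one chunk per maintenance unit" rule are hypothetical choices for illustration; the claims leave the reconstruction test and the choice of the next window unspecified.

```python
from typing import Callable, Dict, List, Optional, Sequence


def consecutive_window(ring: Sequence[str], start: int, count: int) -> List[str]:
    """Return `count` consecutive devices from the ring, wrapping past the end."""
    return [ring[(start + i) % len(ring)] for i in range(count)]


def select_from_ring(
    ring: Sequence[str],
    num_chunks: int,
    can_reconstruct: Callable[[Sequence[str]], bool],
    start: int = 0,
) -> Optional[List[str]]:
    """Try each starting position once and return the first window that
    passes the caller-supplied reconstruction check, or None if none does."""
    if num_chunks > len(ring):
        return None
    for offset in range(len(ring)):
        window = consecutive_window(ring, start + offset, num_chunks)
        if can_reconstruct(window):
            return window
    return None


def one_chunk_per_unit(unit_of: Dict[str, str]) -> Callable[[Sequence[str]], bool]:
    """Hypothetical predicate: accept a window only if no maintenance unit
    holds more than one of the selected devices."""
    def check(devices: Sequence[str]) -> bool:
        units = [unit_of[d] for d in devices]
        return len(set(units)) == len(units)
    return check
```

Advancing the start position by one on each retry is only one possible policy; the ordering of the ring itself determines how often a window spans distinct maintenance units.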

Assignees

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • G06F16/182 (Primary)

    Distributed file systems · CPC title

  • using file system or storage system metadata · CPC title

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10678647B2 cover?
A method of distributing data in a distributed storage system includes receiving a file, dividing the received file into chunks, and determining a distribution of the chunks among storage devices of the distributed storage system based on a maintenance hierarchy of the distributed storage system. The maintenance hierarchy includes maintenance levels, and each maintenance level includes one or m…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/182. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 9, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).