Distributing data on distributed storage systems
US-10678647-B2 · Jun 9, 2020 · US
US11113150B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11113150-B2 |
| Application number | US-202016880513-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 21, 2020 |
| Priority date | Dec 5, 2013 |
| Publication date | Sep 7, 2021 |
| Grant date | Sep 7, 2021 |
Official abstract text for this publication.
A method of distributing data in a distributed storage system includes receiving a file, dividing the received file into chunks, and determining a distribution of the chunks among storage devices of the distributed storage system based on a maintenance hierarchy of the distributed storage system. The maintenance hierarchy includes maintenance levels, and each maintenance level includes one or more maintenance units. Each maintenance unit has an active state and an inactive state. Moreover, each storage device is associated with a maintenance unit. The determining of the distribution of the chunks includes identifying a random selection of the storage devices matching a number of chunks of the file and being capable of maintaining accessibility of the file when one or more maintenance units are in an inactive state. The method also includes distributing the chunks to storage devices of the distributed storage system according to the determined distribution.
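The selection step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. `Device`, `split_into_chunks`, `survives_every_single_outage`, `choose_devices`, and the `min_available` threshold are all hypothetical names introduced here. The sketch assumes that a file stays accessible as long as at least `min_available` of its chunks sit on devices whose maintenance units are all active, and it only tests one maintenance unit going inactive at a time.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical storage device record."""
    name: str
    # Every maintenance unit (rack, power feed, ...) this device depends on,
    # across all levels of the maintenance hierarchy.
    units: frozenset[str]


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Divide a received file into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def survives_every_single_outage(selection: list[Device], min_available: int) -> bool:
    """Assumed accessibility test: for each maintenance unit taken inactive
    on its own, at least `min_available` selected devices remain active."""
    all_units = set().union(*(d.units for d in selection))
    for unit in all_units:
        still_active = sum(1 for d in selection if unit not in d.units)
        if still_active < min_available:
            return False
    return True


def choose_devices(devices: list[Device], num_chunks: int, min_available: int,
                   rng: random.Random, max_attempts: int = 1000) -> list[Device]:
    """Draw random selections matching the number of chunks until one
    passes the accessibility test, or give up after `max_attempts`."""
    for _ in range(max_attempts):
        selection = rng.sample(devices, num_chunks)  # simple random sampling
        if survives_every_single_outage(selection, min_available):
            return selection
    raise RuntimeError("no acceptable selection found")


if __name__ == "__main__":
    # Six devices spread over three racks; each rack is one maintenance unit.
    devices = [Device(f"d{i}", frozenset({f"rack{i // 2}"})) for i in range(6)]
    chunks = split_into_chunks(b"example file contents", 8)
    placement = choose_devices(devices, len(chunks), 2, random.Random())
    for chunk, device in zip(chunks, placement):
        print(device.name, chunk)
```

Rejecting and redrawing the whole selection is the simplest reading of the abstract. A real system would likely also account for device capacity and for several maintenance units being inactive at once, neither of which is modeled here.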
Opening claim text (preview).
What is claimed is:
1. A method of distributing data in a distributed storage system, the method comprising: selecting, by the data processing hardware, a first set of storage devices as storage destinations from a plurality of storage devices of the distributed storage system for storing chunks of stripe replicas of stripes divided from a file, the first set of storage devices being in an active state when the distributed storage system is affected by a power maintenance event or a network maintenance event; determining, by the data processing hardware, whether the file is accessible from the selected first set of storage devices when the distributed storage system is affected by a power maintenance event or a network maintenance event; and when the selected first set of storage devices is incapable of maintaining accessibility of the file when the distributed storage system is affected by a power maintenance event or a network maintenance event, selecting, by the data processing hardware, a second set of storage devices as alternative storage destinations from the plurality of storage devices of the distributed storage system for storing the chunks of the stripe replicas of the stripes divided from the file.
2. The method of claim 1, further comprising restricting a number of chunks distributed to any one storage device of the first set of storage devices.
3. The method of claim 1, wherein selecting the first set of storage devices comprises determining a first selection of storage devices matching a number of chunks of the file.
4. The method of claim 3, wherein selecting the second set of storage devices comprises determining a second selection of storage devices matching the number of chunks of the file.
5. The method of claim 3, wherein selecting the second set of storage devices comprises modifying the first selection of storage devices by adding and removing one or more selected storage devices.
6. The method of claim 3, wherein determining the first random selection of storage devices uses a simple sampling, a probability sampling, a stratified sampling, or a cluster sampling.
7. The method of claim 1, wherein selecting the first set of storage devices comprises selecting a consecutive number of storage devices equal to a number of chunks of the file from an ordered circular list of the plurality of storage devices of the distributed storage system.
8. The method of claim 7, wherein selecting the second set of storage devices comprises selecting another consecutive number of storage devices from the ordered circular list equal to the number of chunks of the file.
9. The method of claim 7, further comprising determining that the ordered circular list of storage devices of the distributed storage system is adjacent storage devices on the ordered circular list in an inactive state when the distributed storage system is affected by a power maintenance event or a network maintenance event.
10. The method of claim 9, wherein a threshold number of consecutive storage devices on the ordered circular list are each associated with the inactive state.
11. A system for distributing data in a distributed storage system, the system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: selecting a first set of storage devices as storage destinations from a plurality of storage devices of the distributed storage system for storing chunks of stripe replicas of stripes divided from a file, the first set of storage devices being in an active state when the distributed storage system is affected by a power maintenance event or a network maintenance event; determining whether the file is accessible from the selected first set of storage devices when the distributed storage system is affected by a power maintenance event or a network maintenance event; and when the selected first set of storage devices is incapable of maintaining accessibility of the file when the distributed storage system is affected by a power maintenance event or a network maintenance event, selecting a second set of storage devices as alternative storage destinations from the plurality of storage devices of the distributed storage system for storing the chunks of the stripe replicas of the stripes divided from the file.
12. The system of claim 11, wherein the operations further comprise restricting a number of chunks distributed to any one storage device of the first set of storage devices.
13. The system of claim 11, wherein selecting the first set of storage devices comprises determining a first selection of storage devices matching a number of chunks of the file.
14. The system of claim 13, wherein selecting the second set of storage devices comprises determining a second selection of storage devices matching the number of chunks of the file.
15. The system of claim 13, wherein selecting the second set of storage devices comprises modifying the first selection of storage devices by adding and removing one or more selected storage devices.
16. The system of claim 13, wherein determining the first random selection of storage devices uses a simple sampling, a probability sampling, a stratified sampling, or a cluster sampling.
17. The system of claim 11, wherein selecting the first set of storage devices comprises selecting a consecutive number of storage devices equal to a number of chunks of the file from an ordered circular list of the plurality of storage devices of the distributed storage system.
18. The system of claim 17, where selecting the second set of storage devices comprises selecting another consecutive number of storage devices from the ordered circular list equal to the number of chunks of the file.
19. The system of claim 17, wherein the operations further comprise determining that the ordered circular list of storage devices of the distributed storage system is adjacent storage devices on the ordered circular list in an inactive state when the distributed storage system is affected by a power maintenance event or a network maintenance event.
20. The system of claim 20 depends from claim 19 in the original numbering: The system of claim 19, wherein a threshold number of consecutive storage devices on the ordered circular list are each associated with the inactive state.
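The circular-list variant described in the claims (selecting a consecutive run of devices, then another run when the first cannot keep the file accessible) might look roughly like the sketch below. It is an illustration under stated assumptions, not the claimed method itself. `window`, `select_from_ring`, the `unit_of` mapping, and the `max_inactive` threshold are hypothetical. Each device is assumed to belong to exactly one maintenance unit, and a run is treated as acceptable when no single maintenance unit accounts for more than `max_inactive` of its devices.

```python
from __future__ import annotations

from collections import Counter


def window(ring: list[str], start: int, n: int) -> list[str]:
    """Return n consecutive devices from the ordered circular list,
    wrapping around past the end."""
    return [ring[(start + i) % len(ring)] for i in range(n)]


def select_from_ring(ring: list[str], unit_of: dict[str, str],
                     num_chunks: int, max_inactive: int) -> list[str] | None:
    """Try each starting position in turn and return the first consecutive
    run in which no single maintenance unit covers more than `max_inactive`
    devices. Returns None when no run qualifies."""
    if not 1 <= num_chunks <= len(ring):
        raise ValueError("num_chunks must be between 1 and the ring size")
    for start in range(len(ring)):
        chosen = window(ring, start, num_chunks)
        # Worst case: the unit holding the most chosen devices goes inactive.
        per_unit = Counter(unit_of[device] for device in chosen)
        if max(per_unit.values()) <= max_inactive:
            return chosen
    return None


if __name__ == "__main__":
    ring = ["a1", "a2", "b1", "c1", "b2", "c2"]
    unit_of = {name: name[0].upper() for name in ring}  # a1 -> "A", b1 -> "B", ...
    print(select_from_ring(ring, unit_of, num_chunks=3, max_inactive=1))
```

Advancing the start position by one is only one way to pick "another consecutive number of storage devices"; the claims do not say how the second run is chosen.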
Distributed file systems · CPC title
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
Data partitioning, e.g. horizontal or vertical partitioning · CPC title
using file system or storage system metadata · CPC title