Efficient data reads from distributed storage systems

US9514015B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9514015-B2
Application numberUS-201615079095-A
CountryUS
Kind codeB2
Filing dateMar 24, 2016
Priority dateJan 31, 2014
Publication dateDec 6, 2016
Grant dateDec 6, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of distributing data in a distributed storage system includes receiving a file into non-transitory memory and dividing the received file into chunks. The chunks are data-chunks and non-data chunks. The method also includes grouping one or more of the data chunks and one or more of the non-data chunks in a group. One or more chunks of the group is capable of being reconstructed from other chunks of the group. The method also includes distributing the chunks of the group to storage devices of the distributed storage system based on a hierarchy of the distributed storage system. The hierarchy includes maintenance domains having active and inactive states, each storage device associated with a maintenance domain, the chunks of a group are distributed across multiple maintenance domains to maintain the ability to reconstruct chunks of the group when a maintenance domain is in an inactive state.

First claim

Opening claim text (preview).

What is claimed is: 1. A method of distributing data in a distributed storage system, the method comprising: receiving, at data processing hardware, a file; dividing, by the data processing hardware, the received file into chunks, the chunks being data-chunks and non-data chunks; grouping, by the data processing hardware, one or more of the data chunks and one or more of the non-data chunks in a group, one or more of the data chunks or one or more of the non-data chunks of the group capable of being reconstructed from other chunks of the group; determining, by the data processing hardware, a distribution of the chunks of the group among storage devices of the distributed storage system based on a maintenance hierarchy of the distributed storage system, the maintenance hierarchy comprising hierarchical maintenance levels and maintenance domains, each maintenance domain having an active state or an inactive state, each maintenance domain spanning one or more adjacent hierarchical maintenance levels, each storage device associated with at least one maintenance domain; and distributing, by the data processing hardware, the chunks of the group to the storage devices based on the determined distribution, the chunks of the group being distributed across multiple maintenance domains to maintain an ability to reconstruct chunks of the group when a maintenance domain is in the inactive state. 2. The method of claim 1 , wherein the hierarchical maintenance levels comprise: a storage device level; a rack level, the storage device level depending from the rack level; a bus duct level, the rack level depending from the bus duct level; and a power distribution center level, the bus duct level depending from the power distribution center level, wherein each maintenance domain spans at least the storage device level. 3. The method of claim 1 , further comprising restricting, by the data processing hardware, a number of chunks of a group distributed to storage devices of any one maintenance domain. 4. The method of claim 1 , wherein the distribution comprises a random selection of storage devices matching a number of the chunks of the group capable of maintaining accessibility of the group when one or more maintenance domains are in the inactive state. 5. The method of claim 4 , wherein when the random selection of the storage devices is incapable of maintaining accessibility of the group when one or more maintenance domains are in the inactive state, modifying, by the data processing hardware, the random selection of the storage devices by adding and/or removing one or more randomly selected storage devices. 6. The method of claim 4 , further comprising determining, by the data processing hardware, the first random selection of storage devices using a simple sampling, a probability sampling, a stratified sampling, or a cluster sampling. 7. The method of claim 1 , wherein determining the distribution of the chunks of the group among the storage devices comprises selecting a consecutive number of storage devices equal to a number of chunks of the group from an ordered circular list of the storage devices of the distributed storage system. 8. The method of claim 7 , further comprising, when the selected storage devices are collectively incapable of maintaining the accessibility of the group when one or more maintenance domains are in an inactive state, selecting, by the data processing hardware, another consecutive number of storage devices from the ordered circular list equal to the number of chunks of the group. 9. The method of claim 7 , further comprising determining, by the data processing hardware, the ordered circular list of storage devices of the distributed storage system, adjacent storage devices on the ordered circular list associated with different maintenance domains. 10. The method of claim 9 , wherein a threshold number of consecutive storage devices on the ordered circular list are each at least one of: associated with different maintenance domains; or in different geographical locations. 11. A system for distributing data in a distributed storage system, the system comprising: storage devices; and a computer processor in communication with the storage devices, the computer processor configured to perform operations comprising: receiving a file; dividing the received file into chunks, the chunks being data-chunks and non-data chunks; grouping one or more of the data chunks and one or more of the non-data chunks in a group, one or more of the data chunks or one or more of the non-data chunks of the group capable of being reconstructed from other chunks of the group; determining a distribution of the chunks of the group among storage devices of the distributed storage system based on a maintenance hierarchy of the distributed storage system, the maintenance hierarchy comprising hierarchical maintenance levels and maintenance domains, each maintenance domain having an active state or an inactive state, each maintenance domain spanning one or more adjacent hierarchical maintenance levels, each storage device associated with at least one maintenance domain; and distributing the chunks of the group to the storage devices based on the determined distribution, the chunks of the group being distributed across multiple maintenance domains to maintain an ability to reconstruct chunks of the group when a maintenance domain is in the inactive state. 12. The system of claim 11 , wherein the hierarchical maintenance levels comprise: a storage device level; a rack level, the storage device level depending from the rack level; a bus duct level, the rack level depending from the bus duct level; and a power distribution center level, the bus duct level depending from the power distribution center level, wherein each maintenance domain spans at least the storage device level. 13. The system of claim 11 , wherein the operations further comprise restricting a number of chunks of a group distributed to storage devices of any one maintenance domain. 14. The system of claim 11 , wherein the distribution comprises a random selection of storage devices matching a number of the chunks of the group capable of maintaining accessibility of the group when one or more maintenance domains are in the inactive state. 15. The system of claim 14 , wherein when the random selection of the storage devices is incapable of maintaining accessibility of the group when one or more maintenance domains are in the inactive state, the operations further comprise modifying the random selection of the storage devices by adding and/or removing one or more randomly selected storage devices. 16. The system of claim 14 , wherein the operations further comprise determining the first random selection of storage devices using a simple sampling, a probability sampling, a stratified sampling, or a cluster sampling. 17. The system of claim 11 , wherein determining the distribution of the chunks of the group among the storage devices comprises selecting a consecutive number of storage devices equal to a number of chunks of the group from an ordered circular list of the storage devices of the distributed storage system. 18. The system of claim 17 , wherein the operations further comprise, when the selected storage devices are collectively incapable of maintaining the accessibility of the group when one or more maintenance domains are in an inactive state, selecting another consecutive number of storage devices from the ordered circular list equal to the number of chunks of the group.

Assignees

Inventors

Classifications

  • in relation to data integrity, e.g. data losses, bit errors · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • Management of files · CPC title

  • Distributed, i.e. distributed RAID systems with parity · CPC title

  • Information retrieval; Database structures therefor; File system structures therefor · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9514015B2 cover?
A method of distributing data in a distributed storage system includes receiving a file into non-transitory memory and dividing the received file into chunks. The chunks are data-chunks and non-data chunks. The method also includes grouping one or more of the data chunks and one or more of the non-data chunks in a group. One or more chunks of the group is capable of being reconstructed from oth…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/2094. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Dec 06 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).