Distributing data on distributed storage systems

US12019519B2 · US · B2

Patent metadata
Publication number: US-12019519-B2
Application number: US-202318191371-A
Country: US
Kind code: B2
Filing date: Mar 28, 2023
Priority date: Dec 5, 2013
Publication date: Jun 25, 2024
Grant date: Jun 25, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of distributing data in a distributed storage system includes receiving a file, dividing the received file into chunks, and determining a distribution of the chunks among storage devices of the distributed storage system based on a maintenance hierarchy of the distributed storage system. The maintenance hierarchy includes maintenance levels, and each maintenance level includes one or more maintenance units. Each maintenance unit has an active state and an inactive state. Moreover, each storage device is associated with a maintenance unit. The determining of the distribution of the chunks includes identifying a random selection of the storage devices matching a number of chunks of the file and being capable of maintaining accessibility of the file when one or more maintenance units are in an inactive state. The method also includes distributing the chunks to storage devices of the distributed storage system according to the determined distribution.
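The placement step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Device` class, the toy two-level hierarchy (racks and power buses), and the `max_loss` parameter are all assumptions made for illustration: the sketch supposes the file stays accessible as long as no single maintenance unit holds more than `max_loss` of its chunks, and it simply redraws a random device set until that condition holds.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    units: tuple  # hypothetical: one maintenance-unit label per level, e.g. ("rack-0", "bus-A")


def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Divide the received file into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def survives_any_single_outage(selection, max_loss: int) -> bool:
    """Assumed check: no single maintenance unit holds more than max_loss chunks."""
    chunks_per_unit = Counter(unit for device in selection for unit in device.units)
    return all(count <= max_loss for count in chunks_per_unit.values())


def choose_placement(devices, num_chunks, max_loss, rng, attempts=1000):
    """Draw random device sets of size num_chunks until one passes the check."""
    for _ in range(attempts):
        candidate = rng.sample(devices, num_chunks)
        if survives_any_single_outage(candidate, max_loss):
            return candidate
    raise RuntimeError("no placement satisfied the maintenance constraint")


# Toy hierarchy: 4 racks of 2 devices; racks 0-1 share bus-A, racks 2-3 share bus-B.
DEVICES = [
    Device(f"dev-{rack}-{slot}", (f"rack-{rack}", "bus-A" if rack < 2 else "bus-B"))
    for rack in range(4)
    for slot in range(2)
]

chunks = split_into_chunks(b"example file contents", 6)  # 21 bytes -> 4 chunks
placement = choose_placement(DEVICES, len(chunks), max_loss=2, rng=random.Random(0))
assignment = {device.name: chunk for device, chunk in zip(placement, chunks)}
```

In this toy setup a valid placement must put two chunks behind each power bus, so taking either bus offline for maintenance leaves at least two chunks reachable. Unit labels must be unique across levels for the per-unit count to be meaningful.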

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method when executed on data processing hardware causes the data processing hardware to perform operations comprising: receiving a file comprising a replication code; dividing the file into multiple data stripes; for each respective data stripe of the multiple data stripes, replicating the respective data stripe as a corresponding replication chunk; selecting, from a plurality of storage devices of a distributed storage system, a set of storage devices as storage destinations for storing each corresponding replication chunk; determining that reconstruction of the file from the set of storage devices is possible with at least one storage device in the set of storage devices inaccessible; and based on determining that reconstruction of the file from the set of storage devices is possible with at least one storage device in the set of storage devices inaccessible, distributing each corresponding replication chunk across the set of storage devices to maintain accessibility of the file when the at least one storage device in the set of storage devices is inaccessible.

2. The computer-implemented method of claim 1, wherein selecting the set of storage devices comprises determining a random selection of storage devices from the plurality of storage devices.

3. The computer-implemented method of claim 2, wherein determining the random selection of storage devices uses a simple sampling, a probability sampling, a stratified sampling, or a cluster sampling.

4. The computer-implemented method of claim 1, wherein the operations further comprise: receiving a second file comprising a second replication code; dividing the second file into multiple second data stripes; for each respective second data stripe of the multiple second data stripes, replicating the respective second data stripe as a corresponding second replication chunk; selecting, from the plurality of storage devices of the distributed storage system, a second set of storage devices as storage destinations for storing each corresponding second replication chunk; and determining that reconstruction of the second file from the second set of storage devices is not possible with at least one storage device in the second set of storage devices inaccessible.

5. The computer-implemented method of claim 4, wherein the operations further comprise, based on determining that reconstruction of the second file from the second set of storage devices is not possible with the at least one storage device in the second set of storage devices inaccessible, selecting a different set of storage devices as the storage destinations for storing each corresponding replication chunk.

6. The computer-implemented method of claim 4, wherein the operations further comprise, based on determining that reconstruction of the second file from the second set of storage devices is not possible with the at least one storage device in the second set of storage devices inaccessible: removing one or more storage devices from the second set of storage devices; and adding one or more new storage devices to the second set of storage devices.

7. The computer-implemented method of claim 1, wherein each storage device in the set of storage devices is associated with a component of the distributed storage system, each component having an active state where each storage device in the set of storage devices associated with the component is accessible and an inactive state where each storage device in the set of storage devices associated with the component is inaccessible.

8. The computer-implemented method of claim 7, each component associated with the set of storage devices transitions from the active state to the inactive state based on a maintenance event.

9. The computer-implemented method of claim 1, wherein the at least one storage device in the set of storage devices is inaccessible during a maintenance event.

10. The computer-implemented method of claim 9, wherein the maintenance event comprises one or more of: power maintenance; cooling maintenance; networking maintenance; or a power outage.

11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: receiving a file at the data processing hardware, the file comprising a replication code; dividing the file into multiple data stripes; for each respective data stripe of the multiple data stripes, replicating the respective data stripe as a corresponding replication chunk; selecting, from a plurality of storage devices of a distributed storage system, a set of storage devices as storage destinations for storing each corresponding replication chunk; determining that the data processing hardware is capable of reconstructing the file from the set of storage devices when at least one storage device in the set of storage devices is inaccessible; and based on determining that the data processing hardware is capable of reconstructing the file from the set of storage devices, distributing each corresponding replication chunk across the set of storage devices to maintain accessibility of the file when the at least one storage device in the set of storage devices is inaccessible.

12. The system of claim 11, wherein selecting the set of storage devices comprises determining a random selection of storage devices from the plurality of storage devices.

13. The system of claim 12, wherein determining the random selection of storage devices uses a simple sampling, a probability sampling, a stratified sampling, or a cluster sampling.

14. The system of claim 11, wherein the operations further comprise: receiving a second file comprising a second replication code; dividing the second file into multiple second data stripes; for each respective second data stripe of the multiple second data stripes, replicating the respective second data stripe as a corresponding second replication chunk; selecting, from the plurality of storage devices of the distributed storage system, a second set of storage devices as storage destinations for storing each corresponding second replication chunk; and determining that reconstruction of the second file from the second set of storage devices is not possible with at least one storage device in the second set of storage devices inaccessible.

15. The system of claim 14, wherein the operations further comprise, based on determining that reconstruction of the second file from the second set of storage devices is not possible with the at least one storage device in the second set of storage devices inaccessible, selecting a different set of storage devices as the storage destinations for storing each corresponding replication chunk.

16. The system of claim 14, wherein the operations further comprise, based on determining that reconstruction of the second file from the second set of storage devices is not possible with the at least one storage device in the second set of storage devices inaccessible: removing one or more one storage devices from the second set of storage devices; and adding one or more new storage device to the second set of storage devices.

17. The system of claim 11, wherein each storage device in the set of storage devices is associated with a component of the distributed storage system, each component having an active state where each storage device in the set of storage dev
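The select, check, and re-select flow in claims 1, 4, and 6 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed method itself: the function names and the `COMPONENT_OF` mapping are hypothetical, at least two replicas are assumed, and the reconstruction check is reduced to "the chosen devices span at least two components", so that one component going inactive still leaves a reachable replica.

```python
import random


def reconstructable(selection, component_of) -> bool:
    """Assumed check: replicas span at least two components, so losing any one
    component still leaves a reachable copy."""
    return len({component_of[device] for device in selection}) >= 2


def swap_one_device(selection, all_devices, component_of, rng):
    """Remove one device from the set and add one from an unused component."""
    used = {component_of[device] for device in selection}
    spares = [d for d in all_devices if component_of[d] not in used]
    if not spares:
        raise RuntimeError("no device available in another component")
    return selection[:-1] + [rng.choice(spares)]


def select_destinations(all_devices, component_of, replicas, rng):
    """Randomly pick destinations for one replication chunk, repairing if needed."""
    selection = rng.sample(all_devices, replicas)
    if not reconstructable(selection, component_of):
        selection = swap_one_device(selection, all_devices, component_of, rng)
    return selection


# Hypothetical mapping from storage device to the component it depends on.
COMPONENT_OF = {"d0": "rack-A", "d1": "rack-A", "d2": "rack-B", "d3": "rack-B"}
destinations = select_destinations(sorted(COMPONENT_OF), COMPONENT_OF, 2, random.Random(1))
```

Swapping a single device keeps most of the original random selection, which corresponds to the remove-and-add variant in claim 6. Drawing an entirely new set, as in claim 5, would be an equally valid repair strategy.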

Assignees

Google LLC

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • G06F16/182 · Primary

    Distributed file systems · CPC title

  • using file system or storage system metadata · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12019519B2 cover?
A method of distributing data in a distributed storage system includes receiving a file, dividing the received file into chunks, and determining a distribution of the chunks among storage devices of the distributed storage system based on a maintenance hierarchy of the distributed storage system. The maintenance hierarchy includes maintenance levels, and each maintenance level includes one or m…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/182. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 25, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).