Redistributing data in a distributed storage system based on attributes of the data

US9529540B1 · US · B1

Patent metadata
Publication number: US-9529540-B1
Application number: US-201514950461-A
Country: US
Kind code: B1
Filing date: Nov 24, 2015
Priority date: Nov 1, 2012
Publication date: Dec 27, 2016
Grant date: Dec 27, 2016


Abstract


Accesses to a number of data blocks stored in a distributed storage are observed. Following observation of the accesses, the stored data blocks are redistributed. In one aspect, redistribution of the data blocks includes determining the access patterns for one or more of the data blocks based on the observed accesses, and determining the storage sizes for the one or more data blocks. Thereafter, based on the determined access patterns and determined storage sizes, the one or more data blocks are sorted. Subsequently, the one or more data blocks are redistributed or rebalanced across a number of storage devices of the distributed storage based on the sorting. In one aspect, the one or more data blocks are redistributed according to either a uniform distribution scheme or a proportional distribution scheme.
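As a rough illustration of the pipeline the abstract describes (observe accesses, derive each block's access pattern and storage size, sort, then redistribute), the Python sketch below folds an access log into per-block statistics, sorts the blocks, and places them with either a uniform or a proportional scheme. It is a minimal sketch under stated assumptions, not the patented implementation: the access pattern is assumed to be (most recent access time, access count), the sort order (recency, then count, then size) is a hypothetical choice, and the `weights` mapping is a hypothetical stand-in for per-device performance characteristics. It also places blocks from scratch rather than computing moves from a current layout. All names (`BlockStats`, `observe`, `sort_blocks`, `distribute_uniform`, `distribute_proportional`) are illustrative.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    block_id: str
    size: int                 # storage size in bytes
    last_access: float = 0.0  # most recent observed access time
    access_count: int = 0     # number of observed accesses


def observe(accesses, sizes):
    """Fold an access log of (block_id, timestamp) pairs into per-block stats.

    sizes maps every stored block_id to its storage size.
    """
    stats = {b: BlockStats(b, s) for b, s in sizes.items()}
    for block_id, ts in accesses:
        st = stats[block_id]
        st.access_count += 1
        st.last_access = max(st.last_access, ts)
    return list(stats.values())


def sort_blocks(stats):
    # Assumed order: most recent access first, then most accesses, then largest size.
    return sorted(stats, key=lambda s: (-s.last_access, -s.access_count, -s.size, s.block_id))


def distribute_uniform(sorted_blocks, devices):
    # Deal the sorted blocks round-robin so every device gets a similar mix.
    placement = {d: [] for d in devices}
    for i, blk in enumerate(sorted_blocks):
        placement[devices[i % len(devices)]].append(blk.block_id)
    return placement


def distribute_proportional(sorted_blocks, weights):
    # weights: device -> relative performance (hypothetical metric).
    counts = {d: 0 for d in weights}
    placement = {d: [] for d in weights}
    for blk in sorted_blocks:
        # Choose the device whose count/weight ratio is lowest after placing
        # this block; ties go to the device listed first.
        target = min(weights, key=lambda d: (counts[d] + 1) / weights[d])
        counts[target] += 1
        placement[target].append(blk.block_id)
    return placement


if __name__ == "__main__":
    sizes = {"a": 4096, "b": 1024, "c": 4096, "d": 512}
    log = [("a", 10.0), ("b", 12.0), ("a", 15.0), ("c", 3.0)]
    ordered = sort_blocks(observe(log, sizes))
    print([s.block_id for s in ordered])                  # ['a', 'b', 'c', 'd']
    print(distribute_uniform(ordered, ["dev0", "dev1"]))  # {'dev0': ['a', 'c'], 'dev1': ['b', 'd']}
    print(distribute_proportional(ordered, {"fast": 2.0, "slow": 1.0}))
    # {'fast': ['a', 'b', 'd'], 'slow': ['c']}
```

Dealing the sorted list round-robin spreads hot and cold blocks evenly across devices; the proportional variant keeps each device's block count close to its weight share, so six blocks over weights 2:1 land as 4 and 2.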

First claim


What is claimed is:

1. A method comprising: sorting a plurality of data blocks based at least in part on an access pattern and a size corresponding to respective data blocks of the plurality of data blocks, the sorting comprising: assigning the plurality of data blocks to a plurality of buckets, each bucket associated with a particular respective access pattern level and respective data block storage size requirement, wherein assigning the plurality of data blocks to the plurality of buckets comprises: matching an access pattern and a storage size of a particular data block to an access pattern level and a data block storage requirement of a particular bucket from the plurality of buckets; and assigning the particular data block to the particular bucket based on the matching; and redistributing the plurality of data blocks across a plurality of storage devices of a distributed storage based on the sorting of the plurality of data blocks, the redistributing comprising: determining a total number of data blocks assigned to a particular bucket from the plurality of buckets; calculating a target number of data blocks for each of the plurality of storage devices for the particular bucket by dividing the determined total number of data blocks by a number of the plurality of storage devices; and redistributing the data blocks assigned to the particular bucket across the plurality of storage devices based on the calculated target number of data blocks for each of the plurality of storage devices for the particular bucket.

2. The method of claim 1, wherein redistributing the plurality of data blocks across the plurality of storage devices based on the sorting comprises uniformly redistributing data blocks having similar access patterns and storage sizes across the plurality of storage devices.

3. The method of claim 1, wherein redistributing the plurality of data blocks across the plurality of storage devices is further based on performance characteristics for the plurality of storage devices.

4. The method of claim 3, wherein redistributing the plurality of data blocks across the plurality of storage devices based on performance characteristics for the plurality of storage devices comprises redistributing the one or more data blocks across the plurality of storage devices in proportion to the determined performance characteristics for the plurality of storage devices.

5. The method of claim 1, wherein redistributing the plurality of data blocks across the plurality of storage devices comprises: selecting a bucket from the plurality of buckets, the bucket having an access pattern level specifying an access time that is more recent than an access time specified by an access pattern level for another bucket from the plurality of data buckets; and redistributing data blocks assigned to the selected bucket prior to redistributing data blocks assigned to the another bucket.

6. The method of claim 1, wherein each particular access pattern level comprises at least one of an access time range and an access count range.

7. A non-transitory computer readable storage medium executing computer program instructions, the computer program instructions comprising instructions for: sorting a plurality of data blocks based at least in part on an access pattern and a size corresponding to respective data blocks of the plurality of data blocks, the sorting comprising: assigning the plurality of data blocks to a plurality of buckets, each bucket associated with a particular respective access pattern level and respective data block storage size requirement, wherein assigning the plurality of data blocks to the plurality of buckets comprises: matching an access pattern and a storage size of a particular data block to an access pattern level and a data block storage requirement of a particular bucket from the plurality of buckets; and assigning the particular data block to the particular bucket based on the matching; and redistributing the plurality of data blocks across a plurality of storage devices of a distributed storage based on the sorting of the plurality of data blocks, the redistributing comprising: determining a total number of data blocks assigned to a particular bucket from the plurality of buckets; calculating a target number of data blocks for each of the plurality of storage devices for the particular bucket by dividing the determined total number of data blocks by a number of the plurality of storage devices; and redistributing the data blocks assigned to the particular bucket across the plurality of storage devices based on the calculated target number of data blocks for each of the plurality of storage devices for the particular bucket.

8. The medium of claim 7, wherein redistributing the plurality of data blocks across the plurality of storage devices based on the sorting comprises uniformly redistributing data blocks having similar access patterns and storage sizes across the plurality of storage devices.

9. The medium of claim 7, wherein redistributing the plurality of data blocks across the plurality of storage devices is further based on performance characteristics for the plurality of storage devices.

10. The medium of claim 9, wherein redistributing the plurality of data blocks across the plurality of storage devices based on performance characteristics for the plurality of storage devices comprises redistributing the one or more data blocks across the plurality of storage devices in proportion to the determined performance characteristics for the plurality of storage devices.

11. The medium of claim 7, wherein redistributing the plurality of data blocks across the plurality of storage devices comprises: selecting a bucket from the plurality of buckets, the bucket having an access pattern level specifying an access time that is more recent than an access time specified by an access pattern level for another bucket from the plurality of data buckets; and redistributing data blocks assigned to the selected bucket prior to redistributing data blocks assigned to the another bucket.

12. The medium of claim 7, wherein each particular access pattern level comprises at least one of an access time range and an access count range.

13. A system comprising: a non-transitory computer readable storage medium storing processor-executable computer program instructions, the instructions comprising instructions for: sorting a plurality of data blocks based at least in part on an access pattern and a size corresponding to respective data blocks of the plurality of data blocks, the sorting comprising: assigning the plurality of data blocks to a plurality of buckets, each bucket associated with a particular respective access pattern level and respective data block storage size requirement, wherein assigning the plurality of data blocks to the plurality of buckets comprises: matching an access pattern and a storage size of a particular data block to an access pattern level and a data block storage requirement of a particular bucket from the plurality of buckets; and assigning the particular data block to the particular bucket based on the matching; and redistributing the plurality of data blocks across a plurality of storage devices of a distributed storage based on the sorting of the plurality of data blocks, the redistributing comprising: determining a total number of data blocks assigned to a particular bucket from the plurality of buckets; calculating a target number of data blocks for each of the plurality of storage devices for the particular bucket by dividing the determined total number of data blocks by a number of the plurality of sto…
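Claims 1, 5 and 6 are more specific than the abstract: each block is matched to a bucket defined by an access pattern level and a storage size requirement, each bucket's per-device target is its block count divided by the number of devices, and buckets with more recent access times are redistributed first. The Python sketch below walks through that flow under stated assumptions, not as the patented implementation: the access pattern level is modeled only as a half-open range of seconds since last access (claim 6 also allows an access count range), fractional targets are turned into integers by handing out the division remainder one block at a time, and a minimal-move policy lets blocks stay on their current device until that device reaches its target. `Bucket`, `per_device_targets` and `rebalance` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """Hypothetical bucket: an access pattern level plus a storage size requirement."""
    name: str
    age_range: tuple   # assumption: (lo, hi) seconds since last access, hi exclusive
    size_range: tuple  # assumption: (lo, hi) bytes, hi exclusive

    def matches(self, age, size):
        return (self.age_range[0] <= age < self.age_range[1]
                and self.size_range[0] <= size < self.size_range[1])


def per_device_targets(total, devices):
    """Target per device = total / number of devices (claim 1).

    Assumption: the remainder of the integer division is handed out one
    extra block at a time to the first devices in the list.
    """
    base, extra = divmod(total, len(devices))
    return {d: base + (1 if i < extra else 0) for i, d in enumerate(devices)}


def rebalance(blocks, buckets, devices):
    """blocks maps block_id -> (age, size, current_device).

    Returns the moves as (block_id, from_device, to_device) tuples.
    Blocks that match no bucket are left where they are.
    """
    # Sorting step: match each block to the first bucket it fits.
    assigned = defaultdict(list)
    for block_id, (age, size, dev) in sorted(blocks.items()):
        for bucket in buckets:
            if bucket.matches(age, size):
                assigned[bucket].append((block_id, dev))
                break

    moves = []
    # Claim 5: buckets covering more recent access times go first.
    for bucket in sorted(assigned, key=lambda b: b.age_range[0]):
        members = assigned[bucket]
        targets = per_device_targets(len(members), devices)
        kept = {d: 0 for d in devices}
        overflow = []
        for block_id, dev in members:
            if dev in kept and kept[dev] < targets[dev]:
                kept[dev] += 1  # block stays on its current device
            else:
                overflow.append((block_id, dev))
        for block_id, src in overflow:
            # Move each excess block to the first device still below target.
            dst = next(d for d in devices if kept[d] < targets[d])
            kept[dst] += 1
            moves.append((block_id, src, dst))
    return moves


if __name__ == "__main__":
    buckets = [
        Bucket("hot-small", (0, 3600), (0, 1 << 20)),
        Bucket("cold-small", (3600, float("inf")), (0, 1 << 20)),
    ]
    blocks = {
        "a": (10, 4096, "d0"), "b": (20, 4096, "d0"),
        "c": (30, 4096, "d0"), "d": (40, 4096, "d0"),
        "e": (7200, 4096, "d1"), "f": (9000, 4096, "d1"),
    }
    print(rebalance(blocks, buckets, ["d0", "d1"]))
    # [('c', 'd0', 'd1'), ('d', 'd0', 'd1'), ('f', 'd1', 'd0')]
```

Because the targets within each bucket always sum to that bucket's block count, every overflow block is guaranteed to find a device below target. Keeping in-place blocks first is one way to limit data movement; the claims themselves only require that redistribution follow the per-bucket targets.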

Assignees

Quantcast Corp

Classifications

  • G06F3/067 (primary): Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

  • G06F3/0619 (primary): in relation to data integrity, e.g. data losses, bit errors

  • at area level, e.g. provisioning of virtual or logical volumes

  • Migration mechanisms

  • in relation to throughput


Frequently asked questions


What does patent US9529540B1 cover?
It covers observing accesses to data blocks stored in a distributed storage, determining each block's access pattern and storage size, sorting the blocks on that basis, and redistributing them across the storage devices using either a uniform or a proportional distribution scheme.
Who is the assignee on this patent?
Quantcast Corp
What technology area does this patent fall under?
Primary CPC classification G06F3/067. Mapped technology areas include Physics.
When was this patent published?
Published on December 27, 2016 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).