Redistributing data in a distributed storage system based on attributes of the data

US9229657B1 · US · B1

Patent metadata
Publication number: US-9229657-B1
Application number: US-201213666709-A
Country: US
Kind code: B1
Filing date: Nov 1, 2012
Priority date: Nov 1, 2012
Publication date: Jan 5, 2016
Grant date: Jan 5, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Accesses to a number of data blocks stored in a distributed storage are observed. Following observation of the accesses, the stored data blocks are redistributed. In one aspect, redistribution of the data blocks includes determining the access patterns for one or more of the data blocks based on the observed accesses, and determining the storage sizes for the one or more data blocks. Thereafter, based on the determined access patterns and determined storage sizes, the one or more data blocks are sorted. Subsequently, the one or more data blocks are redistributed or rebalanced across a number of storage devices of the distributed storage based on the sorting. In one aspect, the one or more data blocks are redistributed according to either a uniform distribution scheme or a proportional distribution scheme.
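The sorting step described in the abstract can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the patented implementation. It reduces each block's "access pattern" to the age of its most recent observed access and maps blocks into (access level, size class) buckets. The thresholds `ACCESS_AGE_LIMITS` and `SIZE_LIMITS`, the `Block` type, and the helpers `latest_access` and `assign_buckets` are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical bucket boundaries for illustration; the patent text here gives no values.
ACCESS_AGE_LIMITS = [3600, 86_400, 7 * 86_400]  # level 0: <1 h, 1: <1 day, 2: <1 week, 3: older
SIZE_LIMITS = [1 << 20, 64 << 20]               # class 0: <1 MiB, 1: <64 MiB, 2: larger


@dataclass(frozen=True)
class Block:
    block_id: str
    device: str  # storage device currently holding the block
    size: int    # storage size in bytes


def latest_access(access_log):
    """Reduce observed (block_id, timestamp) accesses to each block's most recent access."""
    latest = {}
    for block_id, ts in access_log:
        latest[block_id] = max(ts, latest.get(block_id, ts))
    return latest


def assign_buckets(blocks, access_log, now):
    """Map each (access_level, size_class) bucket to its blocks; level 0 is most recent."""
    latest = latest_access(access_log)
    buckets = defaultdict(list)
    for b in blocks:
        # A block with no observed access gets an infinite age, i.e. the coldest level.
        age = now - latest.get(b.block_id, float("-inf"))
        level = bisect_right(ACCESS_AGE_LIMITS, age)
        size_class = bisect_right(SIZE_LIMITS, b.size)
        buckets[(level, size_class)].append(b)
    return dict(buckets)


# Example: "a" and "b" were accessed recently, "c" never was.
blocks = [Block("a", "d1", 4096), Block("b", "d1", 100 << 20), Block("c", "d2", 4096)]
log = [("a", 990.0), ("a", 995.0), ("b", 0.0)]
buckets = assign_buckets(blocks, log, now=1000.0)
```

Because lower levels mean more recent access in this sketch, iterating over `sorted(buckets)` visits the hottest buckets first. Real systems would likely use richer access statistics (frequency as well as recency), which the abstract leaves open.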

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for redistributing a plurality of data blocks stored in a distributed storage, the method comprising: observing accesses to the plurality of data blocks stored in the distributed storage; and redistributing the plurality of data blocks in the distributed storage, wherein redistribution of the plurality of data blocks in the distributed storage comprises: determining access patterns for one or more data blocks from the plurality of data blocks based on the observed accesses; determining storage sizes for the one or more data blocks from the plurality of data blocks; sorting the one or more data blocks based at least in part on the determined access patterns and on the determined storage sizes, wherein the sorting comprising: assigning the one or more data blocks to a plurality of buckets each bucket associated with a particular access pattern level and data block storage size requirement, wherein assigning the one or more data blocks to the plurality of buckets comprises: matching a determined access pattern and a determined storage size of a particular data block to an access pattern level and a data block storage size requirement of a particular bucket from the plurality of buckets; and assigning the particular data block to the particular bucket based on the matching; and redistributing the one or more data blocks across a plurality of storage devices of the distributed storage based on the sorting of the one or more data blocks, wherein the redistributing comprises: determining a total number of data blocks assigned to a particular bucket from the plurality of buckets; calculating a target number of data blocks for each of the plurality of storage devices for the particular bucket by dividing the determined total number of data blocks by a number of the plurality of storage devices; and redistributing the data blocks assigned to the particular bucket across the plurality of storage devices based on the calculated target number of data blocks for each of the plurality of storage devices for the particular bucket.

2. The computer-implemented method of claim 1, wherein redistributing the one or more data blocks across the plurality of storage devices based on the sorting comprises uniformly redistributing data blocks having similar access patterns and storage sizes across the plurality of storage devices.

3. The computer-implemented method of claim 1, wherein redistributing the data blocks across the plurality of storage devices comprises: selecting a bucket from the plurality of buckets, the bucket having an access pattern level specifying an access time that is more recent than an access time specified by an access pattern level for another bucket from the plurality of data buckets; and redistributing data blocks assigned to the selected bucket prior to redistributing data blocks assigned to the another bucket.

4. A non-transitory computer readable storage medium executing computer program instructions for storing data based on access patterns, the computer program instructions comprising instructions for: observing accesses to the plurality of data blocks stored in the distributed storage; and redistributing the plurality of data blocks in the distributed storage, wherein redistribution of the plurality of data blocks in the distributed storage comprises: determining access patterns for one or more data blocks from the plurality of data blocks based on the observed accesses; determining storage sizes for the one or more data blocks from the plurality of data blocks; sorting the one or more data blocks based at least in part on the determined access patterns and on the determined storage sizes, wherein the sorting comprising: assigning the one or more data blocks to a plurality of buckets each bucket associated with a particular access pattern level and data block storage size requirement, wherein assigning the one or more data blocks to the plurality of buckets comprises: matching a determined access pattern and a determined storage size of a particular data block to an access pattern level and a data block storage size requirement of a particular bucket from the plurality of buckets; and assigning the particular data block to the particular bucket based on the matching; and redistributing the one or more data blocks across a plurality of storage devices of the distributed storage based on the sorting of the one or more data blocks, wherein the redistributing comprises: determining a total number of data blocks assigned to a particular bucket from the plurality of buckets; calculating a target number of data blocks for each of the plurality of storage devices for the particular bucket by dividing the determined total number of data blocks by a number of the plurality of storage devices; and redistributing the data blocks assigned to the particular bucket across the plurality of storage devices based on the calculated target number of data blocks for each of the plurality of storage devices for the particular bucket.

5. The medium of claim 4, wherein redistributing the one or more data blocks across the plurality of storage devices based on the sorting comprises uniformly redistributing data blocks having similar access patterns and storage sizes across the plurality of storage devices.

6. The medium of claim 4, wherein redistributing the data blocks across the plurality of storage devices comprises: selecting a bucket from the plurality of buckets, the bucket having an access pattern level specifying an access time that is more recent than an access time specified by an access pattern level for another bucket from the plurality of data buckets; and redistributing data blocks assigned to the selected bucket prior to redistributing data blocks assigned to the another bucket.

7. A system comprising: a non-transitory computer readable storage medium storing processor-executable computer program instructions for redistributing data, the instructions comprising instructions for: observing accesses to the plurality of data blocks stored in the distributed storage; and redistributing the plurality of data blocks in the distributed storage, wherein redistribution of the plurality of data blocks in the distributed storage comprises: determining access patterns for one or more data blocks from the plurality of data blocks based on the observed accesses; determining storage sizes for the one or more data blocks from the plurality of data blocks; sorting the one or more data blocks based at least in part on the determined access patterns and on the determined storage sizes, wherein the sorting comprising: assigning the one or more data blocks to a plurality of buckets each bucket associated with a particular access pattern level and data block storage size requirement, wherein assigning the one or more data blocks to the plurality of buckets comprises: matching a determined access pattern and a determined storage size of a particular data block to an access pattern level and a data block storage size requirement of a particular bucket from the plurality of buckets; and assigning the particular data block to the particular bucket based on the matching; and redistributing the one or more data blocks across a plurality of storage devices of the distributed storage based on the sorting of the one or more data blocks, wherein the redistributing comprises: determining a total number of data blocks assigned to a particular bucket from the plurality of buckets; calculating a target number of data blocks for each of the plurality of storage devices for the particular bucket by dividing the determined total number of data blocks by a number of the plurality of storage devices; and redi…
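The redistribution step of claims 1 and 3 (a per-device target equal to the bucket's block count divided by the number of devices, with more recently accessed buckets handled first) can be sketched in Python as follows. This is a hypothetical illustration, not the claimed implementation. It assumes buckets are keyed by (access_level, size_class) with a lower level meaning more recent access. The claims do not say how a fractional target is handled, so the sketch gives the `divmod` remainder to the devices already holding the most blocks of that bucket to reduce moves. The functions `plan_bucket_moves` and `plan_redistribution` are invented names.

```python
from collections import defaultdict


def plan_bucket_moves(bucket_blocks, devices):
    """Return (block_id, src, dst) moves that spread one bucket evenly over devices.

    bucket_blocks: (block_id, device) pairs assigned to the bucket; every device
    named here must also appear in `devices`.
    """
    total = len(bucket_blocks)
    # Target per device = total / number of devices; divmod splits off the remainder.
    base, extra = divmod(total, len(devices))
    held = defaultdict(list)
    for block_id, dev in bucket_blocks:
        held[dev].append(block_id)
    # Assumption: the `extra` remainder slots go to devices already holding the most
    # blocks of this bucket, which keeps the number of moves small.
    ranked = sorted(devices, key=lambda d: len(held[d]), reverse=True)
    target = {d: base + (1 if i < extra else 0) for i, d in enumerate(ranked)}
    # Collect blocks from devices above their target...
    surplus = []
    for d in devices:
        while len(held[d]) > target[d]:
            surplus.append((held[d].pop(), d))
    # ...and hand them to devices below their target.
    moves = []
    for d in devices:
        while len(held[d]) < target[d]:
            block_id, src = surplus.pop()
            held[d].append(block_id)
            moves.append((block_id, src, d))
    return moves


def plan_redistribution(buckets, devices):
    """buckets: {(access_level, size_class): [(block_id, device), ...]}.

    Lower access_level means more recent access (an assumption), so sorting the keys
    handles hotter buckets before colder ones, as in claim 3.
    """
    moves = []
    for key in sorted(buckets):
        moves.extend(plan_bucket_moves(buckets[key], devices))
    return moves
```

For example, a bucket of five blocks that all sit on `d1`, spread over three devices, gets targets of 2, 2, and 1 and yields three moves off `d1`. Executing the moves (copying, verifying, and releasing the source copy) is outside this sketch, and a capacity-weighted target would be one way to approximate the "proportional distribution scheme" the abstract mentions.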

Classifications (CPC titles)

  • G06F3/067 (primary): Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

  • Management of files

  • Configuration or reconfiguration of storage systems

  • Free address space management

  • Saving storage space on storage systems

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9229657B1 cover?
It covers a method, storage medium, and system for rebalancing data blocks in a distributed storage. Accesses to the blocks are observed, each block's access pattern and storage size are determined, and the blocks are sorted into buckets by access pattern level and storage size. Each bucket's blocks are then redistributed across the storage devices using a per-device target equal to the bucket's block count divided by the number of devices. The abstract also mentions uniform and proportional distribution schemes.
Who is the assignee on this patent?
Rus Silvius V, Ovsiannikov Michael, Quantcast Corp
What technology area does this patent fall under?
Primary CPC classification G06F3/067. Mapped technology areas include Physics.
When was this patent published?
Published January 5, 2016 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).