Extent hashing technique for distributed storage architecture

US9405783B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9405783-B2
Application numberUS-201314044624-A
CountryUS
Kind codeB2
Filing dateOct 2, 2013
Priority dateOct 2, 2013
Publication dateAug 2, 2016
Grant dateAug 2, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a technique is provided for distributing data and associated metadata within a distributed storage architecture. A set of hash tables that embody mappings of cluster-wide identifiers associated with storage locations are stored for write data of write requests organized into extents. A hash value is generated from a hash function applied to each extent. The hash value is overloaded and used for multiple purposes within the distributed storage architecture, including (i) a remainder computation on the hash value to select a bucket of a plurality of buckets representative of the extents, (ii) a hash table selector of the hash value to select a hash table from the set of hash tables, and (iii) a hash table index computed from the hash value to select an entry from a plurality of entries of the selected hash table having a cluster-wide identifier identifying a storage location for the extent.

First claim

Opening claim text (preview).

What is claimed is: 1. A system comprising: a central processing unit (CPU) of a node of a cluster having additional nodes, each node coupled to one or more solid state drives (SSDs); and a memory coupled to the CPU and configured to store a set of hash tables embodying mappings of cluster-wide identifiers associated with storage locations on the SSDs for write data of write requests organized into extents, the memory further configured to store a storage input/output (I/O) stack having a plurality of layers that cooperate with components of the additional nodes to provide a distributed storage architecture of the cluster, the layers of the storage I/O stack implemented as one or more processes executable by the CPU to: generate a hash value from a hash function applied to each extent; and overload the hash value for multiple purposes within the distributed storage architecture, including (i) a remainder computation on the hash value to select a bucket of a plurality of buckets representative of the extents, (ii) a hash table selector of the hash value to select a hash table from the set of hash tables, and (iii) a hash table index computed from the hash value to select an entry from a plurality of entries of the selected hash table having a cluster-wide identifier identifying a SSD storage location for an extent. 2. The system of claim 1 wherein overload of the hash value is achieved by applying prime divisors to the hash value. 3. The system of claim 1 wherein the hash function includes a property of block data randomization. 4. The system of claim 1 wherein the hash function includes a property that changing of bits of the extent results in each bit independently having a same chance of changing in the hash value. 5. The system of claim 1 wherein the hash function includes a distribution property that ensures the entries of the selected hash table are substantially balanced and substantially evenly accessed. 6. The system of claim 1 wherein the remainder computation is based on modulus arithmetic. 7. The system of claim 6 wherein the modulus arithmetic comprises a remainder of the hash value modulo a number of buckets. 8. A method comprising: storing a set of hash tables that embody mappings of cluster-wide identifiers associated with storage locations on one or more solid state drives (SSDs) of a distributed storage architecture for write data of write requests organized into extents; generating a hash value from a hash function applied to each extent; and overloading the hash value for multiple purposes within the distributed storage architecture, including (i) a remainder computation on the hash value to select a bucket of a plurality of buckets representative of the extents, (ii) a hash table selector of the hash value to select a hash table from the set of hash tables, and (iii) a hash table index computed from the hash value to select an entry from a plurality of entries of the selected hash table having a cluster-wide identifier identifying a SSD storage location for an extent. 9. The method of claim 8 wherein overloading the hash value further comprises: applying prime divisors to the hash value. 10. The method of claim 8 wherein the hash function includes a property of data randomization. 11. The method of claim 8 wherein the hash function includes a property that changing of bits of the extent results in each bit independently having a same chance of changing in the hash value. 12. The method of claim 8 wherein the hash function includes a distribution property that ensures the entries of the selected hash table are substantially balanced and substantially evenly accessed. 13. The method of claim 8 wherein the remainder computation is based on modulus arithmetic. 14. The method of claim 13 wherein the modulus arithmetic comprises a remainder of the hash value modulo a number of buckets. 15. A non-transitory computer readable medium including program instructions for execution on one or more processors, the program instructions when executed operable to: store a set of hash tables that embody mappings of cluster-wide identifiers associated with storage locations on one or more solid state drives (SSDs) of a distributed storage architecture for write data of write requests organized into extents; generate a hash value from a hash function applied to each extent; and overload the hash value for multiple purposes within the distributed storage architecture, including (i) a remainder computation on the hash value to select a bucket of a plurality of buckets representative of the extents, (ii) a hash table selector of the hash value to select a hash table from the set of hash tables, and (iii) a hash table index computed from the hash value to select an entry from a plurality of entries of the selected hash table having a cluster-wide identifier identifying a SSD storage location for an extent. 16. The non-transitory computer readable medium of claim 15 wherein overload of the hash value is achieved by applying prime divisors to the hash value. 17. The non-transitory computer readable medium of claim 15 wherein the hash function includes a property of block data randomization. 18. The non-transitory computer readable medium of claim 15 wherein the hash function includes a property that changing of bits of the extent results in each bit independently having a same chance of changing in the hash value. 19. The non-transitory computer readable medium of claim 15 wherein the hash function includes a distribution property that ensures the entries of the selected hash table are substantially balanced and substantially evenly accessed. 20. The non-transitory computer readable medium of claim 15 wherein the remainder computation is based on modulus arithmetic.

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9405783B2 cover?
In one embodiment, a technique is provided for distributing data and associated metadata within a distributed storage architecture. A set of hash tables that embody mappings of cluster-wide identifiers associated with storage locations are stored for write data of write requests organized into extents. A hash value is generated from a hash function applied to each extent. The hash value is over…
Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0608. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Aug 02 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).