Storing data and metadata in respective virtual shards on sharded storage systems

US9811546B1 · US · B1

Patent metadata
Field               Value
Publication number  US-9811546-B1
Application number  US-201414319301-A
Country             US
Kind code           B1
Filing date         Jun 30, 2014
Priority date       Jun 30, 2014
Publication date    Nov 7, 2017
Grant date          Nov 7, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are provided for storing data and metadata on sharded storage arrays. In one embodiment, data is processed in a sharded distributed data storage system that stores data in a plurality of shards on one or more storage nodes by providing a plurality of addressable virtual shards within each of the shards, wherein at least a first one of the addressable virtual shards stores the data, and wherein at least a second one of the addressable virtual shards stores the metadata related to the data; obtaining the data from a compute node; and providing the data and the metadata related to the data stored to the sharded distributed data storage system for storage in the respective first and second addressable virtual shards. The metadata related to the data is stored together at a portion of a corresponding stripe for the data in the second one of the addressable virtual shards. A third one of the addressable virtual shards optionally stores a checksum value related to the data.
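The layout described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `Shard` class, hypothetical index values `VS_DATA` and `VS_META`, and a dictionary keyed by (virtual shard index, object offset); none of these names come from the patent, which does not specify an implementation. The sketch only shows data and metadata landing in different virtual shards of one shard, at the same object offset, through one write call.

```python
from dataclasses import dataclass, field

# Hypothetical virtual-shard indices. The patent says each virtual shard
# is indexed by a unique binary value but does not assign meanings to them.
VS_DATA = 0b00
VS_META = 0b01


@dataclass
class Shard:
    """Toy model of one shard holding several addressable virtual shards."""

    # Maps (virtual shard index, object offset) -> stored bytes.
    cells: dict = field(default_factory=dict)
    # Counts write operations issued against this shard.
    write_ops: int = 0

    def write(self, offset: int, data: bytes, metadata: bytes) -> None:
        """Store data and its metadata at the same object offset in one operation."""
        self.cells[(VS_DATA, offset)] = data
        self.cells[(VS_META, offset)] = metadata
        self.write_ops += 1

    def read(self, vshard: int, offset: int) -> bytes:
        """Return the bytes held by one virtual shard at the given object offset."""
        return self.cells[(vshard, offset)]


shard = Shard()
shard.write(offset=4096, data=b"payload", metadata=b"owner=app1")
print(shard.read(VS_DATA, 4096), shard.read(VS_META, 4096), shard.write_ops)
```

Because both entries share the offset, a reader that knows where the data lives can locate its metadata by changing only the virtual shard index.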

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing data in a sharded distributed data storage system, wherein the sharded distributed data storage system stores said data in a plurality of shards on one or more storage nodes, said method comprising: providing a plurality of addressable virtual shards within each of said plurality of shards, wherein at least a first one of said plurality of addressable virtual shards stores said data and wherein at least a second different one of said plurality of addressable virtual shards separately stores metadata related to said data, wherein said data and said corresponding metadata related to said data are stored within said first and second addressable virtual shards, respectively, with a same object offset; obtaining, at a first burst buffer appliance, said data for a given shard from at least a second burst buffer appliance connected to said first burst buffer appliance by an interconnect network; and providing, by said first burst buffer appliance, said data for said given shard and said metadata related to said data for said given shard to said sharded distributed data storage system using a single write operation for storage in said respective first and second addressable virtual shards.

2. The method of claim 1, wherein said metadata related to said data is generated by an application that generates said data executing on said compute node.

3. The method of claim 1, further comprising the step of generating said metadata related to said data.

4. The method of claim 1, wherein said metadata related to said data is stored together at a portion of a corresponding stripe for said data in the second one of said addressable virtual shards.

5. The method of claim 1, wherein at least a third one of said plurality of addressable virtual shards stores a checksum value related to said data.

6. The method of claim 5, wherein said data comprises a data chunk and wherein said checksum value corresponds to said data chunk.

7. The method of claim 5, wherein said data comprises a data chunk and wherein said data chunk is further divided into a plurality of sub-chunks and wherein each of a plurality of said checksum values corresponds to one of said sub-chunks.

8. The method of claim 7, wherein each of said plurality of checksum values corresponding to one of said sub-chunks is stored together at a portion of a corresponding stripe of the third one of said addressable virtual shards.

9. The method of claim 1, wherein each of said plurality of addressable virtual shards within each of said plurality of shards is indexed by a unique binary value.

10. The method of claim 1, wherein said method is implemented by said first burst buffer appliance.

11. An apparatus for processing data in a sharded distributed data storage system, wherein the sharded distributed data storage system stores said data on a plurality of shards on one or more storage nodes, said apparatus comprising: a memory; and at least one hardware device operatively coupled to the memory and configured to: provide a plurality of addressable virtual shards within each of said plurality of shards, wherein at least a first one of said plurality of addressable virtual shards stores said data and wherein at least a second different one of said plurality of addressable virtual shards separately stores metadata related to said data, wherein said data and said corresponding metadata related to said data are stored within said first and second addressable virtual shards, respectively, with a same object offset; obtain, at a first burst buffer appliance, said data for a given shard from at least a second burst buffer appliance connected to said first burst buffer appliance by an interconnect network; and provide, by said first burst buffer appliance, said data for said given shard and said metadata related to said data for said given shard to said sharded distributed data storage system using a single write operation for storage in said respective first and second addressable virtual shards.

12. The apparatus of claim 11, wherein said metadata related to said data is generated by an application that generates said data executing on said compute node.

13. The apparatus of claim 11, wherein said at least one hardware device is further configured to generate said metadata related to said data.

14. The apparatus of claim 11, wherein said metadata related to said data is stored together at a portion of a corresponding stripe for said data in the second one of said addressable virtual shards.

15. The apparatus of claim 11, wherein at least a third one of said plurality of addressable virtual shards stores a checksum value related to said data.

16. The apparatus of claim 15, wherein said data comprises a data chunk and wherein said checksum value corresponds to said data chunk.

17. The apparatus of claim 15, wherein said data comprises a data chunk and wherein said data chunk is further divided into a plurality of sub-chunks and wherein each of a plurality of said checksum values corresponds to one of said sub-chunks.

18. The apparatus of claim 17, wherein each of said plurality of checksum values corresponding to one of said sub-chunks is stored together at a portion of a corresponding stripe of the third one of said addressable virtual shards.

19. The apparatus of claim 11, wherein each of said plurality of addressable virtual shards within each of said plurality of shards is indexed by a unique binary value.

20. The apparatus of claim 11, wherein said apparatus comprises said first burst buffer appliance.

21. An article of manufacture for processing data in a sharded distributed data storage system, wherein the sharded distributed data storage system stores said data on a plurality of shards on one or more storage nodes, said article of manufacture comprising a non-transitory machine readable recordable storage medium containing one or more programs which when executed implement the steps of: providing a plurality of addressable virtual shards within each of said plurality of shards, wherein at least a first one of said plurality of addressable virtual shards stores said data and wherein at least a second different one of said plurality of addressable virtual shards separately stores metadata related to said data, wherein said data and said corresponding metadata related to said data are stored within said first and second addressable virtual shards, respectively, with a same object offset; obtaining, at a first burst buffer appliance, said data for a given shard from at least a second burst buffer appliance connected to said first burst buffer appliance by an interconnect network; and providing, by said first burst buffer appliance, said data for said given shard and said metadata related to said data for said given shard to said sharded distributed data storage system using a single write operation for storage in said respective first and second addressable virtual shards.

22. The method of claim 1, further comprising the step of a first one of said first and second burst buffer appliances reading said given shard using a single read operation and providing at least a portion of said given shard to a second one of said first and second burst buffer appliances.

23. The apparatus of claim 11, wherein a first one of said first and second burst buffer appliances reads said given shard using a single read operation and providing at least a portion of said given shard to a
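The per-sub-chunk checksum arrangement recited in the checksum claims (a data chunk divided into sub-chunks, one checksum per sub-chunk, the checksums stored together) might be sketched as follows. This is an illustration under stated assumptions: the patent text shown here does not name a checksum algorithm, so CRC-32 from Python's `zlib` is a stand-in, and the function names, the 256-byte sub-chunk size, and the little-endian packing format are all hypothetical.

```python
import struct
import zlib


def sub_chunk_checksums(chunk: bytes, sub_chunk_size: int) -> list:
    """Compute one CRC-32 per sub-chunk (CRC-32 is an assumed stand-in)."""
    return [
        zlib.crc32(chunk[i:i + sub_chunk_size])
        for i in range(0, len(chunk), sub_chunk_size)
    ]


def pack_checksums(checksums: list) -> bytes:
    """Pack the checksums contiguously, as they might sit together in one
    portion of a stripe in a checksum virtual shard (assumed layout)."""
    return struct.pack(f"<{len(checksums)}I", *checksums)


def verify(chunk: bytes, packed: bytes, sub_chunk_size: int) -> bool:
    """Recompute the sub-chunk checksums and compare them with the stored ones."""
    stored = struct.unpack(f"<{len(packed) // 4}I", packed)
    return list(stored) == sub_chunk_checksums(chunk, sub_chunk_size)


chunk = bytes(range(256)) * 4            # 1024-byte example data chunk
packed = pack_checksums(sub_chunk_checksums(chunk, 256))
corrupted = b"\xff" + chunk[1:]          # change the first byte only
print(len(packed), verify(chunk, packed, 256), verify(corrupted, packed, 256))
```

Keeping one checksum per sub-chunk means a mismatch points at a specific sub-chunk instead of invalidating the whole chunk.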

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

  • Indexing structures · CPC title

  • Distributed file systems · CPC title

  • File access structures, e.g. distributed indices (arrangements of input from, or output to, record carriers G06F3/06) · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9811546B1 cover?
Techniques are provided for storing data and metadata on sharded storage arrays. In one embodiment, data is processed in a sharded distributed data storage system that stores data in a plurality of shards on one or more storage nodes by providing a plurality of addressable virtual shards within each of the shards, wherein at least a first one of the addressable virtual shards stores the data, a…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/2228. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 7, 2017 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).