Distributed data object management system

US10310943B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10310943-B2
Application number: US-201715626070-A
Country: US
Kind code: B2
Filing date: Jun 16, 2017
Priority date: Jun 16, 2017
Publication date: Jun 4, 2019
Grant date: Jun 4, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various embodiments, methods and systems for implementing distributed data object management are provided. The distributed data object management system includes a distributed storage system having a local metadata-consensus information store and one or more remote metadata-consensus information stores. A metadata-consensus information store is configured to store metadata-consensus information. The metadata-consensus information corresponds to erasure coded fragments of a data object and instructs on how to manage the erasure coded fragments. The distributed storage system further includes a local data store and one or more remote data stores for the erasure coded fragments. The distributed data object management system includes a distributed data object manager for operations including interface operations, configuration operations, write operations, read operations, delete operations, garbage collection operations, and failure recovery operations. The distributed data object management system operates based on metadata paths and data paths, operating in parallel, for write operations and read operations.
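The parallel metadata and data paths described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: plain dictionaries stand in for the table stores and object stores, a single XOR parity fragment stands in for the erasure coding scheme (the abstract does not name one), and the names `encode`, `write_object`, and `degraded_read` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size data fragments plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    frags = [padded[i * size:(i + 1) * size] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]


def write_object(key, data, table_stores, object_stores):
    """Write metadata rows and fragments concurrently (assumed in-memory stores)."""
    frags = encode(data, len(object_stores) - 1)
    meta = {"length": len(data), "fragments": len(frags)}
    with ThreadPoolExecutor() as pool:
        # Metadata write path: one row per table store.
        futures = [pool.submit(ts.__setitem__, key, meta) for ts in table_stores]
        # Data write path: one fragment per object store, submitted alongside.
        futures += [
            pool.submit(store.__setitem__, key, frag)
            for store, frag in zip(object_stores, frags)
        ]
        for f in futures:
            f.result()  # surface any write error


def degraded_read(key, table_stores, object_stores, missing: int) -> bytes:
    """Read while one object store is unavailable, rebuilding its fragment."""
    meta = next(ts[key] for ts in table_stores if key in ts)
    frags = [None if i == missing else s[key] for i, s in enumerate(object_stores)]
    survivors = [f for f in frags if f is not None]
    frags[missing] = reduce(xor_bytes, survivors)  # XOR of the rest recovers it
    return b"".join(frags[:-1])[:meta["length"]]  # drop parity, strip padding
```

With three object stores this yields two data fragments and one parity fragment, so a read can still succeed when any single store is unreachable. A production scheme would typically tolerate more than one lost fragment.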

First claim

Opening claim text (preview).

The invention claimed is:

1. A system for implementing distributed data object management, the system comprising: a distributed storage system comprising: a local metadata-consensus information store and one or more remote metadata-consensus information stores, wherein a metadata-consensus information store is a table store for metadata-consensus information, the metadata-consensus information corresponds to erasure coded fragments of a data object and instructs on how to manage the erasure coded fragments; a local data store and one or more remote data stores, wherein a data store is an object store that stores the erasure coded fragments of the data object, the local data store and the one or more remote data stores store the erasure coded fragments of the data object that correspond to the metadata-consensus information in the local metadata-consensus information store and the one or more remote metadata-consensus information stores, wherein corresponding metadata writes and data writes for a write operation are performed in parallel using a metadata write path and a data write path, respectively, when writing to the local metadata-consensus information store and the one or more remote metadata-consensus information stores and the local data store and the one or more remote data stores; and wherein corresponding metadata reads and data reads for a read operation are performed in parallel using a metadata read path and a data read path, respectively, when reading from the local metadata-consensus information store and the one or more remote metadata-consensus information stores and the local data store and the one or more remote data stores.

2. The system of claim 1, wherein the metadata-consensus information comprises one or more of the following: a known committed version, wherein the known committed version operates as a hint element in write operations and read operations; a pointer to the corresponding erasure coded fragments of the data object; one or more triplets of version column instances, a triplet of version columns comprising: a highest ballot number seen; a highest accepted ballot number; and a highest accepted value.

3. The system of claim 1, further comprising a distributed data object manager configured to execute interface operations, the interface operations comprising: providing a customer access to a storage account associated with data objects of the customer; receiving a selection of a set of data centers where erasure coded fragments of the data objects are to be allocated; and receiving a selection of an erasure coding scheme, wherein the erasure coding scheme is used to generate erasure coded fragments of the data objects, wherein, based on the erasure coding scheme, the erasure coded fragments of the data objects comprise a defined number of data fragments and a defined number of parity fragments.

4. The system of claim 1, further comprising a distributed data object manager configured to execute configuration operations, the configuration operations comprising: accessing a data availability profile of a customer, wherein the data availability profile identifies availability parameters selected for managing availability of data objects of the customer; based on the data availability profile, determining a number of data centers for storing erasure coded fragments and metadata-consensus information of the data objects of the customer, wherein a mapping configuration for the erasure coded fragments and metadata-consensus information indicates a mapping for storing the erasure coded fragments and metadata-consensus information in the data centers; accessing an indication of a configuration change trigger to change the mapping configuration; and based on accessing the indication of the configuration change trigger, changing the mapping configuration to a new mapping configuration, the new mapping configuration is generated based at least in part on a grace period where the mapping configuration previously being used is invalidated.

5. The system of claim 1, further comprising a distributed data object manager configured to execute failure recovery operations comprising: receiving an indication of a transient data center failure for a given data center, wherein table stores and object stores of the given data center are temporarily not accessible; initiating processing of write operations and read operations based on degraded write operations and degraded read operations, respectively, wherein a degraded write operation comprises at least two cross-data-center roundtrips; and wherein a degraded read operation comprises reading at least a parity fragment of the erasure coded fragments of the data object; receiving an indication that the given data center has recovered from the transient data center failure; and triggering an update of table stores and object stores of the given data center based on: reading the erasure coded fragments of the data object from other object stores not at the given data center; recalculating an erasure coded fragment of the data object that belongs to the object store at the given data center; and writing the erasure coded fragment of the data object to the object store.

6. The system of claim 1, further comprising a distributed data object manager configured to execute failure recovery operations comprising: receiving an indication of a permanent data center failure for a given data center, wherein table stores and object stores for the given data center are permanently not accessible; triggering a data center configuration change to replace the given data center; and causing regeneration of lost erasure coded fragments of the data object.

7. The system of claim 1, further comprising a distributed data object manager configured to execute delete operations comprising one of the following: trimming earlier version instances of the data object, wherein a version instance comprises stored metadata-consensus information and corresponding erasure coded fragments of a version of the data object, wherein trimming earlier version instances is based on: determining that a version instance count limit has been met; writing a new version instance of the data object; and automatically trimming an earliest version instance of the data object; and deleting a specific version instance of a data object, wherein deleting the specific version instance is based on: executing a write operation to create a delete record for the specific version instance in the local metadata-consensus information store and the one or more remote metadata-consensus information stores, wherein the delete record supports deleting metadata-consensus information for the specific version instance, and wherein the delete record operates as an indicator for the erasure coded fragments of the specific version instance to be permanently deleted using a garbage collection operation; and deleting an object instance of the data object, wherein an object instance comprises all stored metadata-consensus information and corresponding erasure coded fragments of a version of the data object, wherein deleting the object instance is based on: executing a tombstone marker write operation, wherein the tombstone marker write operation commits a tombstone marker version as a newest version operating as an indicator to delete the object instance; deleting the erasure coded fragments corresponding to the object instance; and deleting table rows of the metadata-consensus information corresponding to the object instance.

8. The system of claim 1, further comprising a distributed data object manager configured to execute garbage collection operations comprising truncating corresponding triplet versio
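The triplet of version columns in claim 2 (highest ballot number seen, highest accepted ballot number, highest accepted value) matches the state a Paxos acceptor conventionally keeps. The claim text shown here does not name a consensus protocol, so the sketch below is one possible reading. The class name `VersionTriplet` and the `prepare`/`accept` methods are hypothetical and show only how such a row could respond to ballot messages.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VersionTriplet:
    """One version's consensus columns in a metadata table row (assumed layout)."""
    highest_ballot_seen: int = 0
    highest_accepted_ballot: int = 0
    highest_accepted_value: Optional[str] = None

    def prepare(self, ballot: int) -> Tuple[bool, int, Optional[str]]:
        """Phase 1: promise to ignore lower ballots and report any accepted value."""
        if ballot <= self.highest_ballot_seen:
            return False, self.highest_accepted_ballot, self.highest_accepted_value
        self.highest_ballot_seen = ballot
        return True, self.highest_accepted_ballot, self.highest_accepted_value

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept the value unless a higher ballot was already promised."""
        if ballot < self.highest_ballot_seen:
            return False
        self.highest_ballot_seen = ballot
        self.highest_accepted_ballot = ballot
        self.highest_accepted_value = value
        return True
```

In a table store, each method would need to run as an atomic conditional update on the row so that concurrent writers cannot interleave between the check and the write. The in-memory version above omits that concern.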

Assignees

Inventors

Classifications

  • Redundant storage or storage space (G06F11/2056 takes precedence) · CPC title

  • Degraded mode, e.g. caused by single or multiple storage removals or disk failures · CPC title

  • using code combining, i.e. using combining of codeword portions which may have been transmitted separately, e.g. Digital Fountain codes, Raptor codes or Luby Transform [LT] codes · CPC title

  • the resynchronized component or unit being a persistent storage device (re-synchronization of failed mirror storage G06F11/2082; rebuild or reconstruction of parity RAID storage G06F11/1008) · CPC title

  • Garbage collection, i.e. reclamation of unreferenced memory · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10310943B2 cover?
In various embodiments, methods and systems for implementing distributed data object management are provided. The distributed data object management system includes a distributed storage system having a local metadata-consensus information store and one or more remote metadata-consensus information stores. A metadata-consensus information store is configured to store metadata-consensus infor…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F11/1084. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 4, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).