Rebuild rollback support in distributed SDS systems

US10007582B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10007582-B2
Application number  US-201615277271-A
Country             US
Kind code           B2
Filing date         Sep 27, 2016
Priority date       Sep 27, 2016
Publication date    Jun 26, 2018
Grant date          Jun 26, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, computing systems and computer program products implement embodiments of the present invention that include mirroring, in a distributed storage system having multiple storage nodes, data on the storage nodes. Upon the distributed storage system detecting a loss of communication with a given storage node, a log including updates to the data stored in the given storage node is recorded and, the recorded updates can be applied to the given storage node upon communication with the given storage node being reestablished. In some embodiments, the distributed storage system may be configured as a software defined storage system where the storage nodes can be implemented as either virtual machines or software containers. In additional embodiments, upon detecting the loss of communication, a redistribution of the mirrored data among remaining storage nodes is initiated upon detecting the loss of communication, and the redistribution is rolled back upon reestablishing the communication.
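
The log-and-replay behavior described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the class names (`StorageNode`, `MirroredCluster`), the method names, and the choice to mirror every write to every node are all assumptions made to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical storage node: a block-id -> data map plus a reachability flag."""
    name: str
    blocks: dict = field(default_factory=dict)
    reachable: bool = True


class MirroredCluster:
    """Toy cluster that mirrors every write to all nodes (an assumption for brevity)."""

    def __init__(self, names):
        self.nodes = {n: StorageNode(n) for n in names}
        self.logs = {}  # node name -> list of (block_id, data) missed while offline

    def lose_contact(self, name):
        # Loss of communication detected: start recording a log for that node.
        self.nodes[name].reachable = False
        self.logs[name] = []

    def write(self, block_id, data):
        for node in self.nodes.values():
            if node.reachable:
                node.blocks[block_id] = data
            elif node.name in self.logs:
                # The node cannot be reached, so record the update instead.
                self.logs[node.name].append((block_id, data))

    def regain_contact(self, name):
        # Communication reestablished: replay the recorded updates in order.
        node = self.nodes[name]
        for block_id, data in self.logs.pop(name, []):
            node.blocks[block_id] = data
        node.reachable = True


# Usage: node "c" misses two writes while offline and catches up on reconnect.
cluster = MirroredCluster(["a", "b", "c"])
cluster.write(1, "v1")
cluster.lose_contact("c")
cluster.write(1, "v2")
cluster.write(2, "x")
cluster.regain_contact("c")
```

After the replay, node "c" holds the same blocks as the nodes that stayed online, without any full copy of the data having been made.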

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for rebuild rollback support in a storage environment, by a processor device, comprising: mirroring, in a distributed storage system comprising multiple storage nodes, data on the storage nodes, each of the multiple storage nodes comprising an independent computing entity in communication with the distributed storage system; detecting, by the distributed storage system, a loss of communication with a given storage node; recording a log comprising updates to the data stored in the given storage node; applying, by the distributed storage system, the recorded updates to the given storage node upon reestablishing communication with the given storage node; upon detecting the loss of communication, initiating redistribution of the mirrored data among remaining storage nodes, and rolling back the redistribution upon reestablishing communication with the given storage node; wherein the recorded updates are applied to the given storage node subsequent to the rolling back the redistribution thereby efficiently using resources of the distributed storage system by only applying the recorded updates in lieu of continuing the redistribution of the mirrored data to the remaining storage nodes upon reconnection with the given storage node; and halting recording of the log upon the redistribution reaching a specified threshold, and completing the redistribution of the mirrored data among the remaining storage nodes; wherein a management application assesses whether rolling back the redistribution and applying the recorded updates will require more resources than continuing the redistribution of the mirrored data to the remaining storage nodes by referencing the specified threshold, and the redistribution of the mirrored data among the remaining storage nodes is completed upon determining, by the management application, the specified threshold has been reached.

2. The method according to claim 1, wherein the storage system comprises a software-defined storage system, and wherein each of the storage nodes is selected from a group consisting of a virtual machine and a software container.

3. The method according to claim 1, wherein each of the storage nodes comprises a set of storage blocks configured to store the data, and wherein mirroring the data on a given storage node comprises distributing mirrored copies of the storage blocks to remaining storage nodes.

4. The method according to claim 1, wherein the log comprises a specified set of resources, and further comprising halting recording of the log upon the log fully utilizing the specified set of resources, and completing the redistribution of the mirrored of the data among the remaining storage nodes.

5. The method according to claim 1, wherein the given storage node comprises a first given storage node, wherein the log comprises a first log, wherein the updates comprise first updates, and further comprising while applying the first updates to the first given storage node, detecting, by the distributed storage system, a loss of communication with a second given storage node, recording a second log comprising second updates to the data stored in the second given storage node, and applying, by the distributed storage system, the recorded second updates to the second given storage node upon reestablishing communication with the second given storage node.

6. A distributed storage system implementing rebuild rollback support, comprising a plurality of storage nodes, each of the storage nodes having a processor device and comprising an independent computing entity in communication with the distributed storage system, the storage nodes configured: to mirror data on the storage nodes, to detect a loss of communication with a given storage node, to record a log comprising updates to the data stored in the given storage node, to apply the recorded updates to the given storage node upon reestablishing communication with the given storage node, upon detecting the loss of communication, to initiate redistribution of the mirrored data among remaining storage nodes, and to roll back the redistribution upon reestablishing communication with the given storage node; wherein the recorded updates are applied to the given storage node subsequent to the rolling back the redistribution thereby efficiently using resources of the distributed storage system by only applying the recorded updates in lieu of continuing the redistribution of the mirrored data to the remaining storage nodes upon reconnection with the given storage node; and to halt recording of the log upon the redistribution reaching a specified threshold, and to complete the redistribution of the mirrored data among the remaining storage nodes; wherein a management application assesses whether rolling back the redistribution and applying the recorded updates will require more resources than continuing the redistribution of the mirrored data to the remaining storage nodes by referencing the specified threshold, and the redistribution of the mirrored data among the remaining storage nodes is completed upon determining, by the management application, the specified threshold has been reached.

7. The distributed storage system according to claim 6, wherein the storage nodes comprise a software-defined storage system, and wherein each of the storage nodes is selected from a group consisting of a virtual machine and a software container.

8. The distributed storage system according to claim 6, wherein each of the storage nodes comprises a set of storage blocks configured to store the data, and wherein the distributed storage system is configured to mirror the data on a given storage node by distributing mirrored copies of the storage blocks to remaining storage nodes.

9. The distributed storage system according to claim 6, wherein the log comprises a specified set of resources, and wherein the distributed storage system is further configured to halt recording of the log upon the log fully utilizing the specified set of resources, and to complete the redistribution of the data mirrored among the remaining storage nodes.

10. The distributed storage system according to claim 6, wherein the given storage node comprises a first given storage node, wherein the log comprises a first log, wherein the updates comprise first updates, and while applying the first updates to the first given storage node, the distributed storage system is further configured to detect a loss of communication with a second given storage node, to record a second log second comprising second updates to the data stored in the second given storage node, and to apply the recorded second updates to the second given storage node upon reestablishing communication with the second given storage node.

11. A computer program product for rebuild rollback support in a storage environment, by a processor device, the computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code configured to mirror, in a distributed storage system comprising multiple storage nodes, data on the storage nodes, each of the multiple storage nodes comprising an independent computing entity in communication with the distributed storage system; computer readable program code configured to detect, by the distributed storage system, a loss of communication with a given storage node; computer readable program code configured to record a log comprising updates to the data stored in the given storage node; computer readable program code configured to apply, by the distributed storage system, the recorde
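
The choice between rolling back and completing the redistribution, as recited in claims 1 and 4, can be sketched roughly as follows. The class `RebuildState`, its fields, and the default values (a threshold of 0.8 of redistribution progress and a log capacity of 1000 entries) are hypothetical; the claims do not say how the threshold or the log's resource budget is measured.

```python
from dataclasses import dataclass, field


@dataclass
class RebuildState:
    """Hypothetical bookkeeping for one disconnected node."""
    total_blocks: int           # blocks to re-mirror elsewhere (assumed > 0)
    threshold: float = 0.8      # assumed: fraction of redistribution progress
    log_capacity: int = 1000    # assumed: max log entries (the log's resources)
    redistributed: int = 0
    log: list = field(default_factory=list)
    logging: bool = True

    def _check_limits(self):
        # Halt logging once redistribution passes the threshold (claim 1)
        # or the log has used up its resource budget (claim 4).
        progress = self.redistributed / self.total_blocks
        if progress >= self.threshold or len(self.log) >= self.log_capacity:
            self.logging = False

    def redistribute(self, count):
        self.redistributed = min(self.total_blocks, self.redistributed + count)
        self._check_limits()

    def record_update(self, block_id, data):
        if self.logging:
            self.log.append((block_id, data))
            self._check_limits()

    def on_reconnect(self):
        """Return the action to take when the node comes back, plus updates to replay."""
        if self.logging:
            self.redistributed = 0  # roll back the redistribution first
            replay, self.log = self.log, []
            return ("rollback_and_replay", replay)
        # Logging was halted, so the redistribution is carried to completion.
        return ("complete_redistribution", [])


# Usage: an early reconnect rolls back; a late one lets redistribution finish.
early = RebuildState(total_blocks=100)
early.redistribute(30)
early.record_update(7, "new")
early_action = early.on_reconnect()

late = RebuildState(total_blocks=100)
late.redistribute(90)
late.record_update(7, "new")  # ignored: logging already halted
late_action = late.on_reconnect()
```

The point of the threshold in this sketch is that a nearly finished redistribution is cheaper to complete than to undo, so the log is only worth keeping while rollback remains the cheaper path.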

Assignees

IBM

Inventors

Classifications

  • using more than 2 mirrored copies · CPC title

  • Monitoring storage devices or systems · CPC title

  • Threshold · CPC title

  • Monitoring of systems including the internet · CPC title

  • Parity data used in redundant arrays of independent storages, e.g. in RAID systems · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10007582B2 cover?
It covers rebuild rollback support in a distributed storage system that mirrors data across multiple storage nodes. When the system loses communication with a node, it records a log of updates to the data stored on that node and begins redistributing the mirrored data among the remaining nodes. When communication is reestablished, it rolls back the redistribution and applies the logged updates to the node.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F11/1471. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 26, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).