Hybrid data deduplication for elastic cloud storage devices

US10789002B1 · US · B1

Patent metadata
Publication number: US-10789002-B1
Application number: US-201715790377-A
Country: US
Kind code: B1
Filing date: Oct 23, 2017
Priority date: Oct 23, 2017
Publication date: Sep 29, 2020
Grant date: Sep 29, 2020

Abstract

Facilitating data deduplication in an elastic cloud storage environment is provided herein. A system can comprise a processor and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations. The operations can comprise facilitating a first deduplication of first data at a first storage device based on a determination that the first storage device comprises duplicated data. The operations can also comprise sending, by the system, a request for a second deduplication at a second storage device after completion of the first deduplication at the first storage device. In addition, the operations can comprise facilitating, by the system, the second deduplication of second data at the second storage device, wherein the second data comprises a copy of the duplicated data.
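The two-phase sequence in the abstract (deduplicate at the first device, then request a second pass at the device that holds a copy of the duplicated data) can be sketched as a small in-memory model. This is a hedged illustration, not the patented implementation: `Device`, `Ref`, and `hybrid_dedup` are hypothetical names, SHA-256 fingerprints are an assumed choice, and the inline/post-process split is taken from claims 2 and 3 rather than from the abstract itself.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(data: bytes) -> str:
    """Content fingerprint (SHA-256 is an assumed choice, not from the patent)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Ref:
    """Hypothetical pointer that replaces a removed duplicate."""
    device: str
    chunk_id: str


@dataclass
class Device:
    """Hypothetical storage device with a chunk store and a fingerprint index."""
    name: str
    chunks: dict = field(default_factory=dict)  # chunk_id -> bytes or Ref
    index: dict = field(default_factory=dict)   # fingerprint -> chunk_id holding bytes

    def write_inline(self, chunk_id: str, data: bytes) -> bool:
        """Inline deduplication: if identical bytes are already stored here,
        keep only a Ref. Returns True when the write was deduplicated."""
        fp = fingerprint(data)
        if fp in self.index:
            self.chunks[chunk_id] = Ref(self.name, self.index[fp])
            return True
        self.chunks[chunk_id] = data
        self.index[fp] = chunk_id
        return False

    def post_process(self, peer: "Device") -> int:
        """Post-process deduplication: replace stored bytes that the peer
        already indexes with a Ref to the peer's copy. Returns the count."""
        replaced = 0
        for chunk_id, value in list(self.chunks.items()):
            if not isinstance(value, bytes):
                continue  # already a reference, nothing to remove
            fp = fingerprint(value)
            if fp in peer.index:
                self.chunks[chunk_id] = Ref(peer.name, peer.index[fp])
                if self.index.get(fp) == chunk_id:
                    del self.index[fp]  # these bytes are no longer stored locally
                replaced += 1
        return replaced


def hybrid_dedup(first: Device, second: Device, incoming: dict) -> int:
    """Phase 1 runs inline at `first`; only after it completes is the
    second (post-process) pass requested at `second`."""
    for chunk_id, data in incoming.items():
        first.write_inline(chunk_id, data)
    return second.post_process(first)
```

In this sketch the "request" for the second deduplication is simply the call to `post_process` after the inline loop finishes; a real system would send it as a message between sites, possibly triggered by a replication request as in claim 9.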

Claims

What is claimed is:

1. A method, comprising: facilitating, by a system comprising a processor, a first removal of first data at a first storage device based on a determination that the first data is a first duplicate of second data at the first storage device; facilitating, by the system, a second removal of third data at a second storage device, wherein the third data is a second duplicate of the second data, and wherein the facilitating the second removal is based on a request to evaluate the third data; temporarily halting, by the system, the first removal and the second removal based on a detection of a collision caused by the first removal and the second removal being performed at a same time; and resuming, by the system, the first removal and the second removal after a defined amount of time after the detection of the collision.

2. The method of claim 1, wherein the facilitating the first removal of the first data comprises facilitating an inline deduplication that replaces the first data with a reference to the second data.

3. The method of claim 1, wherein the facilitating the second removal of the third data comprises facilitating a post-process deduplication that replaces the third data with a reference to the second data at the first storage device.

4. The method of claim 3, wherein the first storage device and the second storage device are geographically distributed devices, and wherein the facilitating the post-process deduplication comprises aligning the post-process deduplication at a geographically distributed level with geographically distributed replication.

5. The method of claim 1, wherein the first storage device and the second storage device are storage devices of an elastic cloud storage system.

6. The method of claim 1, further comprising: designating, by the system, the second data at the first storage device as unchangeable data.

7. The method of claim 6, further comprising: detecting, by the system, the collision between the first removal of the first data and the second removal of the third data, resulting in the detection of the collision, wherein the detecting is based on the second data being the unchangeable data.

8. The method of claim 1, further comprising: maintaining, by the system and at the first storage device, a first index of first identifying information for the first data, and a second index of second identifying information for the second data; and maintaining, by the system and at the second storage device, a third index of third identifying information for the third data.

9. The method of claim 1, wherein the facilitating the second removal is in response to: receiving, by the system, a replication request from the second storage device to replicate the third data at the first storage device; and determining, by the system, that the third data is duplicate data.

10. The method of claim 1, further comprising: protecting, by the system, additional data portions through replication of the additional data portions with associated parent chunks, wherein the protecting the additional data portions is performed irrespective of the first removal of the first data.

11. The method of claim 1, wherein the facilitating the first removal of the first data and the facilitating the second removal of the third data comprises facilitating an efficiency of data deduplication in a geographically distributed environment.

12. A system, comprising: a processor; and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations, comprising: facilitating a first deduplication of first data at a first storage device based on a determination that the first storage device comprises duplicated data; sending a request for a second deduplication at a second storage device after completion of the first deduplication at the first storage device; facilitating the second deduplication of second data at the second storage device, wherein the second data comprises a copy of the duplicated data; temporarily halting the first deduplication and the second deduplication based on a detection of a collision caused by the first deduplication and the second deduplication being performed at a same time; and resuming the first deduplication and the second deduplication after a defined amount of time after the detection of the collision.

13. The system of claim 12, wherein the operations further comprise: facilitating the first deduplication based on an inline deduplication that replaces the duplicated data with a reference to the first data.

14. The system of claim 12, wherein the operations further comprise facilitating the second deduplication based on a post-process deduplication that replaces the second data with a reference to the first data at the first storage device.

15. The system of claim 12, wherein the operations further comprise: maintaining at the first storage device, a first index of fingerprints for the first data; and maintaining at the second storage device, a second index of fingerprints for the second data.

16. The system of claim 12, wherein the first storage device and the second storage device are storage devices of an elastic cloud storage system.

17. A non-transitory computer-readable medium comprising instructions that, in response to execution, cause a system comprising a processor to perform operations, comprising: facilitating a first removal of first data at a first device based on a determination that the first data is duplicate data of second data at the first device; facilitating a second removal of third data at a second device, wherein the third data is the duplicate data of the second data, wherein the facilitating the second removal is based on a request to remove the duplicate data, and wherein the first device was alerted to the duplicate data based on a replication request from the second device to replicate the third data at the first device; temporarily halting the first removal and the second removal based on a detection of a collision caused by the first removal and the second removal being performed at a same time; and resuming the first removal and the second removal after a defined amount of time after the detection of the collision.

18. The non-transitory computer-readable medium of claim 17, wherein the operations further comprise: designating the second data at the first device as immutable data, wherein the immutable data is used to resolve deduplication conflicts.

19. The non-transitory computer-readable medium of claim 17, wherein the operations further comprise: maintaining a first identification for the first data, and a second identification of the second data at the first device; and maintaining a third identification for the third data at the second device.

20. The non-transitory computer-readable medium of claim 17, wherein the operations further comprise: facilitating the first removal of the first data using an inline deduplication that replaces the first data with a first reference to the second data; and facilitating the second removal of the third data based on a post-process deduplication that replaces the third data with a second reference to the second data at the first device.
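The collision handling shared by independent claims 1, 12, and 17 (halt both removals when they run at the same time, then resume after a defined amount of time) can be modeled in a simplified, single-threaded form. The collision rule below, two removals that point at the same chunk marked immutable, is an assumption drawn from claims 6, 7, and 18; the claims do not specify the delay, the detection mechanism, or the order of resumption, so `Removal`, `detect_collision`, `run_removals`, and `backoff_s` are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Removal:
    """Hypothetical pending duplicate-removal job on one device."""
    device: str
    duplicate_id: str  # chunk to be replaced by a reference
    target_id: str     # chunk the reference will point at
    done: bool = False


def detect_collision(a: Removal, b: Removal, immutable: set) -> bool:
    """Assumed rule: concurrent removals collide when both point at the
    same target chunk and that chunk is marked immutable (cf. claims 6-7)."""
    return a.target_id == b.target_id and a.target_id in immutable


def run_removals(a: Removal, b: Removal, immutable: set,
                 backoff_s: float = 0.05,
                 sleep: Callable[[float], None] = time.sleep) -> list:
    """Start two removals 'at the same time'. On a collision, halt both,
    wait the defined backoff, then resume them. Returns an event log."""
    log = []
    if detect_collision(a, b, immutable):
        log.append(f"collision: halted {a.device}, {b.device}")
        sleep(backoff_s)  # the claims' "defined amount of time"
        log.append(f"resumed after {backoff_s}s")
    for job in (a, b):  # applying jobs in order is a simplification
        job.done = True
        log.append(f"{job.device}: {job.duplicate_id} -> ref {job.target_id}")
    return log
```

Passing `sleep` as a parameter keeps the delay testable without real waiting. Resuming the jobs one after the other is one way to avoid an immediate repeat collision; the claims themselves only require that both removals resume after the defined time.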

Assignees

EMC IP Holding Co LLC

Classifications

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

  • G06F3/0641 (primary): De-duplication techniques

  • in relation to throughput

  • Saving storage space on storage systems

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641)

Frequently asked questions

What does patent US10789002B1 cover?
Facilitating data deduplication in an elastic cloud storage environment is provided herein. A system can comprise a processor and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations. The operations can comprise facilitating a first deduplication of first data at a first storage device based on a determination that the first sto…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F3/0641. Mapped technology areas include Physics.
When was this patent published?
Published September 29, 2020 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).