Performing reconciliation on a segmented de-duplication index

US10845994B1 · US · B1

Patent metadata
Field               Value
Publication number  US-10845994-B1
Application number  US-201715664185-A
Country             US
Kind code           B1
Filing date         Jul 31, 2017
Priority date       Jul 31, 2017
Publication date    Nov 24, 2020
Grant date          Nov 24, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A technique accesses a non-resident segment and a resident segment of a segmented de-duplication index, the resident segment being currently loaded into primary memory from secondary storage for data block de-duplication, and the non-resident segment not being currently loaded into the primary memory from the secondary storage for de-duplication. The technique further discovers that a digest of a non-resident digest entry of the non-resident segment and a digest of a resident digest entry of the resident segment are duplicates. The non-resident digest entry includes a first reference to a first location of the secondary storage that holds a first data block copy, and the resident digest entry includes a second reference to a second location of the secondary storage that holds a second data block copy. The technique further performs reconciliation that conforms the non-resident segment and the resident segment of the index to reference only one data block copy.
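
The discovery step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `DigestEntry` type, the integer `location` field, the helper names, and the choice of SHA-256 as the digest function are all assumptions made for illustration, since the abstract names none of them.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DigestEntry:
    digest: bytes   # digest of the data block contents
    location: int   # hypothetical address of the block copy in secondary storage


def digest_of(block: bytes) -> bytes:
    # SHA-256 is an assumption; the abstract does not name a digest function.
    return hashlib.sha256(block).digest()


def find_duplicate_digests(non_resident, resident):
    """Return (non_resident_entry, resident_entry) pairs whose digests match."""
    by_digest = {entry.digest: entry for entry in resident}
    return [
        (entry, by_digest[entry.digest])
        for entry in non_resident
        if entry.digest in by_digest
    ]


# Two segments that each index their own copy of the same block contents.
block = b"same payload"
non_resident = [
    DigestEntry(digest_of(block), location=100),
    DigestEntry(digest_of(b"other"), location=101),
]
resident = [DigestEntry(digest_of(block), location=200)]

pairs = find_duplicate_digests(non_resident, resident)
```

Here `pairs` holds one match: the two entries carry the same digest but reference different storage locations, which is the situation the reconciliation step is meant to resolve.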

First claim

Opening claim text (preview).

What is claimed is:

1. In data storage equipment, a method of managing a de-duplication index having segments containing digest entries that reference data blocks residing within secondary storage, the method comprising: accessing a non-resident segment of the de-duplication index and a resident segment of the de-duplication index, the resident segment being currently loaded into primary memory from the secondary storage for data block de-duplication, and the non-resident segment not being currently loaded into the primary memory from the secondary storage for data block de-duplication; in response to accessing the non-resident segment and the resident segment, discovering that (i) a digest of a non-resident digest entry of the non-resident segment and (ii) a digest of a resident digest entry of the resident segment are duplicates, the non-resident digest entry including a first reference to a first storage location of the secondary storage that holds a first copy of a particular data block, and the resident digest entry including a second reference to a second storage location of the secondary storage that holds a second copy of the particular data block; and in response to discovering that (i) the digest of the non-resident digest entry of the non-resident segment and (ii) the digest of the resident digest entry of the resident segment are duplicates, performing a reconciliation operation that conforms the non-resident segment and the resident segment of the de-duplication index to reference only one of the first copy and the second copy of the particular data block; wherein performing the reconciliation operation includes: from the de-duplication index, eliminating one of the first reference to the first storage location of the secondary storage and the second reference to the second storage location of the secondary storage; wherein the method further comprises: prior to performing the reconciliation operation, performing a resident segment evaluation operation to determine whether the resident segment is open to receiving new digest entries or not open to receiving new digest entries; wherein eliminating includes: deleting the resident digest entry from the resident segment when a result of the resident segment evaluation operation indicates that the resident segment is open to receiving new digest entries, and replacing, within the non-resident digest entry, the first reference to the first storage location with the second reference to the second storage location when the result of the resident segment evaluation operation indicates that the resident segment is not open to receiving new digest entries.

2. A method as in claim 1 wherein discovering includes: performing a comparison operation between the digest of a non-resident digest entry of the non-resident segment and the digest of a resident digest entry of the resident segment to determine whether the digests are duplicates.

3. A method as in claim 2, further comprising: prior to performing the comparison operation between the digest of the non-resident digest entry of the non-resident segment and the digest of the resident digest entry of the resident segment, filtering the digest of the non-resident digest entry against a predictive filter to predict whether the resident segment indexes a copy of the particular data block.

4. A method as in claim 1 wherein the data storage equipment is operative to perform data storage operations to write data to and read data from the secondary storage via datasets; and wherein the method further comprises: performing a de-duplication operation that shares one of the first copy and the second copy of the particular data block among different datasets.

5. A method as in claim 4 wherein initially a first dataset includes the first reference to the first storage location of the secondary storage that holds the first copy of the particular data block; wherein initially a second dataset includes the second reference to the second storage location of the secondary storage that holds the second copy of the particular data block; wherein performing the de-duplication operation includes: accessing a back reference of the first copy of the particular data block to identify the first dataset among the datasets, and upon identifying the first dataset, replacing the first reference with the second reference within the first dataset to share the second copy of the particular data block among both the first dataset and the second dataset.

6. A method as in claim 5 wherein performing the de-duplication operation further includes: in response to replacing the first reference with the second reference within the first dataset, deleting the first copy of the particular data block from the secondary storage to reclaim memory space within the secondary storage.

7. A method as in claim 5, further comprising: in response to replacing the first reference with the second reference within the first dataset, leaving the first copy of the particular data block in place until the first copy of the particular data block is rewritten with new dataset content.

8. A method as in claim 5, further comprising: in response to replacing the first reference with the second reference within the first dataset, leaving the first copy of the particular data block in place until the first copy of the particular data block is deleted in response to dataset deletion.

9. In data storage equipment, a method of managing a de-duplication index having segments containing digest entries that reference data blocks residing within secondary storage, the method comprising: accessing a non-resident segment of the de-duplication index and a resident segment of the de-duplication index, the resident segment being currently loaded into primary memory from the secondary storage for data block de-duplication, and the non-resident segment not being currently loaded into the primary memory from the secondary storage for data block de-duplication; in response to accessing the non-resident segment and the resident segment, discovering that (i) a digest of a non-resident digest entry of the non-resident segment and (ii) a digest of a resident digest entry of the resident segment are duplicates, the non-resident digest entry including a first reference to a first storage location of the secondary storage that holds a first copy of a particular data block, and the resident digest entry including a second reference to a second storage location of the secondary storage that holds a second copy of the particular data block; and in response to discovering that (i) the digest of the non-resident digest entry of the non-resident segment and (ii) the digest of the resident digest entry of the resident segment are duplicates, performing a reconciliation operation that conforms the non-resident segment and the resident segment of the de-duplication index to reference only one of the first copy and the second copy of the particular data block; wherein performing the reconciliation operation includes: from the de-duplication index, eliminating one of the first reference to the first storage location of the secondary storage and the second reference to the second storage location of the secondary storage; wherein the method further comprises: prior to performing the reconciliation operation, performing a resident segment evaluation operation to determine whether the resident segment is open to receiving new digest entries or not open to receiving new digest entries; wherein a result of the resident segment evaluation operation indicates that the resident segment is open to receiving new digest entries; and wherein eliminating includes: in response to the result of the resi
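
The elimination rule recited in claim 1 can be sketched as follows. This is an illustrative reading of the claim language only, not the patentee's code: the `Segment` and `DigestEntry` classes, the `is_open` flag, the string digests, and the return labels are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DigestEntry:
    digest: str     # placeholder digest value (a real digest would be a hash)
    location: int   # hypothetical address of a block copy in secondary storage


@dataclass
class Segment:
    entries: dict = field(default_factory=dict)  # digest -> DigestEntry
    is_open: bool = False  # True while the segment still accepts new digest entries


def reconcile(non_resident: Segment, resident: Segment, digest: str) -> str:
    """Make both segments reference only one copy of the block with this digest."""
    nr_entry = non_resident.entries.get(digest)
    r_entry = resident.entries.get(digest)
    if nr_entry is None or r_entry is None:
        return "no-duplicate"
    if resident.is_open:
        # Resident segment is open: delete the resident digest entry, so only
        # the non-resident entry (first location) remains in the index.
        del resident.entries[digest]
        return "deleted-resident-entry"
    # Resident segment is not open: repoint the non-resident entry from the
    # first location to the second location held by the resident entry.
    nr_entry.location = r_entry.location
    return "repointed-non-resident-entry"


non_res = Segment({"d1": DigestEntry("d1", 100)})
open_res = Segment({"d1": DigestEntry("d1", 200)}, is_open=True)
closed_res = Segment({"d1": DigestEntry("d1", 200)}, is_open=False)

result_open = reconcile(non_res, open_res, "d1")
result_closed = reconcile(non_res, closed_res, "d1")
```

In the open case the index keeps only the reference to the first location; in the not-open case both entries end up referencing the second location. Either way a single block copy remains referenced, which matches the stated goal of the reconciliation operation.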

Assignees

Inventors

Classifications

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

  • Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket

  • De-duplication techniques

  • G06F3/0608 (Primary): Saving storage space on storage systems

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10845994B1 cover?
A technique accesses a non-resident segment and a resident segment of a segmented de-duplication index, the resident segment being currently loaded into primary memory from secondary storage for data block de-duplication, and the non-resident segment not being currently loaded into the primary memory from the secondary storage for de-duplication. The technique further discovers that a digest of…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F3/0608. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 24, 2020 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).