Applying deduplication digests to avoid same-data writes

US11216199B2 · US · B2

Patent metadata
Publication number: US-11216199-B2
Application number: US-201816176756-A
Country: US
Kind code: B2
Filing date: Oct 31, 2018
Priority date: Oct 31, 2018
Publication date: Jan 4, 2022
Grant date: Jan 4, 2022

Abstract

Official abstract text for this publication.

A technique for managing write requests in a data storage system checks whether newly-arriving data match previously-stored data that have been recorded in a deduplication database. If a match is found, the technique compares mapping metadata for the newly-arriving data with mapping metadata for the matching data. If both sets of metadata point to the same storage location, then the newly-arriving data is a same-data write and a new write to disk is avoided.
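The check described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `Store` class, its in-memory dictionaries standing in for mapping metadata and the deduplication database, and the choice of SHA-256 as the digest. The sketch also assumes that every unmatched write lands at a fresh location, and that a match pointing to a different location is handled by remapping; the abstract itself only describes the same-location case.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy in-memory model; all names are illustrative assumptions."""
    blocks: dict = field(default_factory=dict)     # storage location -> data
    mapping: dict = field(default_factory=dict)    # logical address -> storage location
    dedupe_db: dict = field(default_factory=dict)  # digest -> storage location
    disk_writes: int = 0

    def write(self, logical_addr: int, data: bytes) -> str:
        # SHA-256 is an assumed digest function.
        digest = hashlib.sha256(data).hexdigest()
        current_loc = self.mapping.get(logical_addr)
        matched_loc = self.dedupe_db.get(digest)

        if matched_loc is not None:
            if matched_loc == current_loc:
                # Both sets of metadata point to the same location:
                # a same-data write, so nothing is written.
                return "same-data"
            # Assumed handling of a mismatch: remap the address to the
            # existing copy instead of writing the data again.
            self.mapping[logical_addr] = matched_loc
            return "dedupe"

        # No digest match: store the data at a fresh location and record it.
        new_loc = len(self.blocks)
        self.blocks[new_loc] = data
        self.disk_writes += 1
        self.mapping[logical_addr] = new_loc
        self.dedupe_db[digest] = new_loc
        return "write"
```

In this sketch, writing the same bytes twice to one address returns `"write"` and then `"same-data"`, and the `disk_writes` counter stays at 1 after the second call.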

First claim

Opening claim text (preview).

What is claimed is:

1. A method of managing write requests in a data storage system, the method comprising: receiving an I/O (Input/Output) request that specifies a current extent of data to be written to a specified logical address; collecting mapping metadata that maps the specified logical address to a corresponding storage location; performing a dedupe-match test on the current extent, the dedupe-match test configured to (i) produce a first result in response to the current extent matching an entry in a deduplication database, and (ii) produce a second result otherwise, the deduplication database having multiple entries for respective extents of previously stored data, each entry including a reference to metadata that maps the respective extent to a respective storage location; in response to detecting that the dedupe-match test has produced the first result, performing a metadata-match test for the current extent, the metadata-match test configured to (i) produce a Match result in response to the metadata referenced by the matching entry and the mapping metadata of the current extent pointing to a same storage location, and (ii) produce a No-Match result otherwise; and in response to the metadata-match test producing the Match result, completing the I/O request without performing any write of the current extent and the mapping metadata of the current extent to persistent storage.

2. The method of claim 1, wherein the deduplication database provides a digest for each entry, each digest computed as hash of contents of the respective extent, and wherein performing the dedupe-match test includes matching a hash of contents of the current extent with one of the digests in the deduplication database.

3. The method of claim 2, further comprising: receiving a second I/O request that specifies a second extent of data to be written to a second logical address in the data storage system; performing the dedupe-match test on the second extent; and in response to the dedupe-match test on the second extent producing the second result, creating a second entry in the deduplication database, the second entry including a digest computed from the second extent and a reference to metadata that maps the second logical address to a corresponding storage location in the data storage system where the second extent is persistently stored.

4. The method of claim 3, further comprising: receiving a third I/O request that specifies a third extent of data to be written to a third logical address; collecting mapping metadata that maps the third logical address to a corresponding storage location in the data storage system; performing the dedupe-match test on the third extent, the deduplication operation producing the first result by matching the third extent to the second entry in the deduplication database; and in response to confirming that both the metadata referenced by the second entry and the mapping metadata that maps the third logical address point to the same storage location, completing the third I/O request without performing any write of the third extent to persistent storage, the third logical address being equal to the second logical address.

5. The method of claim 3, further comprising: receiving a fourth I/O request that specifies a fourth extent of data to be written to a fourth logical address; collecting mapping metadata that maps the fourth logical address to a corresponding storage location in the data storage system; performing the dedupe-match test on the fourth extent, the deduplication operation producing the first result by matching the fourth extent to the second entry in the deduplication database; and in response to detecting that the metadata referenced by the second entry and the mapping metadata that maps the fourth logical address point to different storage locations, (i) configuring metadata for mapping the fourth logical address to the storage location to which the second logical address is mapped and (ii) completing the fourth I/O request without performing any write of the fourth extent to persistent storage, the fourth logical address being different from the second logical address.

6. The method of claim 5, further comprising: receiving a fifth I/O request that specifies a fifth extent of data to be written to a fifth logical address; collecting mapping metadata that maps the fifth logical address to a corresponding storage location in the data storage system; performing the dedupe-match test on the fifth extent, the deduplication operation producing the first result by matching the fifth extent to the second entry in the deduplication database; and in response to detecting that the metadata referenced by the second entry and the mapping metadata that maps the fifth logical address point to the same storage location, completing of the fifth I/O request without performing any write of the fifth extent to persistent storage, the fifth logical address being equal to the fourth logical address.

7. The method of claim 2, wherein the current extent is a block-sized extent, and wherein the matching entry is provided for a block stored in the data storage system.

8. The method of claim 2, further comprising: aggregating a set of data received in I/O (Input/Output) requests into a batch of data in a data log, the batch of data including multiple extents, each extent directed to a respective logical address in the data storage system; collecting mapping metadata that maps the logical address of each extent to a corresponding storage location in the data storage system; performing the dedupe-match test on each of the extents in the batch; for each extent for which the dedupe-match test produces the first result, performing the metadata-matching test, and for an extent for which the dedupe-match test produces the first result and the metadata-matching test produces the Match result, marking the extent for no action in a batch-flush table, the batch-flush table associating each extent in the batch with a respective action, or no action, to be performed when flushing the batch from the data log.

9. The method of claim 8, further comprising: for an extent for which the dedupe-match test produces the first result and the metadata-matching test produces the No-Match result, marking the extent for deduplication in the batch-flush table.

10. The method of claim 8, further comprising: for an extent for which the dedupe-match test produces the second result, marking the extent for one of (i) an overwrite or (ii) an allocating write.

11. The method of claim 8, wherein collecting the mapping metadata includes associating each of a set of extents in the batch with a respective metadata element that uniquely identifies a storage location of the respective extent, and wherein performing the metadata-match test on an extent includes (i) comparing the respective metadata element with the metadata referenced by the matching entry for that extent and (ii) producing the Match result in response to the respective metadata element and the metadata referenced by the matching entry being identical.

12. The method of claim 2, further comprising, in response to the metadata-match test producing the Match result, completing the I/O request without performing any updates to mapping metadata for mapping the current extent to persistent storage.

13. A data storage system, comprising control circuitry that includes a set of processing units coupled to memory, the control circuitry constructed and arranged to: receive an I/O (Input/Output) request that specifies a current extent of data to be written to a specified logical address; collect
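
Claims 8 to 10 describe marking each extent of a batch in a batch-flush table. A minimal sketch of that classification follows. The names (`Action`, `build_batch_flush_table`), the dictionary-based metadata, and SHA-256 as the digest are all assumptions made for illustration. Claim 10 does not say how to choose between an overwrite and an allocating write; the sketch assumes an overwrite when the logical address is already mapped and an allocating write otherwise.

```python
import hashlib
from enum import Enum


class Action(Enum):
    NO_ACTION = "no action"                # same-data write, nothing to flush
    DEDUPE = "deduplicate"                 # remap to the existing copy
    OVERWRITE = "overwrite"                # assumed: address already mapped
    ALLOCATING_WRITE = "allocating write"  # assumed: address not yet mapped


def build_batch_flush_table(batch, mapping, dedupe_db):
    """Classify each (logical_addr, data) extent in the batch.

    mapping:   logical address -> storage location (hypothetical structure)
    dedupe_db: digest -> storage location (hypothetical structure)
    """
    table = []
    for logical_addr, data in batch:
        digest = hashlib.sha256(data).hexdigest()  # assumed digest function
        current_loc = mapping.get(logical_addr)
        matched_loc = dedupe_db.get(digest)

        if matched_loc is not None:
            # Dedupe-match test gave the first result; run the metadata-match test.
            if matched_loc == current_loc:
                action = Action.NO_ACTION
            else:
                action = Action.DEDUPE
        elif current_loc is not None:
            action = Action.OVERWRITE
        else:
            action = Action.ALLOCATING_WRITE
        table.append((logical_addr, action))
    return table
```

The sketch classifies every extent against a fixed snapshot of the metadata, so it does not model extents within one batch that depend on each other.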

Assignees

Inventors

Classifications

  • Command handling arrangements, e.g. command buffers, queues, command scheduling · CPC title

  • G06F3/0641 (Primary)

    De-duplication techniques · CPC title

  • Saving storage space on storage systems · CPC title

  • Plurality of storage devices · CPC title

  • Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11216199B2 cover?
A technique for managing write requests in a data storage system checks whether newly-arriving data match previously-stored data that have been recorded in a deduplication database. If a match is found, the technique compares mapping metadata for the newly-arriving data with mapping metadata for the matching data. If both sets of metadata point to the same storage location, then the newly-arriv…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F3/0641. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 4, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).