Tombstones for no longer relevant deduplication entries

US10528280B1 · US · B1

Patent metadata
Field: Value
Publication number: US-10528280-B1
Application number: US-201715420726-A
Country: US
Kind code: B1
Filing date: Jan 31, 2017
Priority date: Jan 31, 2017
Publication date: Jan 7, 2020
Grant date: Jan 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An implementation of the disclosure provides a system comprising a storage array comprising a plurality of data blocks and a storage controller coupled to the storage array. The storage controller comprises a processing device to identify a canonical instance of a data block in a vector associated with a deduplication map. The vector represents a plurality of updates to the deduplication map over a determined time period. A deduplication reference representing duplicate data of the data block in the storage array is selected from the deduplication map. The deduplication reference is remapped in the deduplication map to point to the canonical instance. Based on the remapping, an entry in the deduplication map for the deduplication reference is updated with a record. Responsive to detecting that the entry is in a location associated with an original entry of the data block in the deduplication map, the entry with the record is deleted.
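The flow in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the `Entry` fields, the boolean `tombstone` flag standing in for "the record", the inclusive `[lo, hi]` sequence range standing in for the vector, and the rule used by `trim` to decide that an entry has reached the original entry's location are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    seq: int                 # sequence identifier of the update that wrote this entry
    block_hash: str          # hash of the block contents
    target: int              # address of the block this entry points at
    tombstone: bool = False  # assumed stand-in for "the record" in the abstract


def in_vector(e: Entry, lo: int, hi: int) -> bool:
    # Assumption: the vector is an inclusive range of sequence identifiers.
    return lo <= e.seq <= hi


def canonical_in_vector(entries: List[Entry], block_hash: str,
                        lo: int, hi: int) -> Optional[Entry]:
    """Return the earliest entry for block_hash inside the vector, if any."""
    candidates = [e for e in entries
                  if e.block_hash == block_hash and in_vector(e, lo, hi)]
    return min(candidates, key=lambda e: e.seq, default=None)


def remap_and_mark(entries: List[Entry], block_hash: str,
                   lo: int, hi: int) -> Optional[Entry]:
    """Point duplicates at the canonical instance and tombstone their entries."""
    canon = canonical_in_vector(entries, block_hash, lo, hi)
    if canon is None:
        return None
    for e in entries:
        if e is not canon and e.block_hash == block_hash and in_vector(e, lo, hi):
            e.target = canon.target  # remap the deduplication reference
            e.tombstone = True       # update the entry with a record
    return canon


def trim(entries: List[Entry], original_seq: int) -> List[Entry]:
    """Drop tombstoned entries at or after the original entry's sequence id.

    Assumption: this comparison models "a location associated with an
    original entry"; the source does not define the test this precisely.
    """
    return [e for e in entries if not (e.tombstone and e.seq >= original_seq)]
```

With four entries at sequence ids 5, 7, 9, and 12, where 5, 7, and 12 share a hash and the vector is `[5, 10]`, the sketch picks 5 as canonical, remaps and tombstones 7, leaves 12 alone because it falls outside the vector, and `trim` then removes only the entry at 7.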

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a storage array comprising a plurality of data blocks; and a storage controller coupled to the storage array, the storage controller comprising a processing device, the processing device to: identify a canonical instance of a data block in a vector associated with a deduplication map, the vector represents a plurality of updates to the deduplication map over a determined time period; select, from the deduplication map, a deduplication reference representing duplicate data of the data block in the storage array, wherein the canonical instance represents an earliest occurrence of the duplicate data of the data block in the vector associated with the deduplication map; remap the deduplication reference in the deduplication map to point to the canonical instance; update an entry in the deduplication map for the deduplication reference with a record based on the remapped the deduplication reference; and responsive to detecting that the entry is in a location associated with an original entry of the data block in the deduplication map, delete the entry with the record.

2. The system of claim 1, wherein to determine the vector, the processing device is further to determine a range of sequence identifiers associated with a plurality of updates to the data block.

3. The system of claim 2, wherein to determine whether the deduplication reference is in the vector, the processing device is further to: identify a sequence identifier for the deduplication reference; and determine whether the sequence identifier is within the range of sequence identifiers associated with the vector.

4. The system of claim 3, wherein the processing device is further to determine whether the identify sequence identifier for the deduplication reference corresponds to a sequence identifier for the data block.

5. The system of claim 4, wherein the processing device is further to, responsive to determining that the identified sequence identifier corresponds to a sequence identifier for the data block, delete the entry associated with the record from the deduplication map.

6. The system of claim 1, wherein, the processing device is further to determine whether a hash value associated with the deduplication reference and the data block correspond.

7. A method comprising: identifying, by a processing device, a canonical instance of a data block associated with a deduplication map; selecting, by the processing device, a deduplication reference from the deduplication map, the deduplication reference represents duplicate data of the data block, wherein the canonical instance represents an earliest occurrence of the duplicate data of the data block in an identified vector for the deduplication map; remaping the deduplication reference in the deduplication map to point to the canonical instance; update an entry in the deduplication map for the deduplication reference with a record based on the remaping; determining, by the processing device, that a location of the entry corresponds to a vector associated with an original entry of the data block in the deduplication map, the vector represents a range of sequence identifiers associated with updates to the deduplication map; and performing, in view of the determining, a trimming process to trim the entry with the record from the deduplication map.

8. The method of claim 7, wherein determining that the location of the deduplication reference corresponds to the vector: identifying a sequence identifier for the deduplication reference; and determining whether the sequence identifier is within the range of sequence identifiers associated with the vector.

9. The method of claim 8, further comprising determining whether the identify sequence identifier for the deduplication reference corresponds to a sequence identifier associated with an original entry for the data block in the deduplication map.

10. The method of claim 9, wherein responsive to determining that the identify sequence corresponds to a sequence identifier for the original entry, issuing an instruction to trim the record from the deduplication map.

11. The method of claim 7, wherein the selecting comprises determining whether a hash value in the deduplication map for the deduplication reference and the data block correspond.

12. A non-transitory computer readable storage medium storing instructions, which when executed, cause a processing device to: select, by the processing device, a canonical instance for a data block associated with a plurality of deduplication references in a deduplication map, the deduplication references represent duplicate data of the data block in a vector of the deduplication map; remap the plurality of deduplication references in the vector to point to the canonical instance, wherein the canonical instance represents an earliest occurrence of the duplicate data of the data block in an identified vector for the deduplication map; update each entry associated with the plurality of deduplication references with a record based on remapping the plurality of deduplication references; and responsive to detecting that the location of an entry corresponds to an original entry of the data block in the deduplication map, trim entries from the deduplication map that are associated with each record.

13. The non-transitory computer readable storage medium of claim 12, wherein the processing device is further to identify the vector associated with the plurality of deduplication references based on a sequence identifier for the canonical instance.

14. The non-transitory computer readable storage medium of claim 12, wherein the vector comprises a range of sequence identifiers associated with updates to the data block.

15. The non-transitory computer readable storage medium of claim 14, wherein the processing device is further to: identify a sequence identifier for each of the deduplication references; and determine whether the sequence identifier is within the range of sequence identifiers associated with the vector.

16. The non-transitory computer readable storage medium of claim 15, wherein the processing device is further to determine whether at least one of the identify sequence identifier for each of the deduplication references corresponds to a sequence identifier for the original entry associated with the data block.

17. The non-transitory computer readable storage medium of claim 12, wherein the processing device is further to determine whether a hash value for each of the deduplication references in the deduplication and the data block correspond.
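The dependent claims describe three checks: deriving the vector as a range of sequence identifiers, testing whether a reference's sequence identifier falls in that range and corresponds to the data block's identifier, and comparing hash values. A hedged Python sketch of those checks follows. The names `Vector`, `vector_for`, `should_delete`, and `hashes_correspond` are hypothetical, "corresponds" is modeled as plain equality, and SHA-256 is an assumed hash function that the claims do not name.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vector:
    """Assumed model of the vector: an inclusive range of sequence ids."""
    first_seq: int
    last_seq: int

    def contains(self, seq: int) -> bool:
        return self.first_seq <= seq <= self.last_seq


def vector_for(update_seqs: Iterable[int]) -> Vector:
    """Derive the range covering a set of update sequence ids."""
    seqs = list(update_seqs)
    return Vector(min(seqs), max(seqs))


def should_delete(ref_seq: int, vector: Vector, block_seq: int) -> bool:
    """Delete only if the reference is in the vector and its sequence id
    corresponds to the data block's (modeled here as equality)."""
    return vector.contains(ref_seq) and ref_seq == block_seq


def hashes_correspond(ref_hash: str, block: bytes) -> bool:
    """Compare a stored hash with the block's hash (SHA-256 is an assumption)."""
    return ref_hash == hashlib.sha256(block).hexdigest()
```

Keeping the range test and the equality test separate mirrors the claim structure, where the range check is introduced first and the correspondence check is layered on top of it.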

Assignees

Pure Storage Inc

Inventors

Classifications

  • G06F3/0641 · Primary

    De-duplication techniques · CPC title

  • Saving storage space on storage systems · CPC title

  • Single storage device · CPC title

  • Garbage collection, i.e. reclamation of unreferenced memory · CPC title

  • Improving I/O performance · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10528280B1 cover?
An implementation of the disclosure provides a system comprising a storage array comprising a plurality of data blocks and a storage controller coupled to the storage array. The storage controller comprising a processing device to identify a canonical instance of a data block in a vector associated with a deduplication map. The vector represents a plurality of updates to the deduplication map o…
Who is the assignee on this patent?
Pure Storage Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0641. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 7, 2020 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).