Indexing architecture for deduplicated cache system of a storage system

US8935446B1 · US · B1

Patent metadata
Field: Value
Publication number: US-8935446-B1
Application number: US-201314038668-A
Country: US
Kind code: B1
Filing date: Sep 26, 2013
Priority date: Sep 26, 2013
Publication date: Jan 13, 2015
Grant date: Jan 13, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method for indexing content stored in a cache memory device is disclosed. The method starts with in response to receiving a first request for caching a first file extent associated with a first file in a cache memory device, generating a first fingerprint based on content of the first file extent. Then the method continues with searching in a fingerprint index based on the first fingerprint to determine whether the first file extent has been stored in the cache memory. In response to determining that a fingerprint entry matching the first fingerprint is found, the method then continues with associating a first identifier identifying the first file extent and the first file with a storage location of the cache memory device obtained from the matching fingerprint entry, without storing the first file extent in the cache memory device.
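The insert path in the abstract can be sketched in a few lines of Python. This is only an illustration, not the patented implementation: the class name `DedupCache`, the use of SHA-1 as the fingerprint function, the list standing in for the cache memory device, and the `(file handle, offset)` tuple used as the extent identifier are all assumptions made for the example. The miss branch (store the extent and add a fingerprint entry) follows claim 2 rather than the abstract itself.

```python
import hashlib


class DedupCache:
    """Toy model of a deduplicated cache with two indexes (hypothetical names)."""

    def __init__(self):
        self.storage = []            # stands in for the cache memory device
        self.fingerprint_index = {}  # fingerprint -> storage location
        self.file_index = {}         # (file handle, offset) -> storage location

    def insert(self, file_handle, offset, extent: bytes) -> bool:
        """Cache an extent; return True only if new data was actually stored."""
        # Fingerprint is derived from the extent's content (SHA-1 is an assumption).
        fingerprint = hashlib.sha1(extent).hexdigest()
        location = self.fingerprint_index.get(fingerprint)
        stored = location is None
        if stored:
            # Miss: store the extent and record where it lives.
            location = len(self.storage)
            self.storage.append(extent)
            self.fingerprint_index[fingerprint] = location
        # Hit or miss, the file-level identifier is mapped to the location,
        # so a duplicate extent is indexed without being stored again.
        self.file_index[(file_handle, offset)] = location
        return stored

    def read(self, file_handle, offset):
        """Return the cached extent for this identifier, or None if absent."""
        location = self.file_index.get((file_handle, offset))
        return None if location is None else self.storage[location]
```

With this sketch, inserting the same bytes under two different files stores the data once, and both file entries resolve to the same storage location.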

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for indexing content stored in a cache memory device, the method comprising: in response to receiving a first request for caching a first file extent associated with a first file in a cache memory device, generating a first fingerprint based on content of the first file extent; searching in a fingerprint index based on the first fingerprint to determine whether the first file extent has been stored in the cache memory, wherein fingerprint index includes a plurality of fingerprint entries, each mapping a particular fingerprint to a storage location of the cache memory device in which a corresponding file extent is stored; and in response to determining that a fingerprint entry matching the first fingerprint is found, associating a first identifier identifying the first file extent and the first file with a storage location of the cache memory device obtained from the matching fingerprint entry, without storing the first file extent in the cache memory device.

2. The method of claim 1, further comprising: in response to determining that a fingerprint entry matching the first fingerprint is not found in the fingerprint index, storing the first file extent in a first storage location of the cache memory device; and inserting the first fingerprint in a fingerprint entry of the fingerprint index, mapping the first fingerprint to the first storage location.

3. The method of claim 2, wherein storing the first file extent in a first storage location of the cache memory device comprises: compressing the first file extent into an write-evict unit (WEU) containing a plurality of file extents; and storing the WEU having the first file extent in the cache memory device.

4. The method of claim 3, wherein data stored in the cache memory device is inserted and evicted in WEU units, wherein a size of the WEU is determined based on erasure characteristics of the cache memory device.

5. The method of claim 3, wherein each fingerprint entry of the fingerprint index includes a fingerprint identifying a corresponding file extent, a WEU identifier identifying a corresponding WEU, and an offset within the corresponding WEU at which the corresponding file extent is located.

6. The method of claim 1, wherein associating a first identifier with a storage location comprises inserting a file entry in a file index, the file entry having the first identifier mapped to the storage location, wherein the file index is utilized to access the deduplicated file extents stored in the cache memory device.

7. The method of claim 6, further comprising: in response to receiving a second request to read a second file extent of a second file, searching in the file index based on a second identifier identifying the second file extent and the second file; and retrieving the second file extent from the cache memory device from a second storage location that is indicated in a corresponding file entry that matches the second identifier.

8. The method of claim 7, wherein the second request comprises a file handle of the second file and an offset within the second file in which the second file extent is located, and wherein the second identifier is generated by hashing, using a first hash function of the file handle of the second file and the offset of the second data block.

9. The method of claim 8, further comprising: hashing, using a second hash function, the file handle of the second file and the offset of the second data block to generate an alternative second identifier; and verifying a file entry corresponding to the second data block by matching the alternative second identifier to a previously recorded alternative second identifier stored in the file entry.

10. The method of claim 8, further comprising verifying a file entry corresponding to the second file extent by matching the file handle of the second file and the offset of the second file extent against corresponding ones stored in the file entry.

11. The method of claim 8, further comprising verifying the second file extent by comparing the file handle of the second file and the offset against corresponding ones stored in a header of the second file extent.

12. The method of claim 8, further comprising verifying the second file extent by comparing a generation identifier of the second file and the offset against corresponding one stored in a header of the second file extent.

13. The method of claim 1, further comprising: maintaining a dirty list having a plurality dirty entries, each corresponding to a dirty file extent that has been cached in the cache memory device, but has not been stored in the persistent storage device of the storage system; in response to receiving a first file extent to be stored in the persistent storage device, caching the first file extent in the cache memory device temporarily without writing the first file extent to the persistent storage device; and indicating in a first of the dirty entries of the dirty list corresponding to the first file extent that the first file extent is dirty.

14. The method of claim 13, further comprising: in response to receiving a request to evict a second of the dirty file extents identified in the dirty list, writing the second dirty file extent from the cache memory device to the persistent storage device; indicating in a second of the dirty entries of the dirty list corresponding to the second dirty file extent that the second dirty file extent is no longer dirty; and releasing the second dirty entry from the dirty list.

15. A storage system, comprising: one or more storage units to store a plurality of files; a cache memory device to cache at least some data blocks of at least some of the files; a file manager executed by a processor to provide an interface to access the plurality of files stored in the one or more storage units; and a cache manager executed by the processor configured to generate a first fingerprint based on content of a first file extent, in response to receiving a first request for caching the first file extent associated with the first file; the cache manager further configured to search in a fingerprint index based on the first fingerprint to determine whether the first file extent has been stored in the cache memory, wherein fingerprint index includes a plurality of fingerprint entries, each mapping a particular fingerprint to a storage location of the cache memory device in which a corresponding file extent is stored; and in response to determining that a fingerprint entry matching the first fingerprint is found, the cache manager further configured to associate a first identifier identifying the first file extent and the first file with a storage location of the cache memory device obtained from the matching fingerprint entry, without storing the first file extent in the cache memory device.

16. The storage system of claim 15, where the cache manager is further configured to: in response to determining that a fingerprint entry matching the first fingerprint is not found in the fingerprint index, store the first file extent in a first storage location of the cache memory device; and insert the first fingerprint in a fingerprint entry of the fingerprint index, mapping the first fingerprint to the first storage location.

17. The storage system of claim 16, wherein storing the first file extent in a first storage location of the cache memory device comprises: compressing the first file extent into an write-evict unit (WEU) containing a plurality of file extents; and storing the WEU
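The read path in claims 7 to 9, combined with the (WEU identifier, offset) location from claim 5, might look roughly like the sketch below. It is a hedged illustration only: the names `Location`, `FileEntry`, `FileIndex`, `primary_id`, and `alternate_id` are hypothetical, SHA-1 and SHA-256 are arbitrary stand-ins for the "first" and "second" hash functions, and truncating the primary identifier to 16 hex characters is an assumption made to show why a second check is useful.

```python
import hashlib
from typing import NamedTuple, Optional


class Location(NamedTuple):
    weu_id: int  # which write-evict unit holds the extent
    offset: int  # position of the extent inside that WEU


class FileEntry(NamedTuple):
    alt_id: str         # identifier produced by the second hash function
    location: Location  # where the extent lives in the cache


def _key(file_handle: str, offset: int) -> bytes:
    return f"{file_handle}:{offset}".encode()


def primary_id(file_handle: str, offset: int) -> str:
    # "First hash function" (assumed: SHA-1, truncated to a short index key).
    return hashlib.sha1(_key(file_handle, offset)).hexdigest()[:16]


def alternate_id(file_handle: str, offset: int) -> str:
    # "Second hash function" (assumed: SHA-256), stored in the entry for checking.
    return hashlib.sha256(_key(file_handle, offset)).hexdigest()


class FileIndex:
    def __init__(self):
        self.entries = {}  # primary identifier -> FileEntry

    def insert(self, file_handle: str, offset: int, location: Location) -> None:
        entry = FileEntry(alternate_id(file_handle, offset), location)
        self.entries[primary_id(file_handle, offset)] = entry

    def lookup(self, file_handle: str, offset: int) -> Optional[Location]:
        entry = self.entries.get(primary_id(file_handle, offset))
        if entry is None:
            return None  # extent is not cached
        if entry.alt_id != alternate_id(file_handle, offset):
            return None  # entry belongs to a different extent; treat as a miss
        return entry.location
```

Storing a second, independently computed identifier in each entry lets a lookup reject an entry that happens to share the primary identifier without keeping the full file handle and offset in the index.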

Assignees

Inventors

Classifications

  • using replacement algorithms · CPC title

  • using clearing, invalidating or resetting means · CPC title

  • Allocation or management of cache space · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US8935446B1 cover?
A computer-implemented method for indexing content stored in a cache memory device is disclosed. The method starts with in response to receiving a first request for caching a first file extent associated with a first file in a cache memory device, generating a first fingerprint based on content of the first file extent. Then the method continues with searching in a fingerprint index based on th…
Who is the assignee on this patent?
EMC Corp
What technology area does this patent fall under?
Primary CPC classification G06F12/0891. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 13, 2015 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).