Systems and methods for a de-duplication cache

US9824018B2 · US · B2

Patent metadata
Publication number: US-9824018-B2
Application number: US-201514834157-A
Country: US
Kind code: B2
Filing date: Aug 24, 2015
Priority date: Jan 27, 2012
Publication date: Nov 21, 2017
Grant date: Nov 21, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A de-duplication cache is configured to cache data for access by a plurality of different storage clients, such as virtual machines. A virtual machine may comprise a virtual machine de-duplication module configured to identify data for admission into the de-duplication cache. Data admitted into the de-duplication cache may be accessible by two or more storage clients. Metadata pertaining to the contents of the de-duplication cache may be persisted and/or transferred with respective storage clients such that the storage clients may access the contents of the de-duplication cache after rebooting, being power cycled, and/or being transferred between hosts.
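
The sharing behavior described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class `DedupCache`, the helper `derive_data_id`, the client names, and the choice of SHA-256 as the identifier function are all assumptions made for illustration. The sketch only shows that identical data admitted by two storage clients can be held once and read by either.

```python
import hashlib
from typing import Dict, Optional, Set


def derive_data_id(data: bytes) -> str:
    """Derive a data identifier from content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class DedupCache:
    """Hypothetical shared cache that keeps one copy per distinct data identifier."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}       # data identifier -> cached bytes
        self._clients: Dict[str, Set[str]] = {}   # data identifier -> clients that admitted it

    def admit(self, client: str, data_id: str, data: bytes) -> bool:
        """Admit data for a client; return True only if a new copy was stored."""
        is_new = data_id not in self._blocks
        if is_new:
            self._blocks[data_id] = data
        self._clients.setdefault(data_id, set()).add(client)
        return is_new

    def read(self, data_id: str) -> Optional[bytes]:
        """Return the cached bytes for an identifier, or None on a miss."""
        return self._blocks.get(data_id)

    def stored_copies(self) -> int:
        """Number of distinct payloads currently held."""
        return len(self._blocks)

    def sharers(self, data_id: str) -> Set[str]:
        """Clients that have admitted the data behind this identifier."""
        return set(self._clients.get(data_id, ()))


# Two hypothetical virtual machines admit the same file contents.
cache = DedupCache()
payload = b"contents of a shared library"
data_id = derive_data_id(payload)
first = cache.admit("vm-a", data_id, payload)    # stores the payload
second = cache.admit("vm-b", data_id, payload)   # reuses the existing copy
```

Keying the cache by an identifier derived from the data, rather than by file name, is what lets two clients with differently named but identical files share a single cached copy.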

First claim

Opening claim text (preview).

We claim:

1. An apparatus, comprising: a driver configured to monitor requests within an input/output (I/O) stack of a virtual machine; and a cache manager configured for operation within the virtual machine, the cache manager to service a first request, of the monitored requests, using a de-duplication cache in response to associating the first request with a data identifier in cache metadata maintained within the virtual machine, the data identifier corresponding to data admitted into the de-duplication cache by the virtual machine, wherein: to service the first request, the cache manager sends the data identifier from the virtual machine to the de-duplication cache and the cache manager comprises one or more of a circuit, programmable logic, firmware, and instructions stored on a non-transitory storage medium.

2. The apparatus of claim 1, wherein, to admit a file into the de-duplication cache from the virtual machine, the cache manager is configured to: derive a data identifier from data of the file; and send an admission request from the virtual machine to the de-duplication cache, the admission request comprising the derived data identifier.

3. The apparatus of claim 2, wherein the cache manager is further configured to record an association between the file and the derived data identifier in the cache metadata maintained within the virtual machine in response to the data of the file being admitted into the de-duplication cache.

4. The apparatus of claim 2, wherein the cache manager is configured to derive the data identifier by one or more of: hashing, digesting, and computing a signature of the data of the file.

5. The apparatus of claim 2, wherein the de-duplication cache is configured to admit the file into the de-duplication cache in response to determining that the file is not associated with a data identifier in the cache metadata of the virtual machine.

6. The apparatus of claim 2, wherein the cache manager is configured to admit the file into the de-duplication cache in response to determining that the file satisfies a de-duplication policy.

7. The apparatus of claim 6, wherein determining that the file satisfies the de-duplication policy comprises comparing one or more of a name, an extension, a path, a volume, an attribute, and a hint associated to a file selection criterion.

8. The apparatus of claim 2, wherein: to admit the file into the de-duplication cache, the cache manager is further configured to access the file data by use of the I/O stack of the virtual machine; and the admission request comprises the file data.

9. An apparatus, comprising: a de-duplication manager configured for operation within a virtual machine hosted on a computing device, the de-duplication manager to identify I/O requests of the virtual machine pertaining to files that qualify for admission into a de-duplication cache shared by two or more virtual machines hosted on the computing device; and a de-duplication cache interface configured for operation within the virtual machine, the de-duplication cache interface to service the identified I/O requests using the de-duplication cache, the de-duplication manager comprising one or more of a circuit, programmable logic, and instructions stored on a non-transitory storage medium.

10. The apparatus of claim 9, wherein the de-duplication manager is configured to admit a file into the de-duplication cache by deriving a data identifier from data of the file at the virtual machine, and providing the data of the file and the derived data identifier to the de-duplication cache by use of the de-duplication cache interface.

11. The apparatus of claim 10, wherein: the de-duplication manager is configured to admit the file into the de-duplication cache in response to an I/O request pertaining to the file; and operations to admit the file into the de-duplication cache are performed on a separate thread from a thread performing operations to service the I/O request.

12. The apparatus of claim 9, wherein: the de-duplication manager is configured to associate names of files admitted into the de-duplication cache with respective data identifiers derived from data of the files in a de-duplication index maintained within the virtual machine; and the de-duplication manager is further configured to request data of files admitted into the de-duplication cache by use of the data identifiers associated with the files in the de-duplication index.

13. The apparatus of claim 12, wherein the de-duplication manager is configured to remove an association between a particular file and a data identifier from the de-duplication index in response to detecting an I/O request to modify the particular file.

14. The apparatus of claim 12, wherein the de-duplication manager is configured to write the de-duplication index to persistent storage and to load the de-duplication index into memory of the virtual machine from the persistent storage in response to one or more of restarting the virtual machine, rebooting the virtual machine, power cycling the virtual machine, and migrating the virtual machine to a different host.

15. The apparatus of claim 9, wherein the de-duplication manager identifies the I/O requests pertaining to files that qualify for admission into the de-duplication cache by use of file selection criteria, the file selection criteria based on one or more of a file name, a file extension, a file path, a file volume, a file attribute, and a hint.

16. A method, comprising: maintaining de-duplication metadata within a virtual machine operating on a host computing device, the de-duplication metadata to associate files of the virtual machine with respective data identifiers, the data identifiers derived from file data admitted into a de-duplication cache by the virtual machine; and servicing a request to read a particular file of the virtual machine by use of the de-duplication cache, wherein servicing the read request at the virtual machine comprises: using the de-duplication metadata maintained within the virtual machine to determine a data identifier associated with the particular file, the determined data identifier derived from file data admitted into the de-duplication cache by the virtual machine, and requesting the file data from the de-duplication cache by use of the determined data identifier.

17. The method of claim 16, further comprising: determining that a file identifier corresponding to a specified file of the virtual machine is not associated with a data identifier by the de-duplication metadata maintained within the virtual machine; receiving file data corresponding to the specified file by use of a storage stack of the virtual machine; calculating a data identifier from the received file data; instructing the de-duplication cache to admit the received file data; and recording an association between the file identifier corresponding to the specified file and the calculated data identifier in the de-duplication metadata maintained within the virtual machine.

18. The method of claim 17, wherein the received file data is admitted into a first de-duplication cache operating on a first host computing device, the method further comprising: retaining the association between the file identifier of the specified file and the calculated data identifier in the de-duplication metadata maintained within the virtual machine in response to the virtual machine migrating from the first host computing device to operate on a second host computing device.

19. The method of claim 18, f
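
The read, admission, invalidation, and persistence steps recited in claims 12 to 14 and 16 to 17 can be sketched from the virtual machine's side. This is a hedged illustration, not the claimed apparatus: `HostCache`, `VmDedupManager`, the dictionary standing in for the storage stack, SHA-256 as the identifier function, and JSON as the index format are all hypothetical choices.

```python
import hashlib
import json
from typing import Dict, Optional


class HostCache:
    """Stand-in for the shared de-duplication cache (hypothetical interface)."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def admit(self, data_id: str, data: bytes) -> None:
        self.blocks.setdefault(data_id, data)

    def read(self, data_id: str) -> Optional[bytes]:
        return self.blocks.get(data_id)


class VmDedupManager:
    """Keeps a file-name -> data-identifier index inside the virtual machine."""

    def __init__(self, cache: HostCache, storage: Dict[str, bytes]) -> None:
        self.cache = cache
        self.storage = storage            # dict standing in for the VM storage stack
        self.index: Dict[str, str] = {}   # the de-duplication index

    def read_file(self, name: str) -> bytes:
        # Hit path: look up the identifier and request the data by identifier.
        data_id = self.index.get(name)
        if data_id is not None:
            cached = self.cache.read(data_id)
            if cached is not None:
                return cached
        # Miss path: read via the storage stack, derive an identifier,
        # admit the data, then record the association in the index.
        data = self.storage[name]
        data_id = hashlib.sha256(data).hexdigest()
        self.cache.admit(data_id, data)
        self.index[name] = data_id
        return data

    def write_file(self, name: str, data: bytes) -> None:
        # A modification removes the now-stale association from the index.
        self.index.pop(name, None)
        self.storage[name] = data

    def dump_index(self) -> str:
        """Serialize the index so the caller can write it to persistent storage."""
        return json.dumps(self.index)

    def load_index(self, text: str) -> None:
        """Restore the index, for example after a reboot or migration."""
        self.index = json.loads(text)


# Usage: a first read admits the file; a write invalidates the index entry.
host = HostCache()
vm = VmDedupManager(host, {"lib.dll": b"abc"})
first_read = vm.read_file("lib.dll")
vm.write_file("lib.dll", b"xyz")
stale_entry_removed = "lib.dll" not in vm.index
second_read = vm.read_file("lib.dll")
snapshot = vm.dump_index()
```

Because the index lives inside the virtual machine and holds only identifiers, restoring `snapshot` in a fresh manager lets it request the same data by identifier without re-reading the file through the storage stack, provided the cache it talks to still holds that data.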

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9824018B2 cover?
A de-duplication cache is configured to cache data for access by a plurality of different storage clients, such as virtual machines. A virtual machine may comprise a virtual machine de-duplication module configured to identify data for admission into the de-duplication cache. Data admitted into the de-duplication cache may be accessible by two or more storage clients. Metadata pertaining to the conte…
Who is the assignee on this patent?
Sandisk Technologies Llc
What technology area does this patent fall under?
Primary CPC classification G06F12/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 21, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).