Managing deduplicated data

US10635639B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10635639-B2
Application number: US-201715459706-A
Country: US
Kind code: B2
Filing date: Mar 15, 2017
Priority date: Nov 30, 2016
Publication date: Apr 28, 2020
Grant date: Apr 28, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Facilitating deduplication of data in a computing system without managing access to reference count variables. A method embodiment commences upon detecting first data unit and calculating a first checksum value. At a later time, a second data unit is received and the system calculates a second checksum value. If the second checksum value is the same as the first checksum value, then the first data unit and the second data unit are the same data and need not be duplicated. In such cases, an entry in the metadata points to the location of the first data unit that is already stored. Additional metadata entries are made in the metadata to associate a Boolean usage state flag and a Boolean deletion state flag with the second checksum value. Periodically scans of the metadata are performed. When both Boolean flags are in a particular state, the deduplicated data is deleted.
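The write path described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `DedupStore`, `DedupEntry`, `write`, `in_use`, and `to_delete` are hypothetical, integer locations stand in for physical storage locations, and SHA-1 is used as the checksum because the claims mention it as one option.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DedupEntry:
    """Hypothetical deduplication-metadata record keyed by checksum."""
    location: int            # stand-in for a physical storage location
    in_use: bool = True      # Boolean usage state flag
    to_delete: bool = False  # Boolean deletion state flag


class DedupStore:
    """Toy in-memory store; all names here are illustrative assumptions."""

    def __init__(self):
        self.blocks = {}        # location -> stored bytes
        self.dedup_meta = {}    # checksum -> DedupEntry
        self.mapping_meta = {}  # logical name -> checksum (the map entries)
        self._next_location = 0

    def write(self, name: str, data: bytes) -> int:
        checksum = hashlib.sha1(data).hexdigest()
        entry = self.dedup_meta.get(checksum)
        if entry is None:
            # First time this content is seen: store it once.
            location = self._next_location
            self._next_location += 1
            self.blocks[location] = data
            entry = DedupEntry(location)
            self.dedup_meta[checksum] = entry
        else:
            # Same checksum: reuse the stored unit and mark it as
            # in use and not slated for deletion. No reference count is kept.
            entry.in_use = True
            entry.to_delete = False
        self.mapping_meta[name] = checksum
        return entry.location
```

Writing the same bytes under two names returns the same location and leaves a single stored block, while the two flags, rather than a counter, record whether the block is still needed.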

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: generating a map entry in mapping metadata to map a data unit to a physical storage location, wherein the data unit is a deduplicated data unit and corresponds to a deduplicated entry in deduplication metadata, the mapping metadata and the deduplication metadata are different data structures, and a non-deletion state is set in the deduplication metadata for the data unit when the map entry exists in the mapping metadata for the data unit; and performing a scan of the mapping metadata for the data unit, wherein an in-use state is set in the deduplication metadata for the data unit when the map entry is detected for the data unit in the mapping metadata, a not-in-use state and a deletion state are set for the data unit in the deduplication metadata when the map entry is not detected in the mapping metadata for the data unit, and the data unit is deleted from the physical storage location based at least in part on a result of the scan when the data unit is determined to correspond to the not-in-use state and the deletion state.

2. The method of claim 1, wherein a deduplication of the data unit is managed without implementing a reference count value of the data unit.

3. The method of claim 1, further comprising: enumerating a set of map entries corresponding to the data unit; and setting a usage state to the in-use state when the set of map entries is a non-zero set, or to the not-in-use state when the set of map entries is an empty set.

4. The method of claim 3, further comprising collecting a set of analysis data from the set of map entries to perform a statistical analysis.

5. The method of claim 1, further comprising deleting the deduplicated entry in the deduplication metadata when the data unit corresponds to the not-in-use state and the deletion state.

6. The method of claim 1, wherein the data unit is stored in a physical storage facility.

7. The method of claim 1, wherein the mapping metadata and the deduplication metadata are distributed over a plurality of nodes.

8. The method of claim 1, wherein a checksum value is calculated by applying a SHA-1 hashing scheme to a set of content comprising the data unit.

9. The method of claim 1, wherein the map entry and the deduplicated entry are stored in at least one of a persistent storage facility or an ephemeral storage facility.

10. A computer program, embodied in a non-transitory computer readable medium, the non-transitory computer readable medium having stored thereon a sequence of instructions which, when stored in memory and executed by one or more processors, causes the one or more processors to perform a set of acts, the set of acts comprising: generating a map entry in a mapping metadata to map a data unit to a physical storage location, wherein the data unit is a deduplicated data unit and corresponds to a deduplicated entry in deduplication metadata, the mapping metadata and the deduplication metadata are different data structures, and a non-deletion state is set in the deduplication metadata for the data unit when the map entry exists in the mapping metadata for the data unit; and performing a scan of the mapping metadata for the data unit, wherein an in-use state is set in the deduplication metadata for the data unit when the map entry is detected for the data unit in the mapping metadata, a not-in-use state and a deletion state are set for the data unit in the deduplication metadata when the map entry is not detected in the mapping metadata for the data unit, and the data unit is deleted from the physical storage location based at least in part on a result of the scan when the data unit is determined to correspond to the not-in-use state and the delete state.

11. The non-transitory computer readable medium of claim 10, wherein a deduplication of the data unit is managed without implementing a reference count value of the data unit.

12. The non-transitory computer readable medium of claim 10, further comprising instructions which, when stored in the memory and executed by the one or more processors, causes the one or more processors to perform further acts of: enumerating a set of map entries corresponding to the data unit; and setting a usage state to the in-use state when the set of map entries is a non-zero set, or to the not-in-use state when the set of map entries is an empty set.

13. The non-transitory computer readable medium of claim 12, further comprising instructions which, when stored in the memory and executed by the one or more processors, causes the one or more processors to perform further acts of collecting a set of analysis data from the set of map entries to perform a statistical analysis.

14. The non-transitory computer readable medium of claim 10, further comprising instructions which, when stored in the memory and executed by the one or more processors, causes the one or more processors to perform further acts of deleting the deduplicated entry in the deduplication metadata when the data unit corresponds to the not-in-use state and the delete state.

15. The non-transitory computer readable medium of claim 10, wherein the data unit is stored in a physical storage facility.

16. The non-transitory computer readable medium of claim 10, wherein the mapping metadata and the deduplication metadata are distributed over a plurality of nodes.

17. The non-transitory computer readable medium of claim 10, wherein a checksum value is calculated by applying a SHA-1 hashing scheme to a set of content comprising the data unit.

18. The non-transitory computer readable medium of claim 10, wherein the map entry and the deduplicated entry are stored in at least one of a persistent storage facility or an ephemeral storage facility.

19. A system comprising: a non-transitory storage medium having stored thereon a sequence of instructions; and one or more processors that execute the sequence of instructions to cause the one or more processors to perform a set of acts, the set of acts comprising; generating a map entry in a mapping metadata to map a data unit to a physical storage location, wherein the data unit is a deduplicated data unit and corresponds to a deduplicated entry in deduplication metadata, the mapping metadata and the deduplication metadata are different data structures, and a non-deletion state is set in the deduplication metadata for the data unit when the map entry exists in the mapping metadata for the data unit; and performing a scan of the mapping metadata for the data unit, wherein an in-use state is set in the deduplication metadata for the data unit when the map entry is detected for the data unit in the mapping metadata, a not-in-use state and a deletion state are set for the data unit in the deduplication metadata when the map entry is not detected in the mapping metadata for the data unit, and the data unit is deleted from the physical storage location based at least in part on a result of the scan when the data unit is determined to correspond to the not-in-use state and the deletion state.

20. The system of claim 19, wherein a deduplication of the data unit is managed without implementing a reference count value of the data unit.
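The scan recited in the independent claims can be sketched roughly as below. This is an illustrative reading, not the patent's implementation: the function name `scan`, the dictionary layout, and the field names `in_use` and `to_delete` are hypothetical. The sketch also assumes a two-pass grace period, in which one scan flags an unreferenced unit and a later scan deletes it if it is still unreferenced; the claims only state that deletion is based at least in part on a result of the scan.

```python
def scan(mapping_meta, dedup_meta, blocks):
    """Scan the mapping metadata and update the per-unit state flags.

    mapping_meta: logical name -> checksum (the map entries)
    dedup_meta:   checksum -> {"location": int, "in_use": bool, "to_delete": bool}
    blocks:       location -> stored bytes
    Returns the checksums whose data units were deleted by this scan.
    """
    # Set membership only: no reference count is kept per data unit.
    referenced = set(mapping_meta.values())
    deleted = []
    for checksum in list(dedup_meta):
        entry = dedup_meta[checksum]
        if checksum in referenced:
            # A map entry was detected: in-use, non-deletion state.
            entry["in_use"] = True
            entry["to_delete"] = False
        elif not entry["in_use"] and entry["to_delete"]:
            # Assumption: flagged by an earlier scan and still unreferenced,
            # so remove both the stored unit and its deduplicated entry.
            blocks.pop(entry["location"], None)
            del dedup_meta[checksum]
            deleted.append(checksum)
        else:
            # No map entry detected: not-in-use state and deletion state.
            entry["in_use"] = False
            entry["to_delete"] = True
    return deleted
```

Because the flags are recomputed from the mapping metadata on every scan, writers never need to coordinate updates to a shared counter, which appears to be the point of claims 2, 11, and 20.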

Assignees

Nutanix Inc

Inventors

Classifications

  • Provision of network file services by network file servers, e.g. by using NFS, CIFS (network file access protocols H04L67/1097)

  • File meta data generation

  • Concurrency control, e.g. optimistic or pessimistic approaches

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641)

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10635639B2 cover?
Facilitating deduplication of data in a computing system without managing access to reference count variables. A method embodiment commences upon detecting first data unit and calculating a first checksum value. At a later time, a second data unit is received and the system calculates a second checksum value. If the second checksum value is the same as the first checksum value, then the first d…
Who is the assignee on this patent?
Nutanix Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 28, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).