Removal of reference information for storage blocks in a deduplication system
US-2016371295-A1 · Dec 22, 2016 · US
US10956382B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10956382-B2 |
| Application number | US-201615082251-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 28, 2016 |
| Priority date | Mar 28, 2016 |
| Publication date | Mar 23, 2021 |
| Grant date | Mar 23, 2021 |
Official abstract text for this publication.
Various embodiments for managing data in a data deduplication repository in a computing storage environment, by a processor device, are provided. In one embodiment, a method comprises issuing an application programming interface (API) command to scan metadata of a subset of entities in a local deduplication repository for identifying candidate data to offload from the local deduplication repository to an object storage, offloading the candidate data to the object storage, and returning a status result using the API command.
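The flow in the abstract (scan the metadata of a subset of entities, offload the candidates, return a status) can be illustrated with a minimal in-memory sketch. Everything named here is an assumption for illustration and does not come from the patent: `Entity`, `StatusResult`, and `scan_and_offload` are hypothetical, plain dictionaries stand in for the local repository and the object storage, and the candidate test is passed in as a caller-supplied predicate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Entity:
    """Hypothetical repository entity; the scan reads only its metadata."""
    entity_id: str
    payload: bytes
    metadata: Dict[str, int] = field(default_factory=dict)


@dataclass
class StatusResult:
    """Hypothetical status returned to the caller of the API command."""
    scanned: int
    offloaded: List[str]
    ok: bool


def scan_and_offload(
    local_repo: Dict[str, Entity],
    subset_ids: Iterable[str],
    object_storage: Dict[str, bytes],
    is_candidate: Callable[[Dict[str, int]], bool],
) -> StatusResult:
    """Scan a subset of entities, offload the candidates, report a status."""
    scanned = 0
    offloaded: List[str] = []
    # Materialize the ids first so the repository can be modified safely.
    for entity_id in list(subset_ids):
        entity = local_repo.get(entity_id)
        if entity is None:
            continue  # unknown ids are skipped, not treated as errors
        scanned += 1
        if is_candidate(entity.metadata):
            # Copy the data off-premise, then free the on-premise copy.
            object_storage[entity_id] = entity.payload
            del local_repo[entity_id]
            offloaded.append(entity_id)
    return StatusResult(scanned=scanned, offloaded=offloaded, ok=True)
```

A caller might invoke it as `scan_and_offload(repo, ["a", "b"], cloud, lambda m: m["ref_count"] < 3)`. A real gateway would also need to handle transfer failures and keep a record of where offloaded data went; the sketch leaves both out.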
Opening claim text (preview).
What is claimed is:

1. A method for managing data in a data deduplication repository in a computing storage environment, by a processor device, comprising: issuing an application programming interface (API) command, by an existing backup management application executing on a host to a cloud deduplicating gateway, to scan metadata by the cloud deduplicating gateway of a subset of entities in a local deduplication repository stored on one or more storage devices associated with the host for identifying candidate data to offload from the local deduplication repository to an object storage, offloading the candidate data to the object storage, and returning a status result using the API command to the existing backup management application; wherein the local deduplication repository is stored on-premise and the object storage is stored off-premise such that the offloading is performed to lower a consumption of on-premise storage by migrating the candidate data identified during the scan of the metadata of the subset of entities to the object storage being stored off-premise; and according to an output of the scanning, identifying the candidate data as repository data developed on a candidate list which includes repository data having a reference count number below a predetermined reference count threshold, the reference count number associated with a deduplication ratio; wherein the scanning comprises iterating through each of the subset of entities in the local deduplication repository, identifying whether the given reference count number of each of the subset of entities is below the predetermined reference count threshold, sorting the identified candidate data onto the candidate list according to the deduplication ratio and age information, and returning the sorted candidate list in an API response to the API command; and wherein repository data explicitly marked as excluded by a user is excluded from the candidate list notwithstanding whether the repository data explicitly marked as excluded has the reference count number below the predetermined reference count threshold.

2. The method of claim 1, further including excluding repository data from the candidate list based on a predetermined age threshold associated with an age of the repository data.

3. The method of claim 1, further including, when offloading the candidate data using a virtual tape library (VTL) system interface, performing: moving a cartridge containing identified candidate data from a VTL drive slot to an import/export (I/E) slot; migrating the identified candidate data from the cartridge to the object storage; removing the cartridge from the I/E slot; and communicating the status result of the migration using the API command.

4. The method of claim 1, further including, when offloading the candidate data using a file system interface, exporting a mount point that presents content of the object storage using at least one of a common internet file system (CIFS), a server message block (SMB), and a network file system (NFS) protocol.

5. The method of claim 1, further including offloading the candidate data when a repository capacity is greater than a predetermined repository capacity threshold.

6. The method of claim 1, further including maintaining a mapping of the offloaded candidate data between the local deduplication repository and the object storage by updating the local deduplication repository metadata.

7. A system for managing data in a data deduplication repository in a computing storage environment, the system comprising: at least one processor device, wherein the at least one processor device: issues an application programming interface (API) command, by an existing backup management application executing on a host to a cloud deduplicating gateway, to scan metadata by the cloud deduplicating gateway of a subset of entities in a local deduplication repository stored on one or more storage devices associated with the host for identifying candidate data to offload from the local deduplication repository to an object storage, offloading the candidate data to the object storage, and returning a status result using the API command to the existing backup management application; wherein the local deduplication repository is stored on-premise and the object storage is stored off-premise such that the offloading is performed to lower a consumption of on-premise storage by migrating the candidate data identified during the scan of the metadata of the subset of entities to the object storage being stored off-premise; and according to an output of the scanning, identifies the candidate data as repository data developed on a candidate list which includes repository data having a reference count number below a predetermined reference count threshold, the reference count number associated with an overall system deduplication ratio; wherein the scanning comprises iterating through each of the subset of entities in the local deduplication repository, identifying whether the given reference count number of each of the subset of entities is below the predetermined reference count threshold, sorting the identified candidate data onto the candidate list according to the deduplication ratio and age information, and returning the sorted candidate list in an API response to the API command; and wherein repository data explicitly marked as excluded by a user is excluded from the candidate list notwithstanding whether the repository data explicitly marked as excluded has the reference count number below the predetermined reference count threshold.

8. The system of claim 7, wherein the at least one processor device excludes repository data from the candidate list based on a predetermined age threshold associated with an age of the repository data.

9. The system of claim 7, wherein the at least one processor device, when offloading the candidate data using a virtual tape library (VTL) system interface, performs: moving a cartridge containing identified candidate data from a VTL drive slot to an import/export (I/E) slot; migrating the identified candidate data from the cartridge to the object storage; removing the cartridge from the I/E slot; and communicating the status result of the migration using the API command.

10. The system of claim 7, wherein the processor device, when offloading the candidate data using a file system interface, exports a mount point that presents content of the object storage using at least one of a common internet file system (CIFS), a server message block (SMB), and a network file system (NFS) protocol.

11. The system of claim 7, wherein the at least one processor device offloads the candidate data when a repository capacity is greater than a predetermined repository capacity threshold.

12. The system of claim 7, wherein the at least one processor device maintains a mapping of the offloaded candidate data between the local deduplication repository and the object storage by updating the local deduplication repository metadata.

13. A computer program product for managing data in a data deduplication repository in a computing storage environment, by a processor device, the computer program product embodied on a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising: an executable portion that issues an application programming interface (API) command, by an existing backup management application executing on a host to a cloud deduplicating gateway, to scan metadata by the cloud deduplicating gateway of a subset of entities in a local deduplication repository s
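The candidate-list steps recited in claims 1 and 2 (iterate over the subset, keep entities whose reference count is below the threshold, drop user-excluded data, apply an age threshold, then sort by deduplication ratio and age) can be sketched as below. This is an illustrative sketch under stated assumptions, not the patented implementation: `EntityMeta` and `build_candidate_list` are hypothetical names, and the claims do not state the sort direction or which side of the age threshold is excluded. The sketch assumes the lowest deduplication ratio comes first, older data breaks ties, and data younger than the age threshold is excluded.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class EntityMeta:
    """Hypothetical per-entity metadata read during the scan."""
    entity_id: str
    ref_count: int
    dedup_ratio: float
    age_days: int
    user_excluded: bool = False


def build_candidate_list(
    entities: Iterable[EntityMeta],
    ref_count_threshold: int,
    min_age_days: Optional[int] = None,
) -> List[EntityMeta]:
    """Return offload candidates in sorted order."""
    candidates = [
        e for e in entities
        # User exclusion wins regardless of the reference count.
        if not e.user_excluded
        and e.ref_count < ref_count_threshold
        # Assumption: data younger than the age threshold is excluded.
        and (min_age_days is None or e.age_days >= min_age_days)
    ]
    # Assumption: lowest deduplication ratio first, then oldest first.
    return sorted(candidates, key=lambda e: (e.dedup_ratio, -e.age_days))
```

Under these assumptions, data that deduplicates poorly and has sat untouched longest is offered for offload first. Reversing either sort key is a one-line change if a deployment prefers a different policy.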
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
using de-duplication of the data · CPC title
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title