Optimizing data block size for deduplication
US-9626373-B2 · Apr 18, 2017 · US
US9864658B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9864658-B1 |
| Application number | US-201414557045-A |
| Country | US |
| Kind code | B1 |
| Filing date | Dec 1, 2014 |
| Priority date | Dec 1, 2014 |
| Publication date | Jan 9, 2018 |
| Grant date | Jan 9, 2018 |
Official abstract text for this publication.
A method for automation of deduplication storage capacity sizing and trending analysis is provided. The method includes collecting all file system directories of at least one system for which a deduplication backup storage capacity for files in the all file system directories is to be determined. The method includes determining file counts, file sizes and file types of the files in the all file system directories and obtaining a deduplication ratio of each of the file types. The method includes deriving the deduplication backup storage capacity from the file counts, the file sizes and the file types of the files in the all file system directories, based on the deduplication ratio of each of the file types.
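The sizing steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: file type is approximated by file extension, the deduplication ratio is taken to mean bytes before deduplication divided by bytes after, and the names `DEDUP_RATIOS`, `DEFAULT_RATIO`, `collect_file_stats`, `derive_capacity`, and the ratio values themselves are hypothetical placeholders. The margin ratio follows the option described in claim 4.

```python
import os
from collections import defaultdict

# Hypothetical lookup table: ratio = bytes before dedup / bytes after dedup.
# The values are illustrative placeholders, not figures from the patent.
DEDUP_RATIOS = {".txt": 5.0, ".log": 8.0, ".db": 3.0, ".mp4": 1.1}
DEFAULT_RATIO = 1.0  # assumption: no savings for file types not in the table


def collect_file_stats(root):
    """Walk every directory under root; return {file_type: [count, total_bytes]}."""
    stats = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            # Assumption: the extension stands in for the file type.
            ftype = os.path.splitext(name)[1].lower() or "<none>"
            stats[ftype][0] += 1
            stats[ftype][1] += size
    return dict(stats)


def derive_capacity(stats, ratios=DEDUP_RATIOS, margin_ratio=1.2):
    """Return (total capacity with margin, per-type capacity) in bytes."""
    per_type = {
        ftype: total_bytes / ratios.get(ftype, DEFAULT_RATIO)
        for ftype, (_count, total_bytes) in stats.items()
    }
    return sum(per_type.values()) * margin_ratio, per_type
```

A caller would run `derive_capacity(collect_file_stats("/some/root"))` and compare the first returned value against the installed backup capacity. Multiplying by a margin ratio is one of the two options the claims mention; adding a fixed margin would work equally well in this sketch.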
Opening claim text (preview).
What is claimed is:
1. A method for automated, deduplicated backup of a plurality of files of at least one system, comprising: collecting all file system directories of the at least one system for which a deduplication backup storage capacity for files identified in the all file system directories is to be determined; determining file counts, file sizes and file types of the files identified in the all file system directories; obtaining a deduplication ratio of each of the file types; deriving the deduplication backup storage capacity from the file counts, the file sizes and the file types of the files identified in the all file system directories, based on the deduplication ratio of each of the file types; and backing up the files identified in the all file system directories based on the derived deduplication backup storage capacity, wherein at least one action of the method is performed by a processor.
2. The method of claim 1, wherein obtaining the deduplication ratio of each of the file types comprises looking up the deduplication ratio for each file type, wherein the deduplication ratio indicates a relative amount of data before deduplication and after deduplication.
3. The method of claim 1, wherein obtaining the deduplication ratio of each of the file types comprises, for each file type of the file types: selecting from the all file system directories a plurality of files having each file type; executing a deduplication algorithm on the plurality of files having each file type; and determining the deduplication ratio of the plurality of files having each file type, which serves as the deduplication ratio of each file type, based on a result of the executing the deduplication algorithm.
4. The method of claim 1, wherein deriving the deduplication backup storage capacity comprises: deriving a deduplication backup storage capacity for each of the file types, based on file counts and file sizes of each of the file types and the deduplication ratio of each of the file types; summing, across all of the file types, the deduplication backup storage capacity for each of the file types; and adding a margin to a result of the summing, or multiplying a result of the summing by a margin ratio.
5. The method of claim 1, further comprising: tracking a rate of increase of file counts of files having each of the file types; and calculating a capacity utilization projection, based on an installed deduplication backup storage capacity and the derived deduplication backup storage capacity as projected with the tracked rate of increase of file counts of files having each of the file types.
6. The method of claim 1, further comprising: obtaining at least one parameter, including network bandwidth or throughput, applicable to data recovery from a deduplicated backup; and calculating a disaster recovery time for a full system restore, based on the file counts, the file sizes and the file types of the files in the all file system directories, the deduplication ratio of each of the file types, and the at least one parameter.
7. The method of claim 1, wherein the file types and the deduplication ratio of each of the file types satisfy at least one of: text files use less deduplication backup storage than video, audio or image files, relative to file size; application logs use less deduplication backup storage than database files, relative to file size; or application data files are separated by file types according to application, with differing deduplication ratios.
8. A tangible, non-transitory, computer-readable media having instructions thereupon which, when executed by a processor, cause the processor to perform a method for automated, deduplicated backup of a plurality of electronic files of at least one system, the method comprising: collecting all file system directories of at least one system for which a deduplication backup storage capacity for files identified in the all file system directories is to be determined; determining file counts, file sizes and file types of the files identified in the all file system directories; obtaining a deduplication ratio of each of the file types; deriving the deduplication backup storage capacity from the file counts, the file sizes and the file types of the files identified in the all file system directories, based on the deduplication ratio of each of the file types; and backing up the files identified in the all file system directories based on the derived deduplication backup storage capacity.
9. The computer-readable media of claim 8, wherein obtaining the deduplication ratio of each of the file types comprises looking up the deduplication ratio for each file type, wherein the deduplication ratio indicates a relative amount of data before and after deduplication.
10. The computer-readable media of claim 8, wherein obtaining the deduplication ratio of each of the file types comprises, for each file type of the file types: selecting from the all file system directories a plurality of files having each file type; executing a deduplication algorithm on the plurality of files having each file type; and determining the deduplication ratio of the plurality of files having each file type, which serves as the deduplication ratio of each file type, based on a result of the executing the deduplication algorithm.
11. The computer-readable media of claim 8, wherein deriving the deduplication backup storage capacity comprises: deriving a deduplication backup storage capacity for each of the file types, based on file counts and file sizes of each of the file types and the deduplication ratio of each of the file types; summing, across all of the file types, the deduplication backup storage capacity for each of the file types; and adding a margin to a result of the summing, or multiplying a result of the summing by a margin ratio.
12. The computer-readable media of claim 8, further comprising: tracking a rate of increase of file counts of files having each of the file types; and calculating a capacity utilization projection, based on an installed deduplication backup storage capacity and the derived deduplication backup storage capacity as projected with the tracked rate of increase of file counts of files having each of the file types.
13. The computer-readable media of claim 8, further comprising: obtaining at least one parameter, including network bandwidth or throughput, applicable to data recovery from a deduplicated backup; and calculating a disaster recovery time for a full system restore, based on the file counts, the file sizes and the file types of the files in the all file system directories, the deduplication ratio of each of the file types, and the at least one parameter.
14. A system for automated, deduplicated backup of a plurality of files of at least one system, comprising: a processor, configured to couple to a file system; and the processor configured to perform actions, including: collecting all file system directories of at least one system for which a deduplication backup storage capacity for files identified in the all file system directories is to be determined; determining file counts, file sizes and file types of the files identified in the all file system directories; obtaining a deduplication ratio of each of the file types; and deriving the deduplication backup storage capacity from the file counts, the file sizes and the file types of the files identified in the all file system directories, based on the deduplication ratio of each of the file types; and backing up the files identified in the all file system directories based
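The dependent claims on measuring a ratio by running a deduplication algorithm, projecting capacity utilization, and estimating disaster recovery time can be illustrated with the rough sketch below. Everything here is an assumption made for illustration: the claims do not name a specific deduplication algorithm, so fixed-size chunking with SHA-256 fingerprints is swapped in; the projection is a simple linear one driven by a byte growth rate rather than per-type file counts; and the restore estimate assumes the full pre-deduplication data volume must cross the link at a sustained throughput. The function names `measure_dedup_ratio`, `months_until_full`, and `full_restore_seconds` are hypothetical.

```python
import hashlib


def measure_dedup_ratio(blobs, chunk_size=4096):
    """Estimate bytes-before / bytes-after for an iterable of bytes objects.

    Assumption: fixed-size chunks fingerprinted with SHA-256 stand in for
    whatever deduplication algorithm a real backup product would run.
    """
    total = 0
    unique = {}  # fingerprint -> chunk length, stored once per distinct chunk
    for data in blobs:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            total += len(chunk)
            unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    stored = sum(unique.values())
    return total / stored if stored else 1.0


def months_until_full(installed_bytes, current_bytes, monthly_growth_bytes):
    """Linear projection of when derived capacity reaches installed capacity."""
    if current_bytes >= installed_bytes:
        return 0.0
    if monthly_growth_bytes <= 0:
        return float("inf")  # no growth: capacity is never exhausted
    return (installed_bytes - current_bytes) / monthly_growth_bytes


def full_restore_seconds(logical_bytes, throughput_bytes_per_s):
    """Restore-time estimate: data to restore divided by sustained throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return logical_bytes / throughput_bytes_per_s
```

In practice the sample passed to `measure_dedup_ratio` would be the contents of several files of one type, and the resulting ratio would feed the per-type capacity calculation. A production estimate would likely refine the restore model with per-type behavior and the other parameters the claims mention.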
using de-duplication of the data · CPC title
Physics · mapped topic
File system administration, e.g. details of archiving or snapshots (error detection or correction of the data by redundancy in operations G06F11/14) · CPC title