De-duplication with partitioning advice and automation
US-9213715-B2 · Dec 15, 2015 · US
US2017147598A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017147598-A1 |
| Application number | US-201715405953-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jan 13, 2017 |
| Priority date | Sep 11, 2014 |
| Publication date | May 25, 2017 |
| Grant date | — |
Official abstract text for this publication.
According to one embodiment, a file system includes a hash value calculator, an access controller and a deduplication controller. The hash value calculator calculates a hash value of at least one data block in a file to be stored in storage. The access controller stores, when the at least one data block includes a first data block and when a first hash value of the first data block is calculated, the first data block in a first location of the storage determined based on the first hash value, by using the first hash value as an identifier. The deduplication controller prevents the first data block from being stored in the first location when an effective second data block is already stored in the first location.
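The abstract's core idea can be illustrated with a minimal sketch: the hash of a block decides where it is stored, so a second block with the same hash finds the slot already occupied and is not written again. Everything named here (`DedupStore`, `put`, the use of SHA-256, the in-memory dictionary standing in for storage) is an assumption made for illustration; the abstract does not specify a hash function or a storage API, and the sketch treats any occupied slot as holding an "effective" block.

```python
import hashlib
from typing import Dict, Tuple


class DedupStore:
    """Toy content-addressed store: a block's hash is used as its location."""

    def __init__(self) -> None:
        # Hypothetical stand-in for storage: hash (hex) -> block bytes.
        self.slots: Dict[str, bytes] = {}

    def put(self, block: bytes) -> Tuple[str, bool]:
        """Store a block; return (hash, written) where written is False on dedup."""
        key = hashlib.sha256(block).hexdigest()  # hash value calculator
        if key in self.slots:
            # A block is already stored at this location: skip the write.
            return key, False
        self.slots[key] = block  # access controller stores by hash
        return key, True


store = DedupStore()
k1, written1 = store.put(b"hello")
k2, written2 = store.put(b"hello")  # same content, same location
k3, written3 = store.put(b"world")
```

In this sketch the second `put` of identical content returns the same key without writing, which is the deduplication behaviour the abstract describes at a high level.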
Opening claim text (preview).
What is claimed is:

1. A file system comprising: a hash value calculator configured to calculate a hash value of at least one data block included in a file to be stored in storage; an access controller configured to store, when the at least one data block includes a first data block and when a first hash value of the first data block is calculated, the first data block in a first location of the storage determined based on the first hash value, by using the first hash value as an identifier; and a deduplication controller configured to prevent the first data block from being stored in the first location when an effective second data block is already stored in the first location.

2. The file system of claim 1, wherein: the access controller is configured to divide the file into a plurality of data blocks including the first data block; the hash value calculator is configured to calculate a hash value of each of the data blocks; the deduplication controller is configured to determine whether deduplication of each of the data blocks is needed based on whether an effective data block is stored in a location of the storage determined based on the calculated hash value of each of the data blocks; and the access controller is configured to store the first data block in the first location when the deduplication of the first data block is not needed as a result of determination.

3. The file system of claim 2, further comprising a file manager configured to manage the location of the storage in which each of the data blocks of the file is stored, by using an inode associated with the file, wherein the access controller is configured to read the data blocks from the storage by specifying the location of the storage in which each of the data blocks of the file is stored based on the inode associated with the file, when the file needs to be read.

4. The file system of claim 3, wherein the inode associated with the file includes a block table in which the calculated hash value of each of the data blocks is recorded.

5. The file system of claim 3, wherein: the hash value calculator is configured to calculate the hash value of the read first data block, when the first data block is read from the first location; the access controller is configured to store a combination of metadata of the first data block including the first hash value and the first data block in the first location, when the deduplication of the first data block is not needed as a result of determination, and detect an error in reading the first data block by comparing the calculated hash value with the first hash value included in the metadata of the first data block, when the first data block is read from the first location and when the hash value of the read first data block is calculated.

6. The file system of claim 2, wherein: the access controller is configured to store, in the first location, a combination of metadata of the second data block and the second data block, when the second data block needs to be stored in the first location, the metadata including a duplication count used to indicate the number of data blocks having the same hash value as the hash value of the second data block; and the deduplication controller is configured to add 1 to the duplication count included in the metadata of the second data block stored in the first location, when the deduplication of the first data block is needed as a result of determination.

7. The file system of claim 6, further comprising a replication controller configured to create a copy of each data block stored in the storage, wherein the replication controller is configured to determine the number of copies of the second data block based on the duplication count included in the metadata of the second data block.

8. The file system of claim 2, wherein the access controller is configured to store, in the first location, a combination of metadata of the first data block and the first data block, the metadata including a duplication count used to indicate the number of data blocks having the same hash value as the hash value of the first data block, when the deduplication of the first data block is not needed as a result of determination.

9. The file system of claim 1, wherein: the storage comprises object storage; the first location of the object storage is determined based on an object identifier of a first object including a combination of metadata of the first data block and the first data block; and the access controller is configured to store the first object in the first location of the object storage determined based on the object identifier of the first object, by using the first hash value of the first data block as the object identifier of the first object.

10. The file system of claim 1, wherein: the storage comprises block storage; a first address specifying the first location of the block storage is indicated by using a predetermined portion of the first hash value of the first data block; and the access controller is configured to store a combination of metadata of the first data block and the first data block in the first location of the block storage specified by the first address.

11. A data deduplication method applied to a file system, the method comprising: calculating a hash value of at least one data block included in a file to be stored in storage; storing, when the at least one data block includes a first data block and when a first hash value of the first data block is calculated, the first data block in a first location of the storage determined based on the first hash value, by using the first hash value as an identifier; and preventing the first data block from being stored in the first location when an effective second data block is already stored in the first location.

12. A non-transitory computer-readable storage medium having stored thereon a computer program which is executable by a computer, the computer program controlling the computer to execute functions of: calculating a hash value of at least one data block included in a file to be stored in storage; storing, when the at least one data block includes a first data block and when a first hash value of the first data block is calculated, the first data block in a first location of the storage determined based on the hash value, by using the first hash value as an identifier; and preventing the first data block from being stored in the first location when an effective second data block is already stored in the first location.
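The dependent claims add several mechanisms on top of the basic scheme: splitting a file into blocks (claim 2), an inode block table that records each block's hash (claims 3 and 4), read-error detection by re-hashing (claim 5), a duplication count kept in block metadata (claims 6 and 8), a replica count derived from that count (claim 7), and a block-storage address taken from a portion of the hash (claim 10). The sketch below shows one way these pieces could fit together. It is not the patented implementation: the names `DedupFS`, `StoredBlock`, `Inode`, `replica_count` and `block_address`, the 4-byte block size, SHA-256, the initial duplication count of 1, the replica formula, and the choice of the first 8 hex digits as the address are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical; tiny so the demo is easy to follow


@dataclass
class StoredBlock:
    hash_hex: str   # metadata: hash recorded when the block was written
    dup_count: int  # metadata: how many logical blocks share this hash
    data: bytes


@dataclass
class Inode:
    # Block table: one recorded hash per data block of the file.
    block_table: List[str] = field(default_factory=list)


class DedupFS:
    def __init__(self) -> None:
        self.storage: Dict[str, StoredBlock] = {}  # location keyed by hash
        self.inodes: Dict[str, Inode] = {}

    def write_file(self, name: str, content: bytes) -> None:
        inode = Inode()
        for off in range(0, len(content), BLOCK_SIZE):
            block = content[off:off + BLOCK_SIZE]
            h = hashlib.sha256(block).hexdigest()
            stored = self.storage.get(h)
            if stored is None:
                # Deduplication not needed: store metadata together with data.
                self.storage[h] = StoredBlock(h, 1, block)  # assumed start: 1
            else:
                # Deduplication needed: only bump the duplication count.
                stored.dup_count += 1
            inode.block_table.append(h)
        self.inodes[name] = inode

    def read_file(self, name: str) -> bytes:
        out = bytearray()
        for h in self.inodes[name].block_table:
            stored = self.storage[h]
            # Re-hash the data read back and compare with the stored hash.
            if hashlib.sha256(stored.data).hexdigest() != stored.hash_hex:
                raise IOError(f"read error detected for block {h[:8]}")
            out += stored.data
        return bytes(out)


def replica_count(dup_count: int, base: int = 2, cap: int = 4) -> int:
    """Hypothetical policy: more widely shared blocks get more copies."""
    return min(cap, base + dup_count - 1)


def block_address(hash_hex: str, hex_digits: int = 8) -> int:
    """Hypothetical: use the leading hex digits of the hash as the address."""
    return int(hash_hex[:hex_digits], 16)


fs = DedupFS()
fs.write_file("a.txt", b"AAAABBBBAAAA")  # blocks: AAAA, BBBB, AAAA
```

Here the file is split into three blocks, but only two are stored; the repeated `AAAA` block raises the duplication count of the existing entry instead of being written again. A real block-storage layout would also need a policy for two different blocks mapping to the same truncated address, which this sketch does not address.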
at area level, e.g. provisioning of virtual or logical volumes · CPC title
Physics · mapped topic
De-duplication techniques · CPC title
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title