File system, data deduplication method and storage medium

US2017147598A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2017147598-A1
Application number: US-201715405953-A
Country: US
Kind code: A1
Filing date: Jan 13, 2017
Priority date: Sep 11, 2014
Publication date: May 25, 2017
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to one embodiment, a file system includes a hash value calculator, an access controller and a deduplication controller. The hash value calculator calculates a hash value of at least one data block in a file to be stored in storage. The access controller stores, when the at least one data block includes a first data block and when a first hash value of the first data block is calculated, the first data block in a first location of the storage determined based on the first hash value, by using the first hash value as an identifier. The deduplication controller prevents the first data block from being stored in the first location when an effective second data block is already stored in the first location.
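
The write path described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `DedupStore`, the in-memory dictionary standing in for storage, and the choice of SHA-256 are all assumptions made for illustration, since the abstract does not name a hash function or a storage backend. An "effective" block is modeled here simply as a block that is already present under the same identifier.

```python
import hashlib


class DedupStore:
    """Toy block store (hypothetical): a block's hash is its identifier."""

    def __init__(self):
        # Maps hash value (hex string) -> block bytes. Stands in for "storage".
        self.blocks = {}

    def put(self, block):
        """Store a block at the location derived from its hash.

        Returns (hash_value, written), where written is False when an
        existing block already occupies that location and the write is skipped.
        """
        # Hash value calculator (SHA-256 is an assumption).
        key = hashlib.sha256(block).hexdigest()
        # Deduplication controller: a block is already stored under this key.
        if key in self.blocks:
            return key, False
        # Access controller: store the block using the hash as identifier.
        self.blocks[key] = block
        return key, True


store = DedupStore()
k1, wrote1 = store.put(b"hello")   # first copy is written
k2, wrote2 = store.put(b"hello")   # identical block: write is skipped
k3, wrote3 = store.put(b"world")   # different content: written
print(wrote1, wrote2, wrote3, len(store.blocks))
```

Because the location is derived from the content hash, detecting a duplicate needs no separate index lookup: checking whether the target location is occupied is the duplicate check.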

First claim

Opening claim text (preview).

What is claimed is:

1. A file system comprising: a hash value calculator configured to calculate a hash value of at least one data block included in a file to be stored in storage; an access controller configured to store, when the at least one data block includes a first data block and when a first hash value of the first data block is calculated, the first data block in a first location of the storage determined based on the first hash value, by using the first hash value as an identifier; and a deduplication controller configured to prevent the first data block from being stored in the first location when an effective second data block is already stored in the first location.

2. The file system of claim 1, wherein: the access controller is configured to divide the file into a plurality of data blocks including the first data block; the hash value calculator is configured to calculate a hash value of each of the data blocks; the deduplication controller is configured to determine whether deduplication of each of the data blocks is needed based on whether an effective data block is stored in a location of the storage determined based on the calculated hash value of each of the data blocks; and the access controller is configured to store the first data block in the first location when the deduplication of the first data block is not needed as a result of determination.

3. The file system of claim 2, further comprising a file manager configured to manage the location of the storage in which each of the data blocks of the file is stored, by using an inode associated with the file, wherein the access controller is configured to read the data blocks from the storage by specifying the location of the storage in which each of the data blocks of the file is stored based on the inode associated with the file, when the file needs to be read.

4. The file system of claim 3, wherein the inode associated with the file includes a block table in which the calculated hash value of each of the data blocks is recorded.

5. The file system of claim 3, wherein: the hash value calculator is configured to calculate the hash value of the read first data block, when the first data block is read from the first location; the access controller is configured to store a combination of metadata of the first data block including the first hash value and the first data block in the first location, when the deduplication of the first data block is not needed as a result of determination, and detect an error in reading the first data block by comparing the calculated hash value with the first hash value included in the metadata of the first data block, when the first data block is read from the first location and when the hash value of the read first data block is calculated.

6. The file system of claim 2, wherein: the access controller is configured to store, in the first location, a combination of metadata of the second data block and the second data block, when the second data block needs to be stored in the first location, the metadata including a duplication count used to indicate the number of data blocks having the same hash value as the hash value of the second data block; and the deduplication controller is configured to add 1 to the duplication count included in the metadata of the second data block stored in the first location, when the deduplication of the first data block is needed as a result of determination.

7. The file system of claim 6, further comprising a replication controller configured to create a copy of each data block stored in the storage, wherein the replication controller is configured to determine the number of copies of the second data block based on the duplication count included in the metadata of the second data block.

8. The file system of claim 2, wherein the access controller is configured to store, in the first location, a combination of metadata of the first data block and the first data block, the metadata including a duplication count used to indicate the number of data blocks having the same hash value as the hash value of the first data block, when the deduplication of the first data block is not needed as a result of determination.

9. The file system of claim 1, wherein: the storage comprises object storage; the first location of the object storage is determined based on an object identifier of a first object including a combination of metadata of the first data block and the first data block; and the access controller is configured to store the first object in the first location of the object storage determined based on the object identifier of the first object, by using the first hash value of the first data block as the object identifier of the first object.

10. The file system of claim 1, wherein: the storage comprises block storage; a first address specifying the first location of the block storage is indicated by using a predetermined portion of the first hash value of the first data block; and the access controller is configured to store a combination of metadata of the first data block and the first data block in the first location of the block storage specified by the first address.

11. A data deduplication method applied to a file system, the method comprising: calculating a hash value of at least one data block included in a file to be stored in storage; storing, when the at least one data block includes a first data block and when a first hash value of the first data block is calculated, the first data block in a first location of the storage determined based on the first hash value, by using the first hash value as an identifier; and preventing the first data block from being stored in the first location when an effective second data block is already stored in the first location.

12. A non-transitory computer-readable storage medium having stored thereon a computer program which is executable by a computer, the computer program controlling the computer to execute functions of: calculating a hash value of at least one data block included in a file to be stored in storage; storing, when the at least one data block includes a first data block and when a first hash value of the first data block is calculated, the first data block in a first location of the storage determined based on the hash value, by using the first hash value as an identifier; and preventing the first data block from being stored in the first location when an effective second data block is already stored in the first location.
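
The dependent claims add several mechanisms on top of the basic write path: splitting a file into blocks, an inode block table of hash values (claims 2 to 4), read-time error detection against the hash kept in block metadata (claim 5), a duplication count that can drive the number of replicas (claims 6 to 8), and a block-storage address taken from a portion of the hash (claim 10). The sketch below is one possible reading under stated assumptions, not the patent's implementation. Every name in it (`FileSystemSketch`, `StoredBlock`, `BLOCK_SIZE`, `replica_count`, `block_address`) is hypothetical, SHA-256 is assumed, the duplication count is assumed to start at 0, and the replica policy and the 32-bit address width are invented for the demo.

```python
import hashlib
from dataclasses import dataclass

BLOCK_SIZE = 4  # hypothetical, tiny so the demo is easy to follow


@dataclass
class StoredBlock:
    hash_value: str  # metadata: hash recorded at write time (claim 5)
    dup_count: int   # metadata: duplication count (claims 6 and 8)
    data: bytes


def block_hash(data):
    # SHA-256 is an assumption; the claims do not name a hash function.
    return hashlib.sha256(data).hexdigest()


def block_address(hash_value, hex_digits=8):
    # Claim 10 style: use a predetermined portion of the hash as an address.
    # Taking the first 32 bits is a hypothetical choice.
    return int(hash_value[:hex_digits], 16)


class FileSystemSketch:
    def __init__(self):
        self.storage = {}  # hash value -> StoredBlock
        self.inodes = {}   # file name -> block table (list of hash values)

    def write_file(self, name, content):
        table = []
        for i in range(0, len(content), BLOCK_SIZE):
            data = content[i:i + BLOCK_SIZE]
            h = block_hash(data)
            existing = self.storage.get(h)
            if existing is not None:
                existing.dup_count += 1  # deduplicate: count, do not store
            else:
                self.storage[h] = StoredBlock(h, 0, data)
            table.append(h)  # the inode's block table records each hash
        self.inodes[name] = table

    def read_file(self, name):
        out = bytearray()
        for h in self.inodes[name]:
            blk = self.storage[h]
            # Recompute the hash on read and compare with stored metadata.
            if block_hash(blk.data) != blk.hash_value:
                raise OSError("read error detected in block " + h[:8])
            out += blk.data
        return bytes(out)

    def replica_count(self, h, base=1, cap=3):
        # Hypothetical policy: more-shared blocks get more copies, up to cap.
        return min(cap, base + self.storage[h].dup_count)


fs = FileSystemSketch()
fs.write_file("a.txt", b"abcdabcdxyz")  # blocks: "abcd", "abcd", "xyz"
fs.write_file("b.txt", b"abcd")         # shares the "abcd" block
shared = block_hash(b"abcd")
print(len(fs.storage), fs.storage[shared].dup_count, fs.read_file("a.txt"))
print(fs.replica_count(shared), fs.replica_count(block_hash(b"xyz")))
```

In this reading, a block shared by many files is a single point of failure for all of them, which is why tying the replica count to the duplication count is a natural design choice.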

Assignees

Toshiba Kk, Toshiba Solutions Corp

Inventors

Classifications

  • at area level, e.g. provisioning of virtual or logical volumes · CPC title

  • Physics · mapped topic

  • De-duplication techniques · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017147598A1 cover?
According to one embodiment, a file system includes a hash value calculator, an access controller and a deduplication controller. The hash value calculator calculates a hash value of at least one data block in a file to be stored in storage. The access controller stores, when the at least one data block includes a first data block and when a first hash value of the first data block is calculate…
Who is the assignee on this patent?
Toshiba Kk, Toshiba Solutions Corp
What technology area does this patent fall under?
Primary CPC classification G06F17/30156. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 25, 2017 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).