Stored data deduplication method, stored data deduplication apparatus, and deduplication program

US9542413B2 · US · B2

Patent metadata
Publication number: US-9542413-B2
Application number: US-201114349561-A
Country: US
Kind code: B2
Filing date: Oct 6, 2011
Priority date: Oct 6, 2011
Publication date: Jan 10, 2017
Grant date: Jan 10, 2017

Abstract

Official abstract text for this publication.

Method of dividing data to be stored in storage device into data fragments; recording the data by using configurations of divided data fragments; judging whether identical data fragments exist in data fragments; when it is judged that identical data fragments exist, storing one of the identical data fragments in storage area of the storage device, and generating and recording data-fragment attribute information indicating an attribute unique to the data fragment stored; upon receipt of request to read data stored in the storage area of the storage device, acquiring the configurations of the data fragments forming the read-target data, reading the corresponding data fragments from the storage area of the storage device, and restoring the data; acquiring and coupling the recorded data fragments to generate concatenation target data targeted for judgment on whether chunk concatenation is possible or not, and detecting whether the concatenation target data has a repeated data pattern.
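The write and read path summarized in the abstract can be illustrated with a short sketch. This is not the patented implementation: fixed-size chunking, SHA-256 as the fragment fingerprint, and the names `ChunkStore`, `chunks`, and `recipes` are all assumptions introduced here for illustration.

```python
import hashlib


class ChunkStore:
    """Toy in-memory deduplicating store (hypothetical, for illustration only)."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # assumed fixed-size chunking
        self.chunks = {}              # fingerprint -> fragment bytes, one copy each
        self.recipes = {}             # data name -> ordered list of fingerprints

    def write(self, name, data):
        """Divide data into fragments and store each distinct fragment once."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            fragment = data[start:start + self.chunk_size]
            fingerprint = hashlib.sha256(fragment).hexdigest()
            # Identical fragments share a fingerprint, so only the first is kept.
            self.chunks.setdefault(fingerprint, fragment)
            recipe.append(fingerprint)
        # The recipe plays the role of the recorded fragment configuration.
        self.recipes[name] = recipe

    def read(self, name):
        """Restore data by reading its fragments in recorded order."""
        return b"".join(self.chunks[fp] for fp in self.recipes[name])


store = ChunkStore(chunk_size=4)
store.write("example", b"ABCDABCDWXYZ")
restored = store.read("example")
```

Here the 12-byte input splits into three fragments, two of which are identical, so the store holds two fragments while the recipe still lists three entries and `restored` equals the original bytes.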

First claim

Opening claim text (preview).

The invention claimed is:

1. A stored-data deduplication method for eliminating a duplicate data fragment from a storage area in a storage device, the duplicate data fragment being a duplicate of one of data fragments constituting data stored in the storage device, the method comprising: dividing the data to be stored in the storage device into the data fragments; recording the data by using configurations of the divided data fragments; judging whether identical data fragments exist in the data fragments; when it is judged that the identical data fragments exist, storing one of the identical data fragments in the storage area of the storage device, and generating and recording data-fragment attribute information which is information indicating an attribute unique to the data fragment stored; upon receipt of a request to read the data stored in the storage area of the storage device, acquiring the configurations of the data fragments forming the read-target data, reading the corresponding data fragments from the storage area of the storage device, and restoring the data; acquiring and coupling the recorded data fragments to generate concatenation target data targeted for judgment on whether chunk concatenation is possible or not, and detecting whether the concatenation target data has a repeated data pattern which is repetition of a particular data pattern; and using as a concatenated data fragment a sequence of a plurality of the data fragments having the detected repeated data pattern, generating from the concatenated data fragment concatenated-data fragment attribute information indicating an attribute of the concatenated data fragment, and recording the concatenated-data fragment attribute information, wherein the repeated data pattern is not recorded when the number of times the repeated data pattern is detected is less than a predetermined value, and wherein when the detected repeated data pattern contains a plurality of the identical data fragments, the repeated data pattern is divided to avoid having the identical data fragments.

2. The stored data deduplication method according to claim 1, wherein the data-fragment attribute information contains a hash value calculated by applying a predetermined hash function to the corresponding data fragment and storage location information indicating a location in the storage area where the data fragment is stored, and whether there is a duplicate data fragment of any of the data fragments is judged by comparing the hash values of the data fragments.

3. The stored data deduplication method according to claim 2, wherein the storage location information is acquired for each of the plurality of data fragments contained in the concatenated data fragment, and the data fragments contained in the concatenated data fragment are relocated according to the storage location information so that they are stored consecutively on the storage area of the storage device.

4. The stored data deduplication method according to claim 1, wherein when a plurality of the detected repeated data patterns have the identical data fragment, the repeated data patterns other than one repeated data pattern selected according to a predetermined rule are not recorded.

5. The stored data deduplication method according to claim 1, wherein in the acquisition and coupling of the data fragments, the repeated data pattern is not recognized across a break position of the data to be written into or read from the storage device.

6. The stored data deduplication method according to claim 1, wherein in the detection of the repeated data pattern, a break position of the data fragments which is located at a position short of a length of the concatenated data fragment already recorded is not recognized in the detection of the repeated data pattern.

7. A stored-data deduplication apparatus for eliminating a duplicate data fragment from a storage area in a storage device, the duplicate data fragment being a duplicate of one of data fragments constituting data stored in the storage device, the apparatus comprising a processor, a memory, and units implemented when the processor executes a corresponding program on the memory, the units being: a data division unit configured to divide the data to be stored in the storage device into the data fragments; a data registration unit configured to record the data by using configurations of the divided data fragments; a data matching unit configured to judge whether identical data fragments exist in the data fragments, and when it is judged that the identical data fragments exist, store one of the identical data fragments in the storage area of the storage device, and generate and record data-fragment attribute information which is information indicating an attribute unique to the data fragment stored; a data restoration unit configured to, upon receipt of a request to read the data stored in the storage area of the storage device, acquire the configurations of the data fragments forming the read-target data, read the corresponding data fragments from the storage area of the storage device, and restore the data; a data analysis unit configured to acquire and couple the recorded data fragments to generate concatenation target data targeted for judgment on whether chunk concatenation is possible or not, and detect whether the concatenation target data has a repeated data pattern which is repetition of a particular data pattern; and a data update unit configured to use as a concatenated data fragment a sequence of a plurality of the data fragments having the detected repeated data pattern, generate from the concatenated data fragment concatenated-data fragment attribute information indicating an attribute of the concatenated data fragment, and record the concatenated-data fragment attribute information, wherein the repeated data pattern is not recorded when the number of times the repeated data pattern is detected is less than a predetermined value, and wherein when the detected repeated data pattern contains a plurality of the identical data fragments, the repeated data pattern is divided to avoid having the identical data fragments.

8. The stored data deduplication apparatus according to claim 7, wherein the data-fragment attribute information contains a hash value calculated by applying a predetermined hash function to the corresponding data fragment and storage location information indicating a location in the storage area where the data fragment is stored, and whether there is a duplicate data fragment of any of the data fragments is judged by comparing the hash values of the data fragments.

9. The stored data deduplication apparatus according to claim 8, wherein the storage location information is acquired for each of the plurality of data fragments contained in the concatenated data fragment, and the data fragments contained in the concatenated data fragment are relocated according to the storage location information so that they are stored consecutively on the storage area of the storage device.

10. The stored data deduplication apparatus according to claim 7, wherein when a plurality of the detected repeated data patterns have the identical data fragment, the repeated data patterns other than one repeated data pattern selected according to a predetermined rule are not recorded.

11. The stored data deduplication apparatus according to claim 8, wherein in the acquisition and coupling of the data fragments, the repeated data pattern is not recognized across a break position of the data to be written into or read from the storage device.

12. The stored data deduplication apparatus according to c
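The repeated-pattern step in claims 1 and 5 can be sketched roughly as follows. This is an illustrative reading of the claim language, not the patentee's algorithm: it assumes each stored data item is represented as a list of fragment IDs, uses a fixed window length, and the function names `split_without_duplicates` and `find_concatenation_candidates` are hypothetical.

```python
from collections import Counter


def split_without_duplicates(pattern):
    """Greedily cut a fragment-ID sequence so no piece holds the same ID twice."""
    pieces, current, seen = [], [], set()
    for fragment_id in pattern:
        if fragment_id in seen:
            # A repeat would put identical fragments in one piece: cut here.
            pieces.append(tuple(current))
            current, seen = [], set()
        current.append(fragment_id)
        seen.add(fragment_id)
    if current:
        pieces.append(tuple(current))
    return pieces


def find_concatenation_candidates(recipes, length, min_count):
    """Return fragment sequences that recur at least min_count times.

    recipes: list of fragment-ID lists, one per stored data item (assumed form).
    length: window length in fragments (an assumed tuning parameter).
    """
    counts = Counter()
    for recipe in recipes:
        # Windows are taken inside one recipe only, so a pattern is never
        # recognized across the boundary between two data items.
        for start in range(len(recipe) - length + 1):
            counts[tuple(recipe[start:start + length])] += 1

    candidates = []
    for pattern, count in counts.items():
        if count < min_count:
            continue  # seen too rarely: not recorded
        for piece in split_without_duplicates(pattern):
            # Only sequences of two or more fragments are worth concatenating.
            if len(piece) >= 2 and piece not in candidates:
                candidates.append(piece)
    return candidates


recipes = [["a", "b", "c", "d"], ["x", "a", "b", "c"], ["a", "b", "c", "y"]]
candidates = find_concatenation_candidates(recipes, length=3, min_count=2)
```

With these inputs the only window seen at least twice is `("a", "b", "c")`, so it is the single candidate. A window such as `("a", "b", "a", "b")` would be cut into `("a", "b")` pieces before being recorded. A real system would also need the concatenated fragment's attribute information (for example a hash and storage location, as in claim 2), which this sketch omits.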

Assignees

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • De-duplication techniques · CPC title

  • using de-duplication of the data · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9542413B2 cover?
Method of dividing data to be stored in storage device into data fragments; recording the data by using configurations of divided data fragments; judging whether identical data fragments exist in data fragments; when it is judged that identical data fragments exist, storing one of the identical data fragments in storage area of the storage device, and generating and recording data-fragment attr…
Who is the assignee on this patent?
Serita Susumu, Fujii Yasuhiro, Hitachi Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 10, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).