Storage appliance and method of segment deduplication

US9870176B2 · US · B2

Patent metadata
Publication number: US-9870176-B2
Application number: US-201415033265-A
Country: US
Kind code: B2
Filing date: Jun 30, 2014
Priority date: Nov 8, 2013
Publication date: Jan 16, 2018
Grant date: Jan 16, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Ingest data for virtual volumes (V) is split into segments (B1, B2, B3, B4) of a size that can be buffered in main memory. Data deduplication processing then occurs directly on the segments (B1, B2, B3, B4) in main memory, without the need for disk I/O.
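The segmentation step can be sketched as follows. A segment is closed when the incoming volume stream ends, when a synchronization point arrives, or when the buffer reaches the maximum segment size; each closed segment is then deduplicated while still in memory. This is a minimal sketch under stated assumptions, not the patented implementation: `SYNC` is a hypothetical marker standing in for a synchronization point (for example a tape file mark), `max_segment_size` uses a hypothetical policy (half of available memory split evenly across parallel streams), and `ChunkStore` assumes fixed-size chunking with SHA-256 fingerprints, since the abstract does not say how the deduplication engine works internally. The non-volatile storage device is modeled by an in-memory dict for brevity.

```python
import hashlib
from typing import Iterable, Iterator, Union

SYNC = object()  # hypothetical marker for a synchronization point in the stream
Record = Union[bytes, object]


def max_segment_size(available_memory: int, parallel_streams: int, fraction: float = 0.5) -> int:
    """Hypothetical policy: reserve a fraction of main memory, split across streams."""
    return int(available_memory * fraction) // max(parallel_streams, 1)


def segments(stream: Iterable[Record], max_size: int) -> Iterator[bytes]:
    """Yield segments; a segment closes at a SYNC marker, when the buffer
    holds exactly max_size bytes, or when the stream ends."""
    buf = bytearray()
    for rec in stream:
        if rec is SYNC:
            if buf:  # flush whatever was buffered before the sync point
                yield bytes(buf)
                buf = bytearray()
            continue
        view = memoryview(rec)
        while view:  # a single record may span several segments
            take = min(len(view), max_size - len(buf))
            buf += view[:take]
            view = view[take:]
            if len(buf) == max_size:  # buffer completely filled
                yield bytes(buf)
                buf = bytearray()
    if buf:  # stream closed with a partial segment
        yield bytes(buf)


class ChunkStore:
    """Hypothetical deduplication engine: fixed-size chunks keyed by SHA-256."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}      # fingerprint -> unique chunk
        self.blobs: dict[int, list[str]] = {}   # BLOB tag -> chunk recipe
        self._next_tag = 0

    def store(self, segment: bytes) -> int:
        """Deduplicate an in-memory segment and return its BLOB tag."""
        recipe = []
        for i in range(0, len(segment), self.chunk_size):
            chunk = segment[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only unseen chunks
            recipe.append(digest)
        tag = self._next_tag
        self._next_tag += 1
        self.blobs[tag] = recipe
        return tag

    def load(self, tag: int) -> bytes:
        """Re-duplicate: rebuild a segment from its chunk recipe."""
        return b"".join(self.chunks[d] for d in self.blobs[tag])


store = ChunkStore(chunk_size=4)
stream = [b"abcdabcd", b"ab", SYNC, b"abcdabcdxyz"]
segs = list(segments(stream, max_size=8))
tags = [store.store(s) for s in segs]
print(len(segs), len(store.chunks))  # 4 3
```

In this run, 21 ingested bytes become four segments but only three unique chunks (9 bytes). No segment ever needs to be read back from disk before it is deduplicated, which is the point the abstract makes.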

Claims

Full claim text. Claims 1 and 13 are independent; the remaining claims depend on them.

The invention claimed is:

1. A storage appliance comprising: an interface that provides access to at least one virtual tape drive for at least one host computer; a first database that stores metadata about virtual tape volumes received via the interface, at least one volatile memory; and a deduplication engine that deduplicates storage objects; wherein each virtual tape volume is represented within the storage appliance as an ordered set of segments of data and the storage appliance is configured to perform the following steps: defining a maximum segment size based on an amount of main memory available for a computer system performing segmentation; providing, from a volatile memory, at least one buffer memory of the defined maximum segment size to store a segment of data; receiving a data stream representing a virtual tape volume of the at least one virtual tape drive at the interface; filling the at least one buffer memory with data from the received data stream until the received data stream is closed, a synchronization point of the virtual tape volume is identified in the received data stream or an amount of data corresponding to the defined maximum segment size has been stored in the at least one buffer memory such that the at least one buffer memory is completely filled; deduplicating with the deduplication engine the segment of data stored in the at least one buffer memory; and storing the deduplicated segment of data in a non-volatile storage device.

2. The storage appliance according to claim 1, wherein the metadata stored in the first database comprises mappings of segments of each virtual tape volume to tags of a binary large object, BLOB, corresponding to the deduplicated segment of data.

3. The storage appliance according to claim 1, wherein the virtual tape volume of the at least one virtual tape drive has a capacity exceeding the defined maximum segment size.

4. The storage appliance according to claim 3, wherein the virtual tape volume of the at least one virtual tape drive has a capacity of at least 200 GB and the defined maximum segment size is smaller than 200 GB.

5. The storage appliance according to claim 1, wherein the storage appliance is configured to process a plurality of virtual tape volumes of the at least one virtual tape received in parallel, and, for each one of the plurality of virtual tape volumes received in parallel, a separate buffer memory that stores a segment of data of the respective virtual tape volume is provided.

6. The storage appliance according to claim 1, wherein the storage appliance is configured to provide, from the volatile memory, a plurality of buffer memories, each buffer memory storing a segment of data and, wherein, while deduplication and/or storing of a segment of data received from the data stream and stored in a first buffer memory is performed, at least one second buffer memory is filled with data of a subsequent segment from the received data stream.

7. The storage appliance according to claim 6, wherein a received file mark, which is used by tape applications to enforce the writing of all preceding data to a tape, is only acknowledged to an application after all segments of data preceding the file mark have been successfully processed and stored by the deduplication engine.

8. The storage appliance according to claim 1, wherein, if a first segment of data is requested via the interface, the first segment of data is read and re-duplicated by the deduplication engine and provided to the interface and, before a subsequent request for a subsequent second segment of data is received, the second segment of data is read and re-duplicated by the deduplication engine and stored in the at least one memory buffer.

9. The storage appliance according to claim 8, wherein, on receipt of the subsequent request for the second segment of data, the second segment of data is provided from the at least one buffer memory to the interface.

10. The storage appliance according to claim 1, further comprising at least one Integrated Channel Processor that provides access to the at least one virtual tape drive for the at least one host computer, wherein the Integrated Channel Processor and the deduplication engine exchange data through at least one shared memory buffer.

11. The storage appliance according to claim 10, wherein, on mounting a virtual tape volume, the storage appliance determines, based on configuration information, whether the virtual tape volume comprises deduplicated segments of data, and, if the virtual tape volume comprises deduplicated segments of data, the deduplication engine is assigned to the Integrated Channel Processor to handle input/output requests.

12. The storage appliance according to claim 1, further comprising at least one Integrated Channel Processor that runs a de-duplication client and at least one Integrated Device Processor that runs a deduplication server.

13. A storage appliance comprising: an interface that provides access to at least one virtual tape drive for at least one host computer; a first database that stores metadata about virtual tape volumes received via the interface, at least one volatile memory; and a deduplication engine that deduplicates storage objects; wherein each virtual tape volume is represented within the storage appliance as an ordered set of segments of data and the storage appliance is configured to perform the following steps: providing, from a volatile memory, at least one buffer memory to store a segment of data; receiving a data stream representing a virtual tape volume of the at least one virtual tape drive at the interface; filling the at least one buffer memory with data from the received data stream until the received data stream is closed, and a synchronization point of the virtual tape volume is identified in the received data stream or a predefined amount of data has been stored in the buffer memory; deduplicating with the deduplication engine the segment of data stored in the at least one buffer memory; and storing the deduplicated segment of data in a non-volatile storage device; wherein data of an existing virtual tape volume is modified by: identifying an index of a first segment of data to be modified; reading and re-duplicating the deduplicated first segment from the deduplication engine into a memory buffer; invalidating all segments of data having an index equal to or greater than the identified index and deleting the corresponding indices from the ordered set; creating a new segment of data based on the buffered first segment and a modification request received from the interface; deduplicating and storing the new segment; and adding the index of the new segment to the end of the ordered set.

14. The storage appliance according to claim 13, wherein data received from the interface is appended to an existing virtual tape volume by: creating additional segments for the data to be appended; deduplicating and storing the additional segments, and adding indices of the additional deduplicated segments to the end of the ordered set.

15. The storage appliance according to claim 13, wherein data of the existing virtual tape volume is read from a predetermined position of the virtual tape volume by: identifying an index of a first segment of data to be read; reading and re-duplicating the first segment by the deduplication engine into a memory buffer without previously re-duplicating any segment having a lower index than the first segment; and providing the data from the predetermined position from the at least one buffer memory via the interface.

Assignees

Fujitsu Ltd

Classifications

  • Libraries, e.g. tape libraries, jukebox · CPC title

  • Organizing or formatting or addressing of data · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • Data buffering arrangements · CPC title

  • Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9870176B2 cover?
Ingest data for virtual volumes (V) is split into segments (B1, B2, B3, B4) of a size that can be buffered in main memory. Data deduplication processing then occurs directly on the segments (B1, B2, B3, B4) in main memory, without the need for disk I/O.
Who is the assignee on this patent?
Fujitsu Ltd
What technology area does this patent fall under?
Primary CPC classification G06F3/061. Mapped technology areas include Physics.
When was this patent published?
Published Jan 16, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).