Inline Wire Speed Deduplication System
US-2016306853-A1 · Oct 20, 2016 · US
US9870176B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9870176-B2 |
| Application number | US-201415033265-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 30, 2014 |
| Priority date | Nov 8, 2013 |
| Publication date | Jan 16, 2018 |
| Grant date | Jan 16, 2018 |
Official abstract text for this publication.
Ingest data for virtual volumes (V) is split into segments (B1, B2, B3, B4) of a size that can be buffered in main memory. Data deduplication processing then occurs directly on the segments (B1, B2, B3, B4) in main memory, without the need for disk I/O.
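The segmentation described in the abstract and in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `segment_stream`, `DedupStore`, and `SYNC` are hypothetical, a `None` item in the input stands in for a synchronization point such as a file mark, and fixed-size chunk hashing with SHA-256 is an assumed stand-in for the deduplication engine, whose algorithm this record does not specify.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical marker: a None item in the stream stands for a synchronization point.
SYNC = None


def segment_stream(blocks: Iterable[Optional[bytes]],
                   max_segment_size: int) -> Iterator[bytes]:
    """Yield segments closed by a sync point, a full buffer, or end of stream."""
    buf = bytearray()  # the in-memory buffer of at most max_segment_size bytes
    for block in blocks:
        if block is SYNC:
            if buf:  # a synchronization point closes the current segment early
                yield bytes(buf)
                buf.clear()
            continue
        while block:
            room = max_segment_size - len(buf)
            buf += block[:room]
            block = block[room:]
            if len(buf) == max_segment_size:  # buffer completely filled
                yield bytes(buf)
                buf.clear()
    if buf:  # the stream was closed: flush the remainder
        yield bytes(buf)


class DedupStore:
    """Assumed stand-in for a deduplication engine: fixed-size chunks keyed by SHA-256."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}  # digest -> unique chunk data

    def put(self, segment: bytes) -> List[str]:
        """Deduplicate a buffered segment and return its list of chunk digests."""
        recipe = []
        for i in range(0, len(segment), self.chunk_size):
            chunk = segment[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store each unique chunk once
            recipe.append(digest)
        return recipe

    def get(self, recipe: List[str]) -> bytes:
        """Rebuild (re-duplicate) a segment from its chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)
```

Each segment produced by `segment_stream` is passed to `DedupStore.put` while it is still in memory, which mirrors the point of the abstract: the segment never has to be staged on disk before deduplication.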
Opening claim text (preview).
The invention claimed is:
1. A storage appliance comprising: an interface that provides access to at least one virtual tape drive for at least one host computer; a first database that stores metadata about virtual tape volumes received via the interface, at least one volatile memory; and a deduplication engine that deduplicates storage objects; wherein each virtual tape volume is represented within the storage appliance as an ordered set of segments of data and the storage appliance is configured to perform the following steps: defining a maximum segment size based on an amount of main memory available for a computer system performing segmentation; providing, from a volatile memory, at least one buffer memory of the defined maximum segment size to store a segment of data; receiving a data stream representing a virtual tape volume of the at least one virtual tape drive at the interface; filling the at least one buffer memory with data from the received data stream until the received data stream is closed, a synchronization point of the virtual tape volume is identified in the received data stream or an amount of data corresponding to the defined maximum segment size has been stored in the at least one buffer memory such that the at least one buffer memory is completely filled; deduplicating with the deduplication engine the segment of data stored in the at least one buffer memory; and storing the deduplicated segment of data in a non-volatile storage device.
2. The storage appliance according to claim 1, wherein the metadata stored in the first database comprises mappings of segments of each virtual tape volume to tags of a binary large object, BLOB, corresponding to the deduplicated segment of data.
3. The storage appliance according to claim 1, wherein the virtual tape volume of the at least one virtual tape drive has a capacity exceeding the defined maximum segment size.
4. The storage appliance according to claim 3, wherein the virtual tape volume of the at least one virtual tape drive has a capacity of at least 200 GB and the defined maximum segment size is smaller than 200 GB.
5. The storage appliance according to claim 1, wherein the storage appliance is configured to process a plurality of virtual tape volumes of the at least one virtual tape received in parallel, and, for each one of the plurality of virtual tape volumes received in parallel, a separate buffer memory that stores a segment of data of the respective virtual tape volume is provided.
6. The storage appliance according to claim 1, wherein the storage appliance is configured to provide, from the volatile memory, a plurality of buffer memories, each buffer memory storing a segment of data and, wherein, while deduplication and/or storing of a segment of data received from the data stream and stored in a first buffer memory is performed, at least one second buffer memory is filled with data of a subsequent segment from the received data stream.
7. The storage appliance according to claim 6, wherein a received file mark, which is used by tape applications to enforce the writing of all preceding data to a tape, is only acknowledged to an application after all segments of data preceding the file mark have been successfully processed and stored by the deduplication engine.
8. The storage appliance according to claim 1, wherein, if a first segment of data is requested via the interface, the first segment of data is read and re-duplicated by the deduplication engine and provided to the interface and, before a subsequent request for a subsequent second segment of data is received, the second segment of data is read and re-duplicated by the deduplication engine and stored in the at least one memory buffer.
9. The storage appliance according to claim 8, wherein, on receipt of the subsequent request for the second segment of data, the second segment of data is provided from the at least one buffer memory to the interface.
10. The storage appliance according to claim 1, further comprising at least one Integrated Channel Processor that provides access to the at least one virtual tape drive for the at least one host computer, wherein the Integrated Channel Processor and the deduplication engine exchange data through at least one shared memory buffer.
11. The storage appliance according to claim 10, wherein, on mounting a virtual tape volume, the storage appliance determines, based on configuration information, whether the virtual tape volume comprises deduplicated segments of data, and, if the virtual tape volume comprises deduplicated segments of data, the deduplication engine is assigned to the Integrated Channel Processor to handle input/output requests.
12. The storage appliance according to claim 1, further comprising at least one Integrated Channel Processor that runs a de-duplication client and at least one Integrated Device Processor that runs a deduplication server.
13. A storage appliance comprising: an interface that provides access to at least one virtual tape drive for at least one host computer; a first database that stores metadata about virtual tape volumes received via the interface, at least one volatile memory; and a deduplication engine that deduplicates storage objects; wherein each virtual tape volume is represented within the storage appliance as an ordered set of segments of data and the storage appliance is configured to perform the following steps: providing, from a volatile memory, at least one buffer memory to store a segment of data; receiving a data stream representing a virtual tape volume of the at least one virtual tape drive at the interface; filling the at least one buffer memory with data from the received data stream until the received data stream is closed, and a synchronization point of the virtual tape volume is identified in the received data stream or a predefined amount of data has been stored in the buffer memory; deduplicating with the deduplication engine the segment of data stored in the at least one buffer memory; and storing the deduplicated segment of data in a non-volatile storage device; wherein data of an existing virtual tape volume is modified by: identifying an index of a first segment of data to be modified; reading and re-duplicating the deduplicated first segment from the deduplication engine into a memory buffer; invalidating all segments of data having an index equal to or greater than the identified index and deleting the corresponding indices from the ordered set; creating a new segment of data based on the buffered first segment and a modification request received from the interface; deduplicating and storing the new segment; and adding the index of the new segment to the end of the ordered set.
14. The storage appliance according to claim 13, wherein data received from the interface is appended to an existing virtual tape volume by: creating additional segments for the data to be appended; deduplicating and storing the additional segments, and adding indices of the additional deduplicated segments to the end of the ordered set.
15. The storage appliance according to claim 13, wherein data of the existing virtual tape volume is read from a predetermined position of the virtual tape volume by: identifying an index of a first segment of data to be read; reading and re-duplicating the first segment by the deduplication engine into a memory buffer without previously re-duplicating any segment having a lower index than the first segment; and providing the data from the predetermined position from the at least one buffer memory via the interface.
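The volume operations in claims 13 to 15 (modify, append, and positioned read on an ordered set of segments) can be sketched as follows. This is a rough illustration under stated assumptions, not the appliance's actual code. The class `VirtualVolume` and its methods are hypothetical, a plain dictionary stands in for the deduplication engine, and `order` plays the role of the metadata database by recording each segment's tag and size. As a simplification, the data created by a modification is re-segmented if it exceeds the maximum segment size.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class VirtualVolume:
    max_segment_size: int
    # tag -> segment data; an assumed stand-in for BLOBs held by the deduplication engine
    blobs: Dict[int, bytes] = field(default_factory=dict)
    # ordered set of (tag, size) pairs; models the per-volume segment metadata
    order: List[Tuple[int, int]] = field(default_factory=list)
    next_tag: int = 0

    def _locate(self, position: int) -> Tuple[int, int]:
        """Map a byte position to (slot in `order`, offset in that segment) using metadata only."""
        start = 0
        for slot, (_, size) in enumerate(self.order):
            if position < start + size:
                return slot, position - start
            start += size
        raise IndexError("position is beyond the end of the volume")

    def append(self, data: bytes) -> None:
        """Claim 14: create, store, and index additional segments at the end."""
        for i in range(0, len(data), self.max_segment_size):
            segment = data[i:i + self.max_segment_size]
            self.blobs[self.next_tag] = segment
            self.order.append((self.next_tag, len(segment)))
            self.next_tag += 1

    def read_from(self, position: int) -> bytes:
        """Claim 15: fetch only the segment holding `position`, not the ones before it."""
        slot, offset = self._locate(position)
        tag, _ = self.order[slot]
        return self.blobs[tag][offset:]

    def overwrite(self, position: int, data: bytes) -> None:
        """Claim 13: writing at `position` invalidates that segment and all later ones."""
        slot, offset = self._locate(position)
        tag, _ = self.order[slot]
        buffered = self.blobs[tag]            # read the first affected segment into a buffer
        for old_tag, _ in self.order[slot:]:  # invalidate it and every later segment
            del self.blobs[old_tag]
        del self.order[slot:]                 # drop their entries from the ordered set
        self.append(buffered[:offset] + data)  # new segment goes to the end of the set

    def contents(self) -> bytes:
        """Helper for inspection: concatenate all segments in order."""
        return b"".join(self.blobs[tag] for tag, _ in self.order)
```

For example, after `vol = VirtualVolume(4)` and `vol.append(b"abcdefghij")`, calling `vol.overwrite(5, b"XY")` leaves the volume holding `b"abcdeXY"`: the segment containing position 5 and everything after it are discarded, which matches the tape-like semantics of claim 13.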
Libraries, e.g. tape libraries, jukebox · CPC title
Organizing or formatting or addressing of data · CPC title
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title
Data buffering arrangements · CPC title
Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays · CPC title