Method and apparatus for random access of data stored in a sequential manner

US9740704B2 · US · B2

Patent metadata
Publication number: US-9740704-B2
Application number: US-69526110-A
Country: US
Kind code: B2
Filing date: Jan 28, 2010
Priority date: Jan 28, 2010
Publication date: Aug 22, 2017
Grant date: Aug 22, 2017

Abstract

A deduplication engine is operable to select at least two chunks of data for deduplication and deduplicate the selected at least two chunks of data. A first store is operable to store the deduplicated chunks of data in a sequential manner, and a second store is operable to store at least a portion of at least one chunk of the deduplicated data in a manner to allow random access, where data is accessed via the first and/or second store.
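The abstract describes removing duplicate chunks, replacing them with pointers, and storing the result sequentially. The Python sketch that follows illustrates that idea together with the per-block CRC check recited in claims 6 and 7. It is a minimal sketch under stated assumptions: the fixed chunk size, SHA-256 fingerprinting for spotting duplicates, the CRC block size, and all names (`Pointer`, `deduplicate`, `reconstitute`, `block_crcs`, `verify`) are hypothetical choices for illustration. The text on this page does not say how duplicates are detected.

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import Dict, List, Union

CHUNK_SIZE = 8   # hypothetical chunk size in bytes (tiny for illustration)
BLOCK_SIZE = 16  # hypothetical CRC block size in bytes


@dataclass(frozen=True)
class Pointer:
    """Replaces a removed duplicate chunk; refers to the first stored copy."""
    index: int  # position of the referenced chunk in the sequential store


Entry = Union[bytes, Pointer]


def deduplicate(data: bytes) -> List[Entry]:
    """Split data into fixed-size chunks and replace repeats with pointers."""
    first_seen: Dict[bytes, int] = {}  # SHA-256 digest (assumed) -> index of first copy
    store: List[Entry] = []  # stands in for the sequential first store
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        if digest in first_seen:
            store.append(Pointer(first_seen[digest]))
        else:
            first_seen[digest] = len(store)
            store.append(chunk)
    return store


def reconstitute(store: List[Entry]) -> bytes:
    """Rebuild the original stream by resolving each pointer."""
    return b"".join(store[e.index] if isinstance(e, Pointer) else e for e in store)


def block_crcs(data: bytes) -> List[int]:
    """CRC-32 of each BLOCK_SIZE block of the incoming data (claim 6)."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]


def verify(store: List[Entry], expected: List[int]) -> None:
    """Recompute CRCs on reconstituted data and raise on any mismatch (claim 7)."""
    actual = block_crcs(reconstitute(store))
    if len(actual) != len(expected):
        raise ValueError("block count mismatch")
    for i, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            raise ValueError(f"CRC mismatch in block {i}")


data = b"ABCDEFGH" * 3 + b"12345678"
store = deduplicate(data)
print(store)  # [b'ABCDEFGH', Pointer(index=0), Pointer(index=0), b'12345678']
crcs = block_crcs(data)
verify(store, crcs)  # returns None: reconstituted data matches the stored CRCs
```

Storing the CRCs of the original blocks and checking them after reconstitution catches a bad pointer or a corrupted stored chunk, since either one changes the rebuilt bytes.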

First claim

Opening claim text (preview).

The invention claimed is:

1. A method performed by at least one processor, comprising: selecting chunks of data for deduplication; deduplicating the selected chunks of data to form deduplicated chunks, the deduplicating comprising removing a given chunk of the selected chunks of data and replacing the given chunk with a pointer to the given chunk, the deduplicated chunks comprising the pointer and remaining chunks of the selected chunks of data, the remaining chunks excluding the given chunk that has been removed; sequentially storing the deduplicated chunks including the pointer and the remaining chunks in a sequential format in a first store; maintaining a second store that contains a copy of a portion of at least one chunk of the remaining chunks, the second store storing data in a manner to allow random access; and in response to a read request for data at a location within the sequence of stored data: determining that the location is within the first store; determining whether at least a part of the location is within the second store in response to determining that the location is within the first store; accessing data of the second store in response to determining that the at least a part of the location is within the second store; and accessing data of the first store in response to determining that the at least a part of the location is not within the second store.

2. The method of claim 1, wherein each chunk of the remaining chunks comprises a plurality of buckets.

3. The method of claim 2, wherein the portion of the at least one chunk of the remaining chunks contained in the second store comprises at least one bucket of the remaining chunks.

4. The method of claim 1, further comprising: receiving a write request for data at a location within the sequence of stored data; determining that the location of the write request is within the deduplicated chunks in the first store; and copying a portion of at least one chunk of the remaining chunks into the second store in response to determining that the location of the write request is within the deduplicated chunks.

5. A method comprising: selecting, by at least one processor, chunks of data for deduplication; deduplicating, by the at least one processor, the selected chunks of data to form deduplicated chunks, the deduplicating comprising removing given chunks of the selected chunks of data and replacing the given chunks with respective pointers to the given chunks, the deduplicated chunks comprising the pointers and remaining chunks of the selected chunks of data, the remaining chunks excluding the given chunks that have been removed; sequentially storing the deduplicated chunks including the pointers and the remaining chunks in a first store; maintaining a second store to store data that is enabled for random access; receiving a write request for data at a random location within a sequence of stored data; determining if the random location is within the deduplicated chunks in the first store; copying a portion of a chunk of the remaining chunks into the second store in response to determining that the random location is within the deduplicated chunks; receiving a read request for data at a random location within the sequence of stored data; determining that the random location of the read request is within the first store; determining whether at least a part of the random location of the read request is within the second store in response to determining that the random location of the read request is within the first store; accessing data of the second store in response to determining that the at least a part of the random location of the read request is within the second store; and accessing data of the first store in response to determining that the at least a part of the random location of the read request is not within the second store.

6. The method of claim 5, further comprising: calculating a CRC (cyclic redundancy check) for a block of data; and storing the calculated CRC.

7. The method of claim 6, further comprising: upon reconstituting deduplicated data, calculating a CRC for each block of data in the reconstituted deduplicated data; comparing the calculated CRC for each block of data in the reconstituted deduplicated data with a previously stored CRC for a corresponding block of data; and indicating an error if a mismatch occurs in the comparing.

8. A system comprising: at least one processor; a deduplication engine executable on the at least one processor to select chunks of data for deduplication and to deduplicate the selected chunks of data to form deduplicated chunks, the deduplicating comprising removing a given chunk of the selected chunks of data and replacing the given chunk with a pointer to the given chunk, the deduplicated chunks comprising the pointer and remaining chunks of the selected chunks of data, the remaining chunks excluding the given chunk that has been removed; a first store to store the deduplicated chunks including the pointer and the remaining chunks in a sequential format; a second store to store a copy of at least a portion of at least one chunk of the remaining chunks in a manner to allow random access; and program instructions executable on the at least one processor to: receive a read request for data at a location within a sequence of stored data; determine that the location of the read request is within the first store; determine whether at least a part of the location of the read request is within the second store in response to determining that the location of the read request is within the first store; access data of the second store in response to determining that the at least a part of the location of the read request is within the second store; and access data of the first store in response to determining that the at least a part of the location of the read request is not within the second store.

9. The system of claim 8, further comprising: a VTL (virtual tape library) interface to interface with a VTL host computer system; and a NAS (network attached storage) interface to interface with a NAS host computer system.

10. The system of claim 9, wherein the NAS interface comprises: a FUSE (file system in user space) layer to interface with NAS data sources in user space; and a buffer manager to interface with the deduplication engine.

11. The system of claim 9, wherein each of the VTL interface and the NAS interface comprises a CRC (cyclic redundancy check) calculator to calculate a CRC for each block of data.

12. The system of claim 8, wherein the program instructions are executable on the at least one processor to: receive a write request for data at a location within the sequence of stored data; determine that the location of the write request is within the deduplicated chunks in the first store; and copy a portion of at least one chunk of the remaining chunks into the second store in response to determining that the location of the write request is within the deduplicated chunks.

13. A non-transitory computer readable medium storing instructions that when executed cause a computer system to: select chunks of data for deduplication; deduplicate the selected chunks of data to form deduplicated chunks, the deduplicating comprising removing given chunks of the selected chunks of data and replacing the given chunks with respective pointers to the given chunks, the deduplicated chunks comprising the pointers and remaining chunks of the selected chunks of data, the remaining chunks excluding the given chunks that have been removed; sequentially store the deduplicated chunks
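Claims 1 to 4 describe the two-store read and write paths. A read first checks whether the requested location has a copy in the random-access second store and otherwise falls back to the sequential first store. A write into deduplicated data first copies the affected portion of a chunk (a bucket, per claims 2 and 3) into the second store. The Python sketch that follows models this under stated assumptions: fixed 8-byte chunks split into 4-byte buckets, an in-memory list as the first store, and a dict keyed by bucket number as the second store. `TwoStoreVolume`, `Pointer`, and both sizes are hypothetical names chosen for illustration, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

CHUNK_SIZE = 8   # hypothetical chunk size in bytes
BUCKET_SIZE = 4  # hypothetical bucket size; must divide CHUNK_SIZE


@dataclass(frozen=True)
class Pointer:
    index: int  # index of the first-store chunk this pointer stands in for


@dataclass
class TwoStoreVolume:
    # Sequential first store: deduplicated chunks and pointers, never modified in place.
    first: List[Union[bytes, Pointer]]
    # Random-access second store: bucket number -> private copy of that bucket.
    second: Dict[int, bytearray] = field(default_factory=dict)

    def _chunk(self, i: int) -> bytes:
        entry = self.first[i]
        return self.first[entry.index] if isinstance(entry, Pointer) else entry

    def size(self) -> int:
        return sum(len(self._chunk(i)) for i in range(len(self.first)))

    def _bucket_from_first(self, b: int) -> bytes:
        start = b * BUCKET_SIZE
        off = start % CHUNK_SIZE
        return self._chunk(start // CHUNK_SIZE)[off:off + BUCKET_SIZE]

    def _check(self, offset: int, length: int) -> None:
        # The requested location must lie within the first store (claim 1).
        if offset < 0 or length < 0 or offset + length > self.size():
            raise IndexError("location outside stored data")

    def read(self, offset: int, length: int) -> bytes:
        self._check(offset, length)
        out, pos, end = bytearray(), offset, offset + length
        while pos < end:
            b, inner = divmod(pos, BUCKET_SIZE)
            take = min(BUCKET_SIZE - inner, end - pos)
            # Use the second store when this bucket has a copy there, else the first store.
            src = self.second[b] if b in self.second else self._bucket_from_first(b)
            out += src[inner:inner + take]
            pos += take
        return bytes(out)

    def write(self, offset: int, payload: bytes) -> None:
        self._check(offset, len(payload))
        pos, end = offset, offset + len(payload)
        while pos < end:
            b, inner = divmod(pos, BUCKET_SIZE)
            take = min(BUCKET_SIZE - inner, end - pos)
            if b not in self.second:
                # Copy the bucket out of the deduplicated chunk before changing it (claim 4).
                self.second[b] = bytearray(self._bucket_from_first(b))
            done = pos - offset
            self.second[b][inner:inner + take] = payload[done:done + take]
            pos += take


# Chunk 1 was removed as a duplicate of chunk 0 and replaced by a pointer.
vol = TwoStoreVolume(first=[b"ABCDEFGH", Pointer(0), b"12345678"])
print(vol.read(6, 4))      # b'GHAB'
vol.write(9, b"xy")
print(vol.read(8, 8))      # b'AxyDEFGH'
print(vol.read(0, 8))      # b'ABCDEFGH' (shared chunk untouched)
print(sorted(vol.second))  # [2]
```

Because writes land only in the second store, the chunk shared through the pointer is never modified, so other logical ranges that reference it still read the original bytes. This sketch keeps both stores in memory and does not model persistence.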

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • De-duplication techniques · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • Saving storage space on storage systems · CPC title

  • Physics · mapped topic

Frequently asked questions

What does patent US9740704B2 cover?
A deduplication engine is operable to select at least two chunks of data for deduplication and deduplicate the selected at least two chunks of data. A first store is operable to store the deduplicated chunks of data in a sequential manner, and a second store is operable to store at least a portion of at least one chunk of the deduplicated data in a manner to allow random access, where data is a…
Who is the assignee on this patent?
Slater Alastair, Pelly Simon, Brady Garry, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 22, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).