File immutability using a deduplication file system in a public cloud using new filesystem redirection
US-2024103978-A1 · Mar 28, 2024 · US
US9740704B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9740704-B2 |
| Application number | US-69526110-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 28, 2010 |
| Priority date | Jan 28, 2010 |
| Publication date | Aug 22, 2017 |
| Grant date | Aug 22, 2017 |
Abstract
A deduplication engine is operable to select at least two chunks of data for deduplication and deduplicate the selected at least two chunks of data. A first store is operable to store the deduplicated chunks of data in a sequential manner, and a second store is operable to store at least a portion of at least one chunk of the deduplicated data in a manner to allow random access, where data is accessed via the first and/or second store.
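The abstract does not say how chunks are selected or how duplicates are detected. The claims below describe replacing a removed chunk with a pointer. Here is a minimal sketch of that deduplication step and the sequential first store. It rests on several hypothetical choices that are not taken from the patent: fixed-size chunks, duplicates matched by SHA-256 digest, and a pointer modelled as the integer position of the chunk's first occurrence.

```python
import hashlib
from typing import List, Union

# Hypothetical encoding: an int entry is a pointer to the position of the
# first occurrence of that chunk; a bytes entry is a chunk kept verbatim.
Entry = Union[bytes, int]

CHUNK_SIZE = 8  # assumption: fixed-size chunking, tiny for illustration


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split data into fixed-size chunks (the last chunk may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(chunks: List[bytes]) -> List[Entry]:
    """Replace each repeated chunk with a pointer to its first occurrence.

    Entries are appended strictly in order, modelling the sequential first store.
    """
    first_seen = {}  # SHA-256 digest -> position of first occurrence
    store: List[Entry] = []
    for pos, chunk in enumerate(chunks):
        digest = hashlib.sha256(chunk).digest()
        if digest in first_seen:
            store.append(first_seen[digest])  # pointer replaces the duplicate
        else:
            first_seen[digest] = pos
            store.append(chunk)  # a "remaining" chunk kept as-is
    return store


def reconstitute(store: List[Entry]) -> bytes:
    """Rebuild the original byte stream by following pointers."""
    return b"".join(store[e] if isinstance(e, int) else e for e in store)


data = b"ABCDEFGH" * 3 + b"12345678"
store = deduplicate(split_chunks(data))
print(store)  # [b'ABCDEFGH', 0, 0, b'12345678']
assert reconstitute(store) == data
```

Content-defined (variable-size) chunking is a common alternative to fixed-size chunks. The sketch uses fixed sizes only to keep the pointer bookkeeping obvious.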
Claims (preview)
The invention claimed is:
1. A method performed by at least one processor, comprising: selecting chunks of data for deduplication; deduplicating the selected chunks of data to form deduplicated chunks, the deduplicating comprising removing a given chunk of the selected chunks of data and replacing the given chunk with a pointer to the given chunk, the deduplicated chunks comprising the pointer and remaining chunks of the selected chunks of data, the remaining chunks excluding the given chunk that has been removed; sequentially storing the deduplicated chunks including the pointer and the remaining chunks in a sequential format in a first store; maintaining a second store that contains a copy of a portion of at least one chunk of the remaining chunks, the second store storing data in a manner to allow random access; and in response to a read request for data at a location within the sequence of stored data: determining that the location is within the first store; determining whether at least a part of the location is within the second store in response to determining that the location is within the first store; accessing data of the second store in response to determining that the at least a part of the location is within the second store; and accessing data of the first store in response to determining that the at least a part of the location is not within the second store.
2. The method of claim 1, wherein each chunk of the remaining chunks comprises a plurality of buckets.
3. The method of claim 2, wherein the portion of the at least one chunk of the remaining chunks contained in the second store comprises at least one bucket of the remaining chunks.
4. The method of claim 1, further comprising: receiving a write request for data at a location within the sequence of stored data; determining that the location of the write request is within the deduplicated chunks in the first store; and copying a portion of at least one chunk of the remaining chunks into the second store in response to determining that the location of the write request is within the deduplicated chunks.
5. A method comprising: selecting, by at least one processor, chunks of data for deduplication; deduplicating, by the at least one processor, the selected chunks of data to form deduplicated chunks, the deduplicating comprising removing given chunks of the selected chunks of data and replacing the given chunks with respective pointers to the given chunks, the deduplicated chunks comprising the pointers and remaining chunks of the selected chunks of data, the remaining chunks excluding the given chunks that have been removed; sequentially storing the deduplicated chunks including the pointers and the remaining chunks in a first store; maintaining a second store to store data that is enabled for random access; receiving a write request for data at a random location within a sequence of stored data; determining if the random location is within the deduplicated chunks in the first store; copying a portion of a chunk of the remaining chunks into the second store in response to determining that the random location is within the deduplicated chunks; receiving a read request for data at a random location within the sequence of stored data; determining that the random location of the read request is within the first store; determining whether at least a part of the random location of the read request is within the second store in response to determining that the random location of the read request is within the first store; accessing data of the second store in response to determining that the at least a part of the random location of the read request is within the second store; and accessing data of the first store in response to determining that the at least a part of the random location of the read request is not within the second store.
6. The method of claim 5, further comprising: calculating a CRC (cyclic redundancy check) for a block of data; and storing the calculated CRC.
7. The method of claim 6, further comprising: upon reconstituting deduplicated data, calculating a CRC for each block of data in the reconstituted deduplicated data; comparing the calculated CRC for each block of data in the reconstituted deduplicated data with a previously stored CRC for a corresponding block of data; and indicating an error if a mismatch occurs in the comparing.
8. A system comprising: at least one processor; a deduplication engine executable on the at least one processor to select chunks of data for deduplication and to deduplicate the selected chunks of data to form deduplicated chunks, the deduplicating comprising removing a given chunk of the selected chunks of data and replacing the given chunk with a pointer to the given chunk, the deduplicated chunks comprising the pointer and remaining chunks of the selected chunks of data, the remaining chunks excluding the given chunk that has been removed; a first store to store the deduplicated chunks including the pointer and the remaining chunks in a sequential format; a second store to store a copy of at least a portion of at least one chunk of the remaining chunks in a manner to allow random access; and program instructions executable on the at least one processor to: receive a read request for data at a location within a sequence of stored data; determine that the location of the read request is within the first store; determine whether at least a part of the location of the read request is within the second store in response to determining that the location of the read request is within the first store; access data of the second store in response to determining that the at least a part of the location of the read request is within the second store; and access data of the first store in response to determining that the at least a part of the location of the read request is not within the second store.
9. The system of claim 8, further comprising: a VTL (virtual tape library) interface to interface with a VTL host computer system; and a NAS (network attached storage) interface to interface with a NAS host computer system.
10. The system of claim 9, wherein the NAS interface comprises: a FUSE (file system in user space) layer to interface with NAS data sources in user space; and a buffer manager to interface with the deduplication engine.
11. The system of claim 9, wherein each of the VTL interface and the NAS interface comprises a CRC (cyclic redundancy check) calculator to calculate a CRC for each block of data.
12. The system of claim 8, wherein the program instructions are executable on the at least one processor to: receive a write request for data at a location within the sequence of stored data; determine that the location of the write request is within the deduplicated chunks in the first store; and copy a portion of at least one chunk of the remaining chunks into the second store in response to determining that the location of the write request is within the deduplicated chunks.
13. A non-transitory computer readable medium storing instructions that when executed cause a computer system to: select chunks of data for deduplication; deduplicate the selected chunks of data to form deduplicated chunks, the deduplicating comprising removing given chunks of the selected chunks of data and replacing the given chunks with respective pointers to the given chunks, the deduplicated chunks comprising the pointers and remaining chunks of the selected chunks of data, the remaining chunks excluding the given chunks that have been removed; sequentially store the deduplicated chunks
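The read and write paths of claims 1 to 5 and the CRC check of claims 6 and 7 can be modelled in a few lines. This is a minimal sketch under stated assumptions, not the patented implementation. Several details are hypothetical:

- the chunk, bucket, and CRC block sizes
- the class name `TwoStoreVolume`
- the pointer encoding (an int index to an earlier entry)
- the rule that a write is applied to the bucket copy in the second store

On that last point, the claims only require copying a portion of a chunk into the second store when a write falls inside the deduplicated chunks. Python's `zlib.crc32` stands in for the unspecified CRC.

```python
import zlib
from typing import Dict, List, Tuple, Union

CHUNK_SIZE = 8   # assumption: fixed-size chunks
BUCKET_SIZE = 4  # assumption: each chunk splits into CHUNK_SIZE // BUCKET_SIZE buckets
BLOCK_SIZE = 4   # assumption: CRC block size

# bytes = a remaining chunk kept verbatim; int = pointer to an earlier entry
Entry = Union[bytes, int]


class TwoStoreVolume:
    """Hypothetical model of the claimed read/write paths over two stores."""

    def __init__(self, first_store: List[Entry]) -> None:
        self.first = first_store  # sequential, deduplicated; never modified here
        # second store: (chunk index, bucket index) -> random-access bucket copy
        self.second: Dict[Tuple[int, int], bytearray] = {}
        self.size = CHUNK_SIZE * len(first_store)  # assumes every chunk is full

    def _chunk(self, idx: int) -> bytes:
        """Return chunk idx from the first store, following a pointer if needed."""
        entry = self.first[idx]
        return self.first[entry] if isinstance(entry, int) else entry

    @staticmethod
    def _locate(offset: int) -> Tuple[int, int, int]:
        """Map a byte offset to (chunk, bucket, position within bucket)."""
        chunk, within = divmod(offset, CHUNK_SIZE)
        bucket, pos = divmod(within, BUCKET_SIZE)
        return chunk, bucket, pos

    def read(self, offset: int, length: int) -> bytes:
        if offset < 0 or offset + length > self.size:
            raise ValueError("location is not within the first store")
        out = bytearray()
        for off in range(offset, offset + length):
            chunk, bucket, pos = self._locate(off)
            copy = self.second.get((chunk, bucket))
            if copy is not None:  # this part of the location is in the second store
                out.append(copy[pos])
            else:  # otherwise read through to the first store
                out.append(self._chunk(chunk)[bucket * BUCKET_SIZE + pos])
        return bytes(out)

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("location is not within the deduplicated chunks")
        for i, byte in enumerate(data):
            chunk, bucket, pos = self._locate(offset + i)
            key = (chunk, bucket)
            if key not in self.second:  # copy the bucket out of the first store once
                start = bucket * BUCKET_SIZE
                self.second[key] = bytearray(self._chunk(chunk)[start:start + BUCKET_SIZE])
            self.second[key][pos] = byte  # assumption: the write lands on the copy


def block_crcs(data: bytes) -> List[int]:
    """Claim 6: calculate a CRC for each block so it can be stored."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]


def verify_blocks(reconstituted: bytes, stored: List[int]) -> None:
    """Claim 7: recompute CRCs after reconstitution and raise on any mismatch."""
    fresh = block_crcs(reconstituted)
    if len(fresh) != len(stored):
        raise ValueError("block count mismatch")
    for n, (got, want) in enumerate(zip(fresh, stored)):
        if got != want:
            raise ValueError(f"CRC mismatch in block {n}")


# Usage: entry 1 is a pointer to entry 0 (a deduplicated copy of chunk 0).
vol = TwoStoreVolume([b"AAAABBBB", 0, b"CCCCDDDD"])
vol.write(9, b"xy")
print(vol.read(8, 8))  # b'AxyABBBB' (bucket 0 now served from the second store)
print(vol.read(0, 8))  # b'AAAABBBB' (the shared chunk in the first store is unchanged)
```

Copying at bucket granularity (claims 2 and 3) means a small write touches only one small copy. The shared chunk in the first store, which other pointers may still reference, stays as it was. In the usage lines, writing into chunk 1 leaves chunk 0 readable exactly as before.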
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
De-duplication techniques · CPC title
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title
Saving storage space on storage systems · CPC title
Physics · mapped topic