System and method for storing redundant information

US10061535B2 · US · B2

Patent metadata
Field | Value
Publication number | US-10061535-B2
Application number | US-201614992408-A
Country | US
Kind code | B2
Filing date | Jan 11, 2016
Priority date | Dec 22, 2006
Publication date | Aug 28, 2018
Grant date | Aug 28, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system for reducing storage requirements and speeding up storage operations by reducing the storage of redundant data includes receiving a request that identifies one or more data objects to which to apply a storage operation. For each data object, the storage system determines if the data object contains data that matches another data object to which the storage operation was previously applied. If the data objects do not match, then the storage system performs the storage operation in a usual manner. However, if the data objects do match, then the storage system may avoid performing the storage operation.
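The flow described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name `SingleInstanceStore`, its methods, and the choice of SHA-256 to decide whether two data objects match are all assumptions made for illustration, and the in-memory dictionaries only stand in for a real storage system.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical in-memory store that skips data it has already stored."""

    def __init__(self):
        self._blobs = {}    # digest -> stored bytes (one copy per distinct content)
        self._objects = {}  # object name -> digest of its content

    def store(self, name: str, data: bytes) -> bool:
        """Apply the storage operation; return True only if data was written."""
        # Assumption: a SHA-256 digest is used to detect matching content.
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self._blobs
        if is_new:
            # No match: perform the storage operation in the usual manner.
            self._blobs[digest] = data
        # Match or not, record which content this object name refers to.
        self._objects[name] = digest
        return is_new

    def load(self, name: str) -> bytes:
        """Return the content for an object, whether stored or deduplicated."""
        return self._blobs[self._objects[name]]


store = SingleInstanceStore()
wrote_first = store.store("a.txt", b"hello")   # new content, so it is written
wrote_second = store.store("b.txt", b"hello")  # matching content, write is skipped
```

In this sketch the second call avoids the write because its digest matches content stored by the first call, yet both names can still be read back.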

First claim

Opening claim text (preview).

We claim:

1. A method performed by a computer system of storing a single-instance copy on a sequential storage medium, wherein the single instance copy is created from copies of original data objects, the method comprising: receiving or accessing multiple data objects from a computer network; wherein some of the multiple data objects are substantially identical according to a hashing algorithm; storing, on a random-access storage medium, a single-instance copy of the multiple data objects; wherein the single-instance copy contains a copy of only one of the substantially identical data objects; and wherein the random-access storage medium includes at least one reference to the copy of the only one of the substantially identical data objects; storing the single-instance copy of the multiple data objects on a sequential storage medium by: transferring the copy of the only one of the substantially identical data objects from the random-access storage medium to the sequential storage medium; and transferring the at least one reference to the copy of the substantially identical data objects from the random-access storage medium to the sequential storage medium after the copy of the only one of the substantially identical data objects is stored on the sequential storage medium.

2. The method of claim 1, wherein receiving or accessing multiple data objects from a computer network includes receiving or accessing multiple data objects from multiple, different logical locations within a computer network, wherein the single-instance copy contains information identifying the logical locations, and wherein transferring the at least one reference to the copy of the substantially identical data objects from the random-access storage medium to the sequential storage medium after the copy of the only one of the substantially identical data objects is stored on the sequential storage medium includes storing the information identifying the logical locations.

3. The method of claim 1, wherein storing the at least one reference to the copy of the substantially identical data objects from the random-access storage medium to the sequential storage medium after the copy of the only one of the substantially identical data objects is stored on the sequential storage medium includes storing a reference count to track a number of references that refer to the copy of one of the substantially identical data objects.

4. The method of claim 1, wherein a reference to the copy of one of the substantially identical data objects comprises a media identifier identifying a storage medium on which the copy is stored and an offset within the identified storage medium to the copy.

5. The method of claim 1, further comprising maintaining an index on the random-access storage medium, wherein the index comprises, for each of the multiple data objects: an identifier of the data object, information indicating whether the data object is stored as a copy or a reference to a copy, and an identifier of a source copy when the data object is stored as a reference to the source copy.

6. The method of claim 1, wherein storing, on the random-access storage medium, the single-instance copy of the multiple data objects includes storing the single-instance copy of the multiple data objects using an index that comprises, for each of the multiple data objects: an identifier of the data object, information indicating whether the data object is stored as a copy or a reference to a copy, and an identifier of a source copy when the data object is stored as a reference to the source copy.

7. The method of claim 1, wherein receiving or accessing multiple data objects from a computer network includes receiving multiple data objects from multiple, different logical locations within a computer network, and wherein the single-instance copy contains information identifying the logical locations, further comprising: transferring, before transferring the information identifying the logical locations on the sequential storage medium, an index that comprises, for each of the multiple data objects: an identifier of the data object, information indicating whether the data object is stored as a copy or a reference to a copy, and an identifier of a source copy when the data object is stored as a reference to the source copy.

8. The method of claim 1, wherein at least some of the multiple data objects are of different types or formats, and wherein the different types or formats correspond to documents, email messages, and configuration settings.

9. A method performed by a computer system of storing a de-duplicated copy of data objects on a sequential storage medium, comprising: receiving one or more data objects in a hierarchy, wherein some of the data objects are identified as identical based on hashing; storing, on a random-access storage medium, a de-duplicated copy of the one or more data objects, wherein the de-duplicated copy contains information describing: a first instance of each of the one or more data objects, and one or more references to the one or more first instances as stored on the random-access storage medium; and transferring the de-duplicated copy of the one or more data objects from the random-access storage medium to a sequential storage medium for storage on the sequential storage medium.

10. The method of claim 9, wherein the data objects are of at least two different object types, and wherein the different types or formats correspond to documents, email messages, or configuration settings.

11. The method of claim 9, wherein at least one of the data objects has an archive file format.

12. A non-transitory computer-readable medium containing instructions for controlling a computer system to execute a method of storing a copy of data objects on a sequential storage medium, the method comprising: receiving or accessing multiple data objects from a computer network; wherein some of the multiple data objects are substantially identical according to a hashing algorithm; storing, on a random-access storage medium, a single-instance copy of the multiple data objects; wherein the single-instance copy contains a copy of only one of the substantially identical data objects; and wherein the random-access storage medium includes at least one reference to the copy of the only one of the substantially identical data objects; storing the single-instance copy of the multiple data objects on a sequential storage medium by: transferring the copy of the only one of the substantially identical data objects from the random-access storage medium to the sequential storage medium; and transferring the at least one reference to the copy of the substantially identical data objects from the random-access storage medium to the sequential storage medium after the copy of the only one of the substantially identical data objects is stored on the sequential storage medium.

13. The non-transitory computer-readable medium of claim 12, wherein receiving or accessing multiple data objects from a computer network includes receiving or accessing multiple data objects from multiple, different logical locations within a computer network, wherein the single-instance copy contains information identifying the logical locations, and wherein transferring the at least one reference to the copy of the substantially identical data objects from the random-access storage medium to the sequential storage medium after the copy of the only one of the substantially identical data objects is stored on the sequential storage medium includes storing the information identifying the logical locations.
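
To make the ordering in claims 1 and 3 to 5 more concrete, here is one possible Python sketch of an index, a reference made of a media identifier plus an offset, and a transfer that writes copies to a sequential medium before the references that point at them. Nothing here comes from the patent's description: `IndexEntry`, `Reference`, `SequentialMedium`, `build_single_instance_copy`, and `transfer` are hypothetical names, SHA-256 is an assumed hashing algorithm, and an append-only Python list merely stands in for a medium such as tape.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    object_id: str                   # identifier of the data object
    is_reference: bool               # stored as a copy (False) or a reference (True)
    source_id: Optional[str] = None  # source copy, set only for references


@dataclass
class Reference:
    media_id: str  # which storage medium holds the copy
    offset: int    # byte offset of the copy within that medium


class SequentialMedium:
    """Append-only stand-in for a sequential medium (hypothetical)."""

    def __init__(self, media_id: str):
        self.media_id = media_id
        self.records = []
        self._size = 0  # byte offset at which the next data record starts

    def append_data(self, payload: bytes) -> Reference:
        ref = Reference(self.media_id, self._size)
        self.records.append(("data", payload))
        self._size += len(payload)
        return ref

    def append_reference(self, object_id: str, ref: Reference) -> None:
        self.records.append(("ref", object_id, ref))


def build_single_instance_copy(objects: dict):
    """Deduplicate on the random-access side; objects maps id -> bytes."""
    index, copies, first_seen = [], {}, {}
    for object_id, data in objects.items():
        digest = hashlib.sha256(data).hexdigest()  # assumed hashing algorithm
        if digest in first_seen:
            index.append(IndexEntry(object_id, True, first_seen[digest]))
        else:
            first_seen[digest] = object_id
            copies[object_id] = data
            index.append(IndexEntry(object_id, False))
    return index, copies


def transfer(index, copies, medium: SequentialMedium) -> dict:
    """Write every copy first, then the references; return reference counts."""
    locations = {oid: medium.append_data(data) for oid, data in copies.items()}
    ref_counts = {oid: 0 for oid in copies}
    for entry in index:
        if entry.is_reference:
            medium.append_reference(entry.object_id, locations[entry.source_id])
            ref_counts[entry.source_id] += 1
    return ref_counts
```

Writing the copies first means each reference can carry a known media identifier and offset by the time it is appended, which is one plausible reason for the ordering the claims recite.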

Assignees

Commvault Systems Inc

Inventors

Classifications

  • Re-recording, i.e. transcribing information from one magnetisable record carrier on to one or more similar or dissimilar record carriers {(by varying the order of the information G11B27/029, G11B27/036)} · CPC title

  • using de-duplication of the data · CPC title

  • Organizing or formatting or addressing of data · CPC title

  • Improving I/O performance · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10061535B2 cover?
A method and system for reducing storage requirements and speeding up storage operations by reducing the storage of redundant data includes receiving a request that identifies one or more data objects to which to apply a storage operation. For each data object, the storage system determines if the data object contains data that matches another data object to which the storage operation was prev…
Who is the assignee on this patent?
Commvault Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/1453. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 28, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).