Data object store and server for a cloud storage environment, including data deduplication and data management across multiple cloud storage sites

US10248657B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10248657-B2
Application number  US-201615258252-A
Country             US
Kind code           B2
Filing date         Sep 7, 2016
Priority date       Jun 30, 2009
Publication date    Apr 2, 2019
Grant date          Apr 2, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, are performed within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet loss, using various network protocols, including HTTP and FTP. Methods are disclosed for content indexing data stored within a cloud environment to facilitate later searching, including collaborative searching. Methods are also disclosed for performing containerized deduplication to reduce the strain on a system namespace, effectuate cost savings, etc. Methods are disclosed for identifying suitable storage locations, including suitable cloud storage sites, for data files subject to a storage policy. Further, systems and methods for providing a cloud gateway and a scalable data object store within a cloud environment are disclosed, along with other features.
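As an illustration of the containerized deduplication mentioned in the abstract, the following minimal sketch packs unique data blocks into fixed-size containers and records where each block lives. It is not taken from the patent: the `Container` class, the function names, the index layout, and the use of SHA-256 as the block fingerprint are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical container file: one object holding many unique blocks."""
    capacity: int
    data: bytearray = field(default_factory=bytearray)


def dedup_into_containers(blocks, container_size):
    """Pack unique blocks into containers of at most container_size bytes.

    Returns (containers, index, refs):
      index maps a block digest to (container number, offset, length);
      refs holds one digest per input block, in input order.
    """
    containers = [Container(container_size)]
    index = {}
    refs = []
    for block in blocks:
        if len(block) > container_size:
            raise ValueError("block larger than container size")
        # SHA-256 is an assumed fingerprint; the source does not name one here.
        digest = hashlib.sha256(block).hexdigest()
        if digest not in index:
            current = containers[-1]
            # Start a new container when the block would overflow this one.
            if len(current.data) + len(block) > current.capacity:
                current = Container(container_size)
                containers.append(current)
            index[digest] = (len(containers) - 1, len(current.data), len(block))
            current.data.extend(block)
        refs.append(digest)
    return containers, index, refs


def restore(containers, index, refs):
    """Rebuild the original byte stream from containers and references."""
    out = bytearray()
    for digest in refs:
        number, offset, length = index[digest]
        out += containers[number].data[offset:offset + length]
    return bytes(out)
```

Storing many small blocks inside a few container objects, rather than one object per block, is one way to keep the number of names in the cloud namespace low, which is the kind of namespace strain the abstract refers to.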

First claim

Opening claim text (preview).

We claim:

1. A method for storing a secondary copy, of an original data set, on a cloud storage site using a cloud gateway, wherein the cloud gateway is coupled between multiple computers and one or more cloud storage sites via a network, the method comprising: identifying data blocks within a cache of the cloud gateway that satisfy certain criteria, wherein the original data set comprises data blocks, and wherein the certain criteria are from a storage policy; performing block-level deduplication of the identified data blocks to create a deduplicated set of data, wherein the block-level deduplication includes— determining a size for a container file to utilize when deduplicating the identified data blocks; and deduplicating at least some of the identified data blocks to create one or more container files containing deduplicated data, wherein at least one of the container files has the determined size; and storing the deduplicated set of data on the cloud storage site by: buffering data, to a data buffer, for transmission to the cloud storage site; repeating the following steps while the data buffer is not full: receiving a file system request to write a group of data to the cloud storage site; and adding the group of data to the buffer; converting a file system request to one or more application program interface calls associated with the cloud storage site; and transmitting contents of the data buffer to the cloud storage site using the one or more application program interface calls associated with the cloud storage site.

2. The method of claim 1, further comprising identifying the cloud storage site on which to store the secondary copy of the original data set by: identifying two or more candidate cloud storage sites; accessing a storage policy having a set of preferences and storage criteria, wherein the set of preferences and storage criteria includes at least two of the following: one or more preferred cloud storage sites, one or more preferred classes or quality of cloud storage sites, requirements regarding deduplication of the original data set, requirements regarding encryption of the original data set, requirements regarding compression of the original data set, quality of a network connection available to the cloud storage site, one or more data retention periods, data characteristics of at least some data in the original data set, estimated or historic usage associated with operating one or more system components, frequency with which the original data set was accessed or modified during a particular time period, a specified level of fault tolerance, or one or more geographical locations or political states in which data storage devices for a cloud storage site exist; and selecting at least one of the two or more of the candidate cloud storage sites based at least in part on the set of preferences and storage criteria in the storage policy.

3. The method of claim 1 wherein the contents of the data buffer are transmitted to the cloud storage site using at least one of hypertext transfer protocol (HTTP) and HTTP over Transport Layer Security/Secure Sockets Layer.

4. The method of claim 1 wherein the certain criteria include time-based criteria.

5. A system for creating a secondary copy of an original data set using a cloud storage site, the system comprising a memory and processor that are configured to: identify sub-objects of the original data set that satisfy certain criteria, wherein the certain criteria are related a storage policy, and wherein the original data set is received from one or more client computers; perform deduplication of the identified data sub-objects to create a deduplicated set of data; and, forward the deduplicated set of data to the cloud storage site, wherein the forwarding includes: converting file system requests into application program interface calls associated with the cloud storage site; and, forwarding the data to the cloud storage site using the one or more application program interface calls associated with the cloud storage site.

6. The system of claim 5, wherein the memory and processor are further configured to: determine a size for a container file and for deduplicating at least some of the data sub-objects to create one or more container files containing deduplicated data, wherein at least one of the container files has the determined size.

7. The system of claim 5, wherein the forwarding further includes: buffering data, to a data buffer, for transmission to the cloud storage site by: receiving a file system request to write a group of data to the cloud storage site; and adding the group of data to the data buffer.

8. The system of claim 5, wherein the certain criteria include time-based criteria, wherein the deduplication includes block-level deduplication, and wherein the block-level deduplication includes— determining a size for a container file to utilize when deduplicating the identified data blocks; and deduplicating at least some of the identified data blocks to create one or more container files containing deduplicated data, wherein at least one of the container files has the determined size; and wherein the container file is forwarded to the cloud storage site.

9. The system of claim 5, wherein the forwarding further includes: buffering data, to a data buffer, for transmission to the cloud storage site by repeating the following steps while the data buffer is not full: receiving a file system request to write a group of data to the cloud storage site; and adding the group of data to the buffer.

10. A computer-implemented method for copying multiple files at a cloud storage site, wherein the cloud storage site is coupled to a computer executing a file system for accessing a secondary storage computing device, the method comprising: receiving a copy operation request to copy n number of files at the cloud storage site, wherein each of the n number of files includes metadata and data, and wherein the n number of files exceeds a threshold; establishing a container size determined by one or more factors processing the n number of files by— copying the metadata of each of the n number of files to a first container; copying at least a portion of the data for the n number of files into a second container, wherein the second container is separate from the first container; and updating a data structure, wherein the data structure— tracks, for each of the n number of files, a location of the metadata for that file in the first container, and tracks, for the at least a portion of the data for the n number of files, a location of the data in the second container.

11. The computer-implemented method of claim 10 wherein the threshold is a number of files that the file system can operate on without system degradation.

12. The computer-implemented method of claim 10 wherein the threshold is related to at least of one of the factors.

13. The computer-implemented method of claim 10 wherein the factors include at least one of: a latency associated with a network connection to the cloud storage site, or a bandwidth associated with a network connection to the cloud storage site, or whether the cloud storage site imposes a restriction on a namespace associated with the computer or the file system, or whether the cloud storage site permits sparsification of data files, or a pricing structure associated with the cloud storage site, or a maximum specified container file size, or a minimum specified container file size.

14. The computer-implemented method of claim 10 w
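The buffering and request-conversion steps recited in claim 1 can be pictured with a short sketch. It is only an illustration under stated assumptions: the `BufferedCloudWriter` class, its `send` callback, and the `("PUT", key, payload)` tuple standing in for an application program interface call are hypothetical and do not correspond to any particular cloud provider's API or to the patentee's implementation.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a cloud API call: (verb, object key, payload).
ApiCall = Tuple[str, str, bytes]


class BufferedCloudWriter:
    """Collects file system write requests and sends them in batches."""

    def __init__(self, capacity: int, send: Callable[[List[ApiCall]], None]):
        self.capacity = capacity  # buffer size in bytes (assumed unit)
        self.send = send          # callback that would transmit the calls
        self.pending: List[Tuple[str, bytes]] = []
        self.used = 0

    def write(self, path: str, data: bytes) -> None:
        """Handle one file system request to write a group of data."""
        # Flush first if this write would overflow a non-empty buffer.
        if self.pending and self.used + len(data) > self.capacity:
            self.flush()
        self.pending.append((path, data))
        self.used += len(data)
        # Transmit as soon as the buffer is full.
        if self.used >= self.capacity:
            self.flush()

    def flush(self) -> None:
        """Convert buffered requests to API calls and transmit them."""
        if not self.pending:
            return
        calls = [("PUT", path.lstrip("/"), data) for path, data in self.pending]
        self.send(calls)
        self.pending = []
        self.used = 0
```

A caller would pass a function that performs the actual network transfer as `send`. Batching several small writes into one transmission is one way to reduce the number of round trips over a high-latency link.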

Assignees

Inventors

Classifications

  • for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title

  • Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes · CPC title

  • based on web technology, e.g. hypertext transfer protocol [HTTP] · CPC title

  • Electronic negotiation · CPC title

  • Price or cost determination based on market factors · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10248657B2 cover?
Data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, are performed within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet los…
Who is the assignee on this patent?
Commvault Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/30156. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 2, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).