Backup operations for large databases using live synchronization

US2018285201A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2018285201-A1
Application number: US-201815937796-A
Country: US
Kind code: A1
Filing date: Mar 27, 2018
Priority date: Mar 28, 2017
Publication date: Oct 4, 2018
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for performing backup and other secondary copy operations for large databases (e.g., “big data”), such as the Greenplum database, are described. In some cases, the systems and methods may maintain a second instance of a source database (e.g., Greenplum) using live synchronization (e.g., “Live Sync”), which performs incremental replication between a virtual machine containing a large database (e.g., a virtual machine containing a Greenplum database) and a synced copy of the virtual machine.
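
The incremental replication mentioned in the abstract can be illustrated with a minimal sketch. It assumes a block-level comparison (claim 6 mentions block-level replication) and models each virtual machine's data as a plain byte string. The names `block_digests`, `changed_blocks`, `incremental_sync`, and `BLOCK_SIZE` are hypothetical, chosen only for illustration; they are not taken from the patent or from any Commvault or Greenplum API.

```python
import hashlib

# Assumption: a tiny block size keeps the example readable;
# real systems would use much larger blocks.
BLOCK_SIZE = 4


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(source: bytes, synced: bytes,
                   block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of blocks that differ between source and synced copy."""
    src = block_digests(source, block_size)
    dst = block_digests(synced, block_size)
    longest = max(len(src), len(dst))
    return [
        i for i in range(longest)
        if i >= len(src) or i >= len(dst) or src[i] != dst[i]
    ]


def incremental_sync(source: bytes, synced: bytes,
                     block_size: int = BLOCK_SIZE) -> bytes:
    """Bring the synced copy up to date by copying only the changed blocks."""
    # Match the source length first: truncate, then pad with zero bytes.
    result = bytearray(synced[:len(source)])
    result.extend(b"\x00" * (len(source) - len(result)))
    for i in changed_blocks(source, synced, block_size):
        start = i * block_size
        result[start:start + block_size] = source[start:start + block_size]
    return bytes(result)


if __name__ == "__main__":
    source_vm = b"aaaabbbbcccc"
    synced_vm = b"aaaaXXXXcccc"
    print(changed_blocks(source_vm, synced_vm))
    print(incremental_sync(source_vm, synced_vm) == source_vm)
```

Only the blocks whose digests differ are copied, which is the general idea behind keeping a synced copy current without transferring the whole database on every cycle.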

First claim

Opening claim text (preview).

What is claimed:

1. A method for maintaining a secondary copy of a large database at a virtual machine, the method comprising: performing a full backup of a primary copy of the large database, wherein the large database is running at a source virtual machine; identifying, based on metadata associated with the full backup of the primary copy of the large database, objects of the database that have changed since an initial synchronization of the large database between the primary copy at the source virtual machine and a secondary copy running at a destination virtual machine; restoring the identified objects of the large database that have changed since the initial synchronization of the large database using the full backup of the primary copy of the large database; and replicating the restored objects to the secondary copy of the large database contained at the destination virtual machine using live synchronization between the source virtual machine and the destination virtual machine.

2. The method of claim 1, further comprising: before performing the full backup performing a full synchronization between the primary copy of the large database at the source virtual machine and the secondary copy of the large database at the destination virtual machine.

3. The method of claim 1, wherein the large database is a Greenplum database, and wherein identifying objects of the large database that have changed since an initial synchronization of the large database includes identifying append only tables of the Greenplum database that have changed since the initial synchronization.

4. The method of claim 1, further comprising: performing one or more incremental backups after performance of the full backup of the primary copy of the large database; wherein identifying objects of the database that have changed since the initial synchronization of the large database includes identifying, within metadata associated with the one or more incremental backups of the primary copy of the large database, additional objects of the database that have changed since the performance of the full backup.

5. The method of claim 1, wherein replicating the restored objects to the secondary copy of the large database contained at the destination virtual machine using live synchronization includes performing continuous data replication on the restored objects during a running live synchronization.

6. The method of claim 1, wherein replicating the restored objects to the secondary copy of the large database contained at the destination virtual machine using live synchronization includes performing block-level replication on the restored objects during a running live synchronization.

7. The method of claim 1, further comprising: after identifying objects of the large database that have changed since an initial synchronization of the large database, updating entries of a changes index associated with a synchronization system to include information representative of the identified objects.

8. The method of claim 1, wherein performing a full backup of a primary copy of the large database includes performing a backup of a catalog of objects have changed within the large database that is managed by the large database.

9. The method of claim 1, wherein the full backup is performed on a daily schedule, and the replication is performed on a weekly schedule.

10. The method of claim 1, wherein replicating the restored objects to the secondary copy of the large database contained at the destination virtual machine including replication the restored objects includes replicating the restored objects using an enhanced data agent that is specific to the large database and installed at the source virtual machine.

11. A system, comprising: at least one processor; at least one data storage device coupled to the at least one processor and storing instructions for implementing a process to maintain a secondary copy of a large database at a virtual machine, wherein the process comprises: performing a full backup of a primary copy of the large database at a source virtual machine, identifying, within metadata associated with the full backup of the primary copy of the large database, objects of the database that have changed since an initial synchronization of the large database between the primary copy at the source virtual machine and a secondary copy at a destination virtual machine, restoring the identified objects of the large database that have changed since the initial synchronization of the large database using the full backup of the primary copy of the large database; and replicating the restored objects to the secondary copy of the large database at the destination virtual machine using live synchronization between the source virtual machine and the destination virtual machine.

12. The system of claim 11, wherein the process further comprises: before performing the full backup performing a full synchronization between the primary copy of the large database at the source virtual machine and the secondary copy of the large database at the destination virtual machine.

13. The system of claim 11, wherein the large database is a Greenplum database, and wherein identifying objects of the large database that have changed since an initial synchronization of the large database includes identifying append only tables of the Greenplum database that have changed since the initial synchronization.

14. The system of claim 11, wherein replicating the restored objects to the secondary copy of the large database contained at the destination virtual machine using live synchronization includes performing continuous data replication on the restored objects during a running live synchronization.

15. The system of claim 11, wherein replicating the restored objects to the secondary copy of the large database contained at the destination virtual machine using live synchronization includes performing block-level replication on the restored objects during a running live synchronization.

16. The system of claim 11, wherein the process further comprises: after identifying objects of the large database, that have changed since an initial synchronization of the large database, updating entries of a changes index associated with a synchronization system to include information representative of the identified objects.

17. The system of claim 11, wherein performing a full backup of a primary copy of the large database includes performing a backup of a catalog of objects have changed within the large database that is managed by the large database.

18. The system of claim 11, wherein replicating the restored objects to the secondary copy of the large database contained at the destination virtual machine including replication the restored objects includes replicating the restored objects using an enhanced data agent that is specific to the large database and installed at the source virtual machine.

19. A computer readable medium, excluding transitory propagating signals, storing instructions that, when executed by an information management system, cause the information management system to maintain synchronization between a Greenplum database stored at a source virtual machine and an instance of the Greenplum database stored at a destination virtual machine, the method comprising: creating a backup copy of the Greenplum database; identifying from the backup copy one or more objects of the Greenplum database that have changed since an ini
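
The four steps recited in claim 1 (full backup, identifying changed objects from backup metadata, restoring them, and replicating them to the secondary copy) can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: the `Database` and `Backup` classes, the integer logical timestamps, and the function names are hypothetical and do not come from the patent, Commvault software, or Greenplum.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical database: named objects plus a last-modified timestamp each."""
    objects: dict[str, bytes] = field(default_factory=dict)
    modified_at: dict[str, int] = field(default_factory=dict)

    def write(self, name: str, data: bytes, now: int) -> None:
        self.objects[name] = data
        self.modified_at[name] = now


@dataclass
class Backup:
    """Hypothetical full backup: object contents and per-object metadata."""
    objects: dict[str, bytes]
    modified_at: dict[str, int]


def full_backup(primary: Database) -> Backup:
    """Step 1: copy every object and its metadata from the primary."""
    return Backup(dict(primary.objects), dict(primary.modified_at))


def identify_changed(backup: Backup, last_sync: int) -> list[str]:
    """Step 2: use backup metadata to find objects changed since the last sync."""
    return sorted(n for n, ts in backup.modified_at.items() if ts > last_sync)


def restore(backup: Backup, names: list[str]) -> dict[str, bytes]:
    """Step 3: restore only the identified objects from the full backup."""
    return {n: backup.objects[n] for n in names}


def replicate(restored: dict[str, bytes], secondary: dict[str, bytes]) -> None:
    """Step 4: apply the restored objects to the secondary copy."""
    secondary.update(restored)


if __name__ == "__main__":
    primary = Database()
    primary.write("t1", b"rows-v1", now=1)
    primary.write("t2", b"rows-v1", now=1)
    secondary = dict(primary.objects)  # initial synchronization at time 1

    primary.write("t2", b"rows-v2", now=2)
    primary.write("t3", b"rows-v1", now=3)

    backup = full_backup(primary)
    changed = identify_changed(backup, last_sync=1)
    replicate(restore(backup, changed), secondary)
    print(changed, secondary == primary.objects)
```

In this sketch the secondary copy is updated from the backup rather than from the running primary, which mirrors the claim's wording that the changed objects are restored "using the full backup" before being replicated. Object deletion is not modeled.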

Assignees

Commvault Systems Inc

Inventors

Classifications

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • Memory management, e.g. access or allocation · CPC title

  • Physics · mapped topic

  • Using snapshots, i.e. a logical point-in-time copy of the data · CPC title

  • Hypervisor-specific management and integration aspects · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018285201A1 cover?
Systems and methods for performing backup and other secondary copy operations for large databases (e.g., “big data”), such as the Greenplum database, are described. In some cases, the systems and methods may maintain a second instance of a source database (e.g., Greenplum) using live synchronization (e.g., “Live Sync”), which performs incremental replication between a virtual machine containing…
Who is the assignee on this patent?
Commvault Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/1461. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 4, 2018 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).