Replicating Big Data

US2017032012A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2017032012-A1
Application number  US-201514813181-A
Country             US
Kind code           A1
Filing date         Jul 30, 2015
Priority date       Jul 30, 2015
Publication date    Feb 2, 2017
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes identifying a first table including data. The first table has associated metadata, an associated replication state, an associated replication log file including replication logs logging mutations of the first table, and an associated replication configuration file including a first association that associates the first table with a replication family. The method includes inserting a second association in the replication configuration file that associates a second table having a non-loadable state with the replication family. The association of the second table with the replication family causes persistence of any replication logs in the replication log file that correspond to any mutations of the first table during the existence of the second table. The method further includes generating a third table from the first table, the metadata associated with the first table, and the associated replication state of the first table.
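
The flow in the abstract can be sketched roughly as follows. This is a minimal in-memory illustration, not the patented implementation: the names `Table`, `ReplicationFamily`, `begin_copy`, `finish_copy`, and `collect_garbage`, and the rule that a non-loadable member pins every log entry, are all assumptions made for the example. Real systems would work with files rather than Python dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    rows: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)
    clock: int = 0          # replication state: timestamp of the last mutation applied
    loadable: bool = True   # a non-loadable table holds no data and processes no logs


class ReplicationFamily:
    """Hypothetical stand-in for the replication configuration and log files."""

    def __init__(self):
        self.config = {}    # replication configuration: table name -> Table
        self.log = []       # replication log: (timestamp, key, value) entries

    def associate(self, table):
        self.config[table.name] = table

    def dissociate(self, name):
        del self.config[name]

    def mutate(self, table, key, value):
        """Apply a mutation to a table and log it for the rest of the family."""
        table.clock += 1
        table.rows[key] = value
        self.log.append((table.clock, key, value))

    def collect_garbage(self):
        """Drop log entries every member has applied (assumed retention policy)."""
        members = list(self.config.values())
        if not members or any(not t.loadable for t in members):
            return  # a non-loadable member never consumes logs, so entries persist
        low = min(t.clock for t in members)
        self.log = [entry for entry in self.log if entry[0] > low]


def begin_copy(family, first, new_name):
    """Insert the second association, then generate the third table from the first."""
    placeholder = Table(name=new_name + ".placeholder", loadable=False)
    family.associate(placeholder)
    third = Table(new_name, dict(first.rows), dict(first.metadata), first.clock)
    return placeholder, third


def finish_copy(family, placeholder, third):
    """Insert the third association, replay retained logs, drop the placeholder."""
    family.associate(third)
    for timestamp, key, value in family.log:
        if timestamp > third.clock:       # mutation logged after the copy was taken
            third.rows[key] = value
            third.clock = timestamp
    family.dissociate(placeholder.name)
```

In this sketch, any mutation made to the first table between `begin_copy` and `finish_copy` stays in the log because the placeholder is still associated with the family, so the third table can catch up before the placeholder is removed.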

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: identifying, by data processing hardware, a first table comprising data, the first table having associated metadata defining the first table, an associated replication state, an associated replication log file comprising replication logs logging mutations of the first table, and an associated replication configuration file including a first association that associates the first table with a replication family; inserting, by the data processing hardware, a second association in the replication configuration file that associates a second table having a non-loadable state with the replication family, the association of the second table with the replication family causing persistence of any replication logs in the replication log file that correspond to any mutations of the first table during the existence of the second table; generating, by the data processing hardware, a third table from the first table, the metadata associated with the first table, and the associated replication state of the first table; inserting, by the data processing hardware, a third association in the replication configuration file that associates the third table with the replication family; and applying, by the data processing hardware, to the third table the mutations of the first table from any replication logs logged in the replication log file during the existence of the second table.

2. The method of claim 1, further comprising removing, by the data processing hardware, the second association from the replication configuration file, the removal of the second association removing the persistence of any replication logs in the replication log file that correspond to any mutations of the first table during the existence of the second table.

3. The method of claim 1, further comprising: generating, by the data processing hardware, the second table with the non-loadable state before inserting the second association in the replication configuration file; and/or deleting, by the data processing hardware, the second table after removing the second association from the replication configuration file.

4. The method of claim 2, wherein applying to the third table the mutations of the first table of any replication logs logged in the replication log file during the existence of the second table comprises, before removing the second association from the replication configuration file: identifying, as transient mutations, any mutations to the first table logged in the associated replication log file that occurred after generating the second table and inserting the second association in the replication configuration file and before a completion of generating the third table; and applying the transient mutations to the third table.

5. The method of claim 1, wherein the replication configuration file comprises replication logs for mutations to all tables associated with the replication family.

6. The method of claim 1, further comprising applying, by the data processing hardware, to the third table the mutations of replication logs logged for each table in the replication log file during the existence of the second table.

7. The method of claim 1, wherein the associated replication state of the first table comprises a logical clock indicating a most recent time when all mutations of a source table were applied to the first table, the first table being a replication of the source table.

8. The method of claim 7, wherein generating the third table comprises: copying a sorted string table file representing the first table; copying a metadata file comprising the metadata associated with the first table; copying a replication state file comprising the first table replication state, and storing in memory hardware in communication with the data processing hardware, the copied sorted string table file, the copied metadata file, and the copied replication state file.

9. The method of claim 8, wherein the sorted string table comprises a map from keys to values of the first database table, the keys and values being arbitrary byte strings, the keys representing a row and a column, and the value representing data stored in a cell defined by the row and the column.

10. The method of claim 1, wherein the non-loadable state of the second table prevents loading data into the second table and processing replication logs of any tables associated with the replication family in the replication configuration file.

11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: identifying a first table comprising data, the first table having associated metadata defining the first table, an associated replication state, an associated replication log file comprising replication logs logging mutations of the first table, and an associated replication configuration file including a first association that associates the first table with a replication family; inserting a second association in the replication configuration file that associates a second table having a non-loadable state with the replication family, the association of the second table with the replication family causing persistence of any replication logs in the replication log file that correspond to any mutations of the first table during the existence of the second table; generating a third table from the first table, the metadata associated with the first table, and the associated replication state of the first table, inserting a third association in the replication configuration file that associates the third table with the replication family; and applying to the third table the mutations of the first table from any replication logs logged in the replication log file during the existence of the second table.

12. The system of claim 11, wherein the operations further comprise removing the second association from the replication configuration file the removal of the second association removing the persistence of any replication logs in the replication log file that correspond to any mutations of the first table during the existence of the second table.

13. The system of claim 11, wherein the operations further comprise: generating the second table with the non-loadable state before inserting the second association in the replication configuration file; and/or deleting the second table after removing the second association from the replication configuration file.

14. The system of claim 12, wherein applying to the third table the mutations of the first table of any replication logs logged in the replication log file during the existence of the second table comprises, before removing the second association from the replication configuration file: identifying, as transient mutations, any mutations to the first table logged in the associated replication log file that occurred after generating the second table and inserting the second association in the replication configuration file and before a completion of generating the third table; and applying the transient mutations to the third table.

15. The system of claim 11, wherein the replication configuration file comprises replication logs for mutations to all tables associated with the replication family.

16. The system of claim 11, wherein the operations further comprise applying to the third table the mut…
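
The sorted string table described in claim 9 (a map from byte-string keys, each representing a row and a column, to byte-string cell values) can be illustrated with a small in-memory sketch. The claims do not specify a key encoding or an on-disk layout, so the NUL separator in `encode_key`, the class name `SortedStringTable`, and the methods `get` and `scan_row` are assumptions made for this example only.

```python
import bisect

SEPARATOR = b"\x00"  # assumed delimiter between row and column; not from the patent


def encode_key(row: bytes, column: bytes) -> bytes:
    """Build one byte-string key from a row and a column (hypothetical encoding)."""
    if SEPARATOR in row:
        raise ValueError("row must not contain the separator byte in this sketch")
    return row + SEPARATOR + column


class SortedStringTable:
    """Immutable map from byte-string keys to byte-string values, sorted by key."""

    def __init__(self, cells):
        # cells maps (row, column) pairs to the value stored in that cell.
        items = sorted((encode_key(r, c), v) for (r, c), v in cells.items())
        self._keys = [k for k, _ in items]
        self._values = [v for _, v in items]

    def get(self, row: bytes, column: bytes):
        """Return the value of one cell, or None if the cell is absent."""
        key = encode_key(row, column)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan_row(self, row: bytes):
        """Yield (column, value) pairs for one row, in column order."""
        prefix = row + SEPARATOR
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i][len(prefix):], self._values[i]
            i += 1
```

Because the keys are kept sorted, all cells of a row sit next to each other, which is why a row scan needs only one binary search followed by a sequential read. Copying such a file, as in claim 8, copies the whole table without interpreting individual cells.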

Assignees

Google Inc

Inventors

Classifications

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

  • G06F16/275 (Primary): Synchronous replication

  • Integrating or interfacing systems involving database management systems

  • Distributed file systems

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641)

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017032012A1 cover?
A method includes identifying a first table including data. The first table has associated metadata, an associated replication state, an associated replication log file including replication logs logging mutations of the first table, and an associated replication configuration file including a first association that associates the first table with a replication family. The method includes inser…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/275. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 2, 2017 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).