Real-time replication of database management system transactions into a data lakehouse

US2025371029A1 · US · A1

Patent metadata
Publication number: US-2025371029-A1
Application number: US-202418679600-A
Country: US
Kind code: A1
Filing date: May 31, 2024
Priority date: May 31, 2024
Publication date: Dec 4, 2025
Grant date: (not listed)

Abstract

Official abstract text for this publication.

A computing system performs real-time replication of database management system transactions. The computing system includes a source DBMS, a replication service and a data lakehouse. The source DBMS stores at least one source transaction table recording at least one source transaction and generates at least one recovery log indicating at least one modification in the at least one source transaction table. The replication service replicates the at least one source transaction recorded in the at least one source transaction table by generating at least one data file having a first data format. The data lakehouse stores at least one lakehouse table corresponding to the at least one source transaction table and having a second data format different from the first data format, and modifies the at least one lakehouse table based on the at least one data file.
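The data flow in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `SourceDBMS` and `Lakehouse` classes, the `build_data_file` helper, the file name, and the use of JSON Lines as a stand-in for the "first data format". The lakehouse table is modeled as a small manifest (the "second data format") that points at the stored data files.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SourceDBMS:
    """Hypothetical source DBMS: a table plus a recovery log of its modifications."""
    table: dict = field(default_factory=dict)          # primary key -> row
    recovery_log: list = field(default_factory=list)   # one entry per modification

    def upsert(self, key, row):
        self.table[key] = row
        self.recovery_log.append({"op": "upsert", "key": key, "row": row})

    def delete(self, key):
        self.table.pop(key, None)
        self.recovery_log.append({"op": "delete", "key": key})


def build_data_file(log_entries):
    """Replication service: serialize logged changes into one data file.

    JSON Lines stands in for the 'first data format' (an assumption).
    """
    return "\n".join(json.dumps(entry, sort_keys=True) for entry in log_entries)


@dataclass
class Lakehouse:
    """Hypothetical lakehouse: stored data files plus a table that points at them."""
    files: dict = field(default_factory=dict)                        # name -> contents
    table: dict = field(default_factory=lambda: {"data_files": []})  # manifest

    def apply(self, name, data_file):
        """Store the data file and modify the lakehouse table to reference it."""
        self.files[name] = data_file
        self.table["data_files"].append(name)

    def read(self):
        """Replay every referenced data file to materialize the current rows."""
        rows = {}
        for name in self.table["data_files"]:
            for line in self.files[name].splitlines():
                entry = json.loads(line)
                if entry["op"] == "upsert":
                    rows[entry["key"]] = entry["row"]
                else:
                    rows.pop(entry["key"], None)
        return rows


source = SourceDBMS()
source.upsert("1", {"amount": 10})
source.upsert("2", {"amount": 25})
source.delete("1")

lake = Lakehouse()
lake.apply("changes-0001.jsonl", build_data_file(source.recovery_log))
replicated = lake.read()
```

After the run, `replicated` matches `source.table`, and the lakehouse table holds a single pointer to the propagated change file. A real system would read the DBMS's own recovery log rather than a Python list; the list only keeps the sketch self-contained.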

First claim

Opening claim text (preview).

1. A computing system configured to perform real-time replication of database management system transactions, the computing system comprising: a source database management system (DBMS) configured to store at least one source transaction table recording at least one source transaction and to generate at least one recovery log indicating at least one modification in the at least one source transaction table; a replication service in signal communication with the source DBMS, the replication service including a first controller configured to replicate the at least one source transaction recorded in the at least one source transaction table by generating at least one data file having a first data format; a data lakehouse in signal communication with the replication service, the data lakehouse including a second controller configured to store at least one lakehouse table corresponding to the at least one source transaction table and having a second data format different from the first data format, and to modify the at least one lakehouse table based on the at least one data file, wherein the replication service propagates the at least one source transaction into the data lakehouse based on the at least one data file to generate at least one propagated source transaction; and wherein the data lakehouse generates the at least one lakehouse table including pointer that points to the at least one propagated source transaction propagated in the data lakehouse.

2. The computing system of claim 1, wherein: the replication service comprises an open format builder (OFB) configured to determine metadata corresponding to the at least one source transaction and to generate the at least one data file having an open data format; and the data lakehouse comprises an open table format (OTF) server configured to receive the metadata from the OFB and to modify the at least one lakehouse table based on the metadata.

3. The computing system of claim 2, wherein the replication service further comprises: a Capture module configured to extract the at least one source transaction from the at least one recovery log; an Apply module configured to receive the extracted at least one source transaction from the Capture module and to generate a parallel feed of independent batches of transactions corresponding to the at least one source transaction.

4. The computing system of claim 3, wherein the OFB is configured to determine at least one DBMS row change in the source transaction table associated based on the batches of transactions and to construct the at least one data file having the open data format indicating at least one row change in the source transaction table.

5. The computing system of claim 4, wherein the OFB adds timestamp data into the at least one data file having the open data format.

6. The computing system of claim 5, wherein adding the timestamp data includes creating a new column in the at least one source transaction table and inputting the timestamp data into the new column.

7. The computing system of claim 4, wherein the data lakehouse further comprises: an object storage unit configured to store the at least one data file having the open data format; and a metadata store configured to store the at least one lakehouse table.

8. The computing system of claim 7, wherein the metadata is stored in the at least one lakehouse table as listing pointers that point to the at least one data file stored in the object storage unit.

9. A computer-implemented method of performing real-time replication of database management system transactions into a data lakehouse, the method comprising: storing in a source database management system (DBMS) at least one source transaction table recording at least one source transaction; generating at least one recovery log indicating at least one modification in the at least one source transaction table; generating at least one data file having a first data format to replicate the at least one source transaction recorded in the at least one source transaction table; propagating the at least one source transaction into the data lakehouse based on the at least one data file to generate at least one propagated source transaction; generating at least one lakehouse table including pointer that points to the at least one propagated source transaction propagated in the data lakehouse, the at least one lakehouse table corresponding to the at least one replicated source transaction table and having a second data format different from the first data format, and storing at least one lakehouse table in the data lakehouse; and modifying the at least one lakehouse table based on the at least one data file.

10. The computer-implemented method of claim 9, further comprising: determining by an open format builder (OFB) metadata corresponding to the at least one source transaction and to generate the at least one data file having an open data format; receiving by an open table format (OTF) server the metadata from the OFB; and modifying the at least one lakehouse table based on the metadata.

11. The computer-implemented method of claim 10, further comprising: extracting by a Capture module the at least one source transaction from the at least one recovery log; receiving by an Apply module the extracted at least one source transaction from the Capture module; and generating by the Apply module a parallel feed of independent batches of transactions corresponding to the at least one source transaction.

12. The computer-implemented method of claim 11, further comprising: determining by the OFB at least one DBMS row change in the source transaction table associated based on the batches of transactions; and constructing the at least one data file having the open data format indicating at least one row change in the source transaction table.

13. The computer-implemented method of claim 12, further comprising adding timestamp data into the at least one data file having the open data format.

14. The computer-implemented method of claim 13, wherein adding the timestamp data includes creating a new column in the at least one source transaction table and inputting the timestamp data into the new column.

15. The computer-implemented method of claim 12, further comprising: storing the at least one data file having the open data format in an object storage unit; and storing the at least one lakehouse table in a metadata store.

16. The computer-implemented method of claim 15, wherein the metadata is stored in the at least one lakehouse table as listing pointers that point to the at least one data file stored in the object storage unit.

17. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform operations comprising: storing in a source database management system (DBMS) at least one source transaction table recording at least one source transaction; generating at least one recovery log indicating at least one modification in the at least one source transaction table; generating at least one data file having a first data format to replicate the at least one source transaction recorded in the at least one source transaction table; propagating the at least one source transaction into the data lakehouse based on the at least one data file to generate at least one propagated source transaction; generating at least one lakehouse table including pointer that points to t…
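The dependent claims describe a Capture module, an Apply module that emits independent batches, an open format builder (OFB) that writes row changes with timestamp data, an object storage unit, and a lakehouse table that lists pointers to data files. The following Python sketch shows one way those pieces could fit together. It is an illustration only: the log layout, the function names, the `_op`, `_table`, and `_replicated_at` column names, the file names, and the rule that two batches are "independent" when they touch disjoint tables are all assumptions not found in the claims. Batches are processed sequentially here, although independent batches could be handed to parallel workers.

```python
from datetime import datetime, timezone

# Hypothetical recovery log: each record is one row change tagged with its transaction.
RECOVERY_LOG = [
    {"txn": 1, "table": "orders", "op": "insert", "row": {"id": 1, "qty": 2}},
    {"txn": 2, "table": "payments", "op": "insert", "row": {"id": 7, "amount": 30}},
    {"txn": 1, "table": "orders", "op": "update", "row": {"id": 1, "qty": 3}},
    {"txn": 3, "table": "orders", "op": "delete", "row": {"id": 1}},
]


def capture(log):
    """Capture: regroup log records into transactions, ordered by first appearance."""
    txns = {}
    for record in log:
        txns.setdefault(record["txn"], []).append(record)
    return list(txns.values())


def apply_batches(transactions):
    """Apply: group transactions into batches that touch disjoint sets of tables.

    Transactions sharing a table land in the same batch and keep their order,
    so separate batches never conflict with one another.
    """
    batches = []  # list of (set of table names, list of transactions)
    for txn in transactions:
        tables = {record["table"] for record in txn}
        merged_tables, merged_txns = set(tables), []
        for batch in [b for b in batches if b[0] & tables]:
            merged_tables |= batch[0]
            merged_txns.extend(batch[1])
            batches.remove(batch)
        merged_txns.append(txn)
        batches.append((merged_tables, merged_txns))
    return [txns for _, txns in batches]


def build_open_file(batch, timestamp):
    """OFB: flatten a batch into row-change records and add a timestamp column."""
    return [
        {**record["row"], "_op": record["op"], "_table": record["table"],
         "_replicated_at": timestamp}
        for txn in batch
        for record in txn
    ]


object_storage = {}                    # file name -> row-change records (the data file)
lakehouse_table = {"data_files": []}   # metadata store entry: pointers to data files

stamp = datetime.now(timezone.utc).isoformat()
for index, batch in enumerate(apply_batches(capture(RECOVERY_LOG))):
    name = f"batch-{index:04d}.rows"
    object_storage[name] = build_open_file(batch, stamp)
    lakehouse_table["data_files"].append(name)
```

With the sample log, the two `orders` transactions end up in one batch and the `payments` transaction in another, so the lakehouse table lists two data files. The sketch adds the timestamp to the data file, as in claims 5 and 13; it does not model the new source-table column mentioned in claims 6 and 14.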

Assignees

IBM

Inventors

Classifications

  • Updates performed during online database operations; commit processing · CPC title

  • G06F16/273 (Primary)

    Asynchronous replication or reconciliation · CPC title

  • Synchronous replication · CPC title

  • Data format conversion from or to a database · CPC title

  • G06F16/254 (Primary)

    Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025371029A1 cover?
A computing system performs real-time replication of database management system transactions. The computing system includes a source DBMS, a replication service and a data lakehouse. The source DBMS stores at least one source transaction table recording at least one source transaction and generates at least one recovery log indicating at least one modification in the at least one source transac…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/273. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 4, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).