Integrating object-based data integration tool with a version control system in centralized and decentralized environments
US-2019065568-A1 · Feb 28, 2019 · US
US2020201831A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020201831-A1 |
| Application number | US-201816230769-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 21, 2018 |
| Priority date | Dec 21, 2018 |
| Publication date | Jun 25, 2020 |
| Grant date | — |
Abstract
A workbook management system provides a master branch of a data pipeline comprising a pointer(s) to a snapshot(s) of an initial dataset(s), a first logic, and a pointer(s) to a snapshot(s) of a first derived dataset(s) resulting from applying the first logic to the initial dataset(s). Responsive to user input requesting a test branch corresponding to the master branch, the system creates the test branch comprising the pointer(s) to the snapshot(s) of the initial dataset(s) and a copy of the first logic. The system receives a request to modify the test branch comprising at least one change to the copy of the first logic, and modifies the test branch independently of the master branch to include second logic reflecting the at least one change to the copy of the first logic, the pointer(s) to the snapshot(s) of the initial dataset(s), and a pointer(s) to snapshot(s) of a second derived dataset(s) resulting from applying the second logic to the initial dataset(s). Responsive to user input requesting a merge of the modified test branch into the master branch, the system updates the master branch to replace the first logic with the second logic and to replace the pointer(s) to the snapshot(s) of the first derived dataset(s) with the pointer(s) to the snapshot(s) of the second derived dataset(s).
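The branching model in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `BranchEntry`, `BranchStore`, `create_test_branch`, `modify` and `merge` are hypothetical, snapshots are modeled as in-memory tuples keyed by generated IDs, and logic is a plain Python function. The sketch shows three behaviors from the abstract: a test branch shares the master's initial-snapshot pointer, a modification changes only the test branch entry, and a merge replaces both the logic and the derived-snapshot pointer in the master entry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical model: a snapshot is an immutable tuple of rows, and a branch
# entry stores only pointers (string keys) to snapshots, never the data itself.
Snapshot = tuple
Logic = Callable[[Snapshot], Snapshot]


@dataclass
class BranchEntry:
    initial_snapshot: str            # pointer to the initial dataset snapshot
    logic: Logic                     # transformation applied to that dataset
    derived_snapshot: Optional[str]  # pointer to the derived snapshot, if built


class BranchStore:
    """Hypothetical branch data structure: branch name -> BranchEntry."""

    def __init__(self) -> None:
        self.snapshots: Dict[str, Snapshot] = {}
        self.branches: Dict[str, BranchEntry] = {}
        self._next_id = 0

    def put_snapshot(self, data: Snapshot) -> str:
        # Every stored snapshot gets a fresh key; old snapshots are kept.
        key = f"snap-{self._next_id}"
        self._next_id += 1
        self.snapshots[key] = data
        return key

    def build(self, branch: str) -> None:
        # Run the branch's logic on its initial snapshot and record a pointer
        # to the newly stored derived snapshot.
        entry = self.branches[branch]
        result = entry.logic(self.snapshots[entry.initial_snapshot])
        entry.derived_snapshot = self.put_snapshot(result)

    def create_test_branch(self, name: str, parent: str = "master") -> None:
        # Share the parent's initial-snapshot pointer and copy its logic
        # reference; the test branch has no derived snapshot until built.
        src = self.branches[parent]
        self.branches[name] = BranchEntry(src.initial_snapshot, src.logic, None)

    def modify(self, name: str, new_logic: Logic) -> None:
        # Only the test branch entry changes; the master entry is untouched.
        self.branches[name].logic = new_logic
        self.build(name)

    def merge(self, name: str, target: str = "master") -> None:
        # Replace the target's logic and derived-snapshot pointer with the
        # test branch's; the initial-snapshot pointer is already shared.
        src, dst = self.branches[name], self.branches[target]
        dst.logic = src.logic
        dst.derived_snapshot = src.derived_snapshot


# Usage: master doubles every value; a test branch multiplies by ten instead.
store = BranchStore()
raw = store.put_snapshot((1, 2, 3))
store.branches["master"] = BranchEntry(raw, lambda rows: tuple(r * 2 for r in rows), None)
store.build("master")
store.create_test_branch("test")
store.modify("test", lambda rows: tuple(r * 10 for r in rows))
before_merge = store.snapshots[store.branches["master"].derived_snapshot]  # (2, 4, 6)
store.merge("test")
after_merge = store.snapshots[store.branches["master"].derived_snapshot]   # (10, 20, 30)
```

Because snapshots are never overwritten, the master's pre-merge output (`snap-1`) remains in the store after the merge, which mirrors the abstract's use of pointers rather than copies.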
Claims (preview)
1. A method comprising: receiving user input selecting a data transformation operation of a data pipeline comprising ordered data transformation operations each having a corresponding logic to be applied to one or more initial datasets to produce one or more derived datasets, wherein each time the corresponding logic is executed, a snapshot of each of the one or more derived datasets is stored in a data store in association with the corresponding logic; identifying a master branch for the selected data transformation operation of the data pipeline, the master branch having a master branch entry in a branch data structure, the master branch entry comprising a pointer to a snapshot of a first initial dataset, a first logic of the selected data transformation operation, and a pointer to a snapshot of a first derived dataset resulting from applying the first logic to the first initial dataset; responsive to user input requesting a first test branch corresponding to the master branch, creating the first test branch having a first test branch entry in the branch data structure, the first test branch entry comprising the pointer to the snapshot of the initial dataset, and a first copy of the first logic; receiving a request to modify the first test branch, the request comprising at least one change to the first copy of the first logic; modifying the first test branch independently of the master branch to include second logic reflecting the at least one change to the first copy of the first logic, the second logic to be applied to the first initial dataset to produce a second derived dataset, wherein modifying the first test branch comprises updating the first logic with the second logic in the first test branch entry in the branch data structure; and responsive to user input requesting a merge of the modified first test branch into the master branch, updating the master branch entry in the branch data structure to replace the first logic with the second logic and to replace the pointer to the snapshot of the first derived dataset with a pointer to a snapshot of the second derived dataset, wherein the method is performed using one or more processors.
2. The method of claim 1, further comprising: prior to updating the master branch to replace the first logic with the second logic, determining one or more differences between the first logic and the second logic; generating an indication of the one or more differences between the first logic and the second logic; and receiving user input confirming that the one or more differences between the first logic and the second logic are approved.
3. The method of claim 1, further comprising: prior to updating the master branch to replace the pointer to the snapshot of the first derived dataset with the pointer to the snapshot of the second derived dataset, determining one or more differences between the first derived dataset and the second derived dataset; generating an indication of the one or more differences between the first derived dataset and the second derived dataset; and receiving user input confirming that the one or more differences between the first derived dataset and the second derived dataset are approved.
4. The method of claim 1, further comprising: responsive to user input requesting a second test branch corresponding to the master branch, creating the second test branch having a second test branch entry in the branch data structure, the second test branch entry comprising the pointer to the snapshot of the initial dataset, and a second copy of the first logic; receiving a request to modify the second test branch, the request comprising at least one change to the second copy of the first logic; modifying the second test branch independently of the master branch to include third logic reflecting the at least one change to the second copy of the first logic, the third logic to be applied to the first initial dataset to produce a third derived dataset, wherein modifying the second test branch comprises updating the second logic with the third logic in the second test branch entry in the branch data structure; and responsive to user input requesting a merge of the modified second test branch into the updated master branch, updating the updated master branch entry in the branch data structure to replace the second logic with the third logic and to replace the pointer to the snapshot of the second derived dataset with a pointer to a snapshot of the third derived dataset.
5. The method of claim 4, further comprising: prior to updating the updated master branch, determining whether a merge conflict exists between the second logic and the third logic; and responsive to determining that the merge conflict exists, receiving user input comprising a selection of the third logic to resolve the merge conflict.
6. The method of claim 1, further comprising: prior to updating the master branch to replace the pointer to the snapshot of the first derived dataset with the pointer to the snapshot of the second derived dataset, executing a data health check operation on the second derived dataset to determine whether the second derived dataset satisfies one or more conditions, the one or more conditions comprising a verification that a creation of the second derived dataset completed successfully and a verification that the second derived dataset is not stale.
7. The method of claim 1, further comprising: responsive to user input requesting protection of the modified first test branch, preventing other users from further modifying the modified first test branch; and responsive to a request from another user to further modify the modified first test branch, creating a child test branch associated with the modified first test branch, the child test branch comprising the pointer to the snapshot of the initial dataset and a copy of the second logic.
8. The method of claim 7, further comprising: responsive to updating the master branch, deleting the modified first test branch and associating the child test branch with the master branch.
9. The method of claim 1, wherein the first logic is part of the data pipeline, the data pipeline further comprising additional logic to apply to the first derived dataset to produce one or more first additional derived datasets, the method further comprising: replacing the first logic in the data pipeline with the second logic to derive the second derived dataset; applying the additional logic to the second derived dataset to derive one or more second additional derived datasets; identifying one or more differences between the one or more second additional derived datasets and the one or more first additional derived datasets; and generating an indication of the differences between the one or more second additional derived datasets and the one or more first additional derived datasets.
10. (canceled)
11. The method of claim 1, further comprising: displaying, via a graphical user interface (GUI), a visual representation of the data pipeline, including a first graph corresponding to the master branch and a second graph corresponding to the first test branch, wherein the first graph includes a first node representing the initial dataset, a second node representing the first derived dataset, and a first edge connecting the first node and the second node, wherein the first edge references the first logic to be applied to the initial dataset in order to produce the first derived dataset, and wherein the second graph includes a third node representing the initial dataset, a fourth node representing the second derived dataset, and a second edge connecting the third node and the fourth node, w
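Claims 2, 3, 5 and 6 add checks that run before a merge. The following minimal Python sketch shows one possible merge gate under stated assumptions. Logic is stored as source text so the standard-library `difflib` can diff it, and a derived dataset is a set of rows. "Stale" is taken to mean built before its newest input. A merge conflict is detected by comparing both sides with the logic the test branch was created from, stored in the hypothetical `base_logic` field. `Entry`, `try_merge` and the other names are illustrative and do not come from the patent.

```python
import difflib
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Rows = Set[tuple]


@dataclass
class Entry:
    base_logic: str        # logic the branch was created from (hypothetical field)
    logic: str             # logic as source text, so it can be diffed
    derived_rows: Rows     # contents of the branch's derived snapshot
    build_ok: bool         # whether the build that produced it succeeded
    built_at: int          # logical time of that build
    input_updated_at: int  # logical time of the newest input snapshot


def logic_diff(old: str, new: str) -> List[str]:
    # Claim 2: differences between the master logic and the test logic.
    return list(difflib.unified_diff(old.splitlines(), new.splitlines(),
                                     fromfile="master", tofile="test", lineterm=""))


def dataset_diff(old: Rows, new: Rows) -> Tuple[Rows, Rows]:
    # Claim 3: rows removed from and rows added to the derived dataset.
    return old - new, new - old


def has_conflict(master: Entry, test: Entry) -> bool:
    # Claim 5, simplified: master changed since the test branch was created,
    # and the test branch changed the logic to something different.
    return (master.logic != test.base_logic
            and test.logic != test.base_logic
            and master.logic != test.logic)


def is_healthy(entry: Entry) -> bool:
    # Claim 6: build succeeded and the output is not older than its inputs.
    return entry.build_ok and entry.built_at >= entry.input_updated_at


def try_merge(master: Entry, test: Entry,
              approve: Callable[[List[str], Rows, Rows], bool],
              take_test_on_conflict: bool = False) -> bool:
    if not is_healthy(test):
        return False
    if has_conflict(master, test) and not take_test_on_conflict:
        return False
    removed, added = dataset_diff(master.derived_rows, test.derived_rows)
    if not approve(logic_diff(master.logic, test.logic), removed, added):
        return False
    # Approved: swap in the test logic and its derived snapshot.
    master.logic, master.derived_rows = test.logic, test.derived_rows
    master.build_ok, master.built_at = test.build_ok, test.built_at
    return True


# Usage: a healthy, non-conflicting test branch is merged after approval.
master = Entry("", "SELECT a FROM t", {(1,), (2,)}, True, 1, 1)
test = Entry("SELECT a FROM t", "SELECT a FROM t WHERE a > 1", {(2,)}, True, 2, 1)
seen = {}


def approve_all(diff: List[str], removed: Rows, added: Rows) -> bool:
    # Stand-in for the user reviewing the differences and approving them.
    seen.update(diff=diff, removed=removed, added=added)
    return True


merged = try_merge(master, test, approve_all)
```

Passing `take_test_on_conflict=True` plays the role of the user input in claim 5 that selects the test branch's logic to resolve a conflict.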
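Claims 7 and 8 describe protected branches and child branches. The Python sketch below is one hypothetical way to model them. The `Branches` class, the `name/user` naming of child branches, and the rule that the branch owner may still edit are all assumptions, since the claim only states that other users are prevented from modifying a protected branch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Branch:
    name: str
    parent: Optional[str]   # branch this one was created from
    logic: str              # logic as source text (hypothetical representation)
    initial_snapshot: str   # pointer to the shared initial dataset snapshot
    protected: bool = False
    owner: Optional[str] = None


class Branches:
    def __init__(self) -> None:
        self.by_name: Dict[str, Branch] = {}

    def add(self, branch: Branch) -> None:
        self.by_name[branch.name] = branch

    def protect(self, name: str, owner: str) -> None:
        # Claim 7: after protection, only the owner edits the branch directly.
        branch = self.by_name[name]
        branch.protected, branch.owner = True, owner

    def edit(self, name: str, user: str, new_logic: str) -> str:
        """Apply an edit and return the name of the branch that received it."""
        branch = self.by_name[name]
        if branch.protected and user != branch.owner:
            # Claim 7: redirect another user's edit to a child branch that
            # starts from a copy of the protected branch's logic and shares
            # its initial-snapshot pointer.
            branch = Branch(f"{name}/{user}", name, branch.logic,
                            branch.initial_snapshot)
            self.add(branch)
        branch.logic = new_logic
        return branch.name

    def merge_and_delete(self, name: str, target: str = "master") -> None:
        # Merge the branch's logic into the target, then delete the branch.
        merged = self.by_name.pop(name)
        self.by_name[target].logic = merged.logic
        # Claim 8: children of the deleted branch are re-parented to the target.
        for branch in self.by_name.values():
            if branch.parent == name:
                branch.parent = target


# Usage: bob's edit to alice's protected branch lands on a child branch.
branches = Branches()
branches.add(Branch("master", None, "logic-1", "snap-0"))
branches.add(Branch("test", "master", "logic-2", "snap-0"))
branches.protect("test", owner="alice")
edited = branches.edit("test", user="bob", new_logic="logic-3")  # "test/bob"
branches.merge_and_delete("test")
```

After the merge, `test/bob` keeps its own logic and now hangs off `master`, so bob's work survives the deletion of the branch it was forked from.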
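Claim 9 propagates a logic change through the rest of the pipeline and reports downstream differences. The following is a rough sketch, assuming each step is a Python function over a list of row tuples and that "differences" means rows removed or added. It reruns the pipeline with one step swapped and diffs every output produced after that step. The function names `run_pipeline` and `downstream_diff` are hypothetical.

```python
from typing import Callable, List, Tuple

Rows = List[tuple]
Step = Callable[[Rows], Rows]


def run_pipeline(initial: Rows, steps: List[Step]) -> List[Rows]:
    # Apply the ordered steps and keep every derived dataset, in order.
    outputs, current = [], initial
    for step in steps:
        current = step(current)
        outputs.append(current)
    return outputs


def downstream_diff(initial: Rows, steps: List[Step], index: int,
                    new_step: Step) -> List[Tuple[int, Rows, Rows]]:
    # Claim 9, sketched: swap the logic of step `index`, rerun the pipeline,
    # and compare each dataset produced after that step with its original.
    before = run_pipeline(initial, steps)
    patched = steps[:index] + [new_step] + steps[index + 1:]
    after = run_pipeline(initial, patched)
    report = []
    for pos in range(index + 1, len(steps)):
        old, new = set(before[pos]), set(after[pos])
        report.append((pos, sorted(old - new), sorted(new - old)))
    return report


def double(rows: Rows) -> Rows:
    return [(r[0] * 2,) for r in rows]


def triple(rows: Rows) -> Rows:
    return [(r[0] * 3,) for r in rows]


def keep_gt_2(rows: Rows) -> Rows:
    return [r for r in rows if r[0] > 2]


# Usage: change the first step from doubling to tripling, then diff the
# output of the downstream filter step.
report = downstream_diff([(1,), (2,), (3,)], [double, keep_gt_2], 0, triple)
# report == [(1, [(4,)], [(3,), (9,)])]
```

Rerunning the whole pipeline keeps the sketch short; a snapshot-based system like the one claimed could instead start from the stored initial snapshot and rebuild only the changed step and its descendants.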
Version control (security arrangements therefor G06F21/57); Configuration management · CPC title
Database-specific techniques · CPC title
Using snapshots, i.e. a logical point-in-time copy of the data · CPC title
Updates performed during online database operations; commit processing · CPC title
by selection of backup contents · CPC title