Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores
US-2015112998-A1 · Apr 23, 2015 · US
US10007674B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10007674-B2 |
| Application number | US-201615262207-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 12, 2016 |
| Priority date | Jun 13, 2016 |
| Publication date | Jun 26, 2018 |
| Grant date | Jun 26, 2018 |
Official abstract text for this publication.
A computer-implemented system and method for data revision control in large-scale data analytic systems. In one embodiment, for example, a computer-implemented method comprises the operations of storing a first version of a dataset that is derived by executing a first version of a driver program associated with the dataset; and storing a first build catalog entry comprising an identifier of the first version of the dataset and comprising an identifier of the first version of the driver program.
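The abstract describes a build catalog entry that ties a dataset version to the driver program version that produced it. The following is a minimal sketch of that pairing, not the patented implementation; the names `BuildCatalogEntry`, `BuildCatalog`, `record_build`, and `driver_for`, and the in-memory list used as storage, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BuildCatalogEntry:
    """Hypothetical record linking a dataset version to a driver version."""
    dataset_version_id: str
    driver_version_id: str


class BuildCatalog:
    """Hypothetical in-memory catalog; a real system would persist entries."""

    def __init__(self) -> None:
        self._entries: List[BuildCatalogEntry] = []

    def record_build(self, dataset_version_id: str,
                     driver_version_id: str) -> BuildCatalogEntry:
        # Store which driver program version derived this dataset version.
        entry = BuildCatalogEntry(dataset_version_id, driver_version_id)
        self._entries.append(entry)
        return entry

    def driver_for(self, dataset_version_id: str) -> Optional[str]:
        # Look up the driver version recorded for a dataset version, if any.
        for entry in self._entries:
            if entry.dataset_version_id == dataset_version_id:
                return entry.driver_version_id
        return None
```

With a catalog like this, asking which program version produced a given dataset version becomes a lookup rather than a reconstruction from logs.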
Opening claim text (preview).
The invention claimed is:

1. A method for data revision control in a large-scale data analytic system: at one or more machines comprising one or more processors and memory storing one or more programs executed by the one or more processors to perform the method, performing operations comprising:
storing a first version of a first dataset that is derived from a first version of a second dataset based on a first execution of a first version of a driver program;
storing a first build catalog entry comprising an identifier of the first version of the first dataset, an identifier of the first version of the second dataset, a first branch name, and an identifier of the first version of the driver program;
storing a second version of the first dataset that is derived from a second version of the second dataset based on a second execution of the first version of the driver program;
storing a second build catalog entry comprising an identifier of the second version of the first dataset, an identifier of the second version of the second dataset, a second branch name that is different from the first branch name, and an identifier of the first version of the driver program;
storing a first transaction entry in a database, the first transaction entry comprising a first transaction commit identifier of the first version of the first dataset;
wherein the first build catalog entry comprises the first transaction commit identifier;
storing a second transaction entry in the database, the second transaction entry comprising a second transaction commit identifier of the first version of the second dataset;
wherein the identifier of the first version of the second dataset in the first build catalog entry is the second transaction commit identifier;
storing a third transaction entry in the database, the third transaction entry comprising a third transaction commit identifier of the second version of the second dataset;
wherein the identifier of the second version of the second dataset in the second build catalog entry is the third transaction commit identifier; and
causing display of a provenance graph in a graphical user interface based on the first build catalog entry, the provenance graph display including display of: a first node representing the first version of the first dataset, a second node representing the first version of the second dataset, and a first directed edge from the first node to the second node.

2. The method of claim 1, further comprising storing the first version of the first dataset in a distributed file system.

3. The method of claim 1, wherein the identifier of the first version of the first dataset is an identifier assigned to a commit of a transaction in context of which the first version of the first dataset is stored.

4. The method of claim 1, wherein the first version of the driver program, when executed to produce the first version of the first dataset, transforms data of the first version of the second dataset to produce data of the first version of the first dataset.

5. One or more non-transitory computer-readable media storing a set of instructions for execution by one or more processors, the set of instructions configured for performing operations comprising:
storing a first version of a first dataset that is derived from a first version of a second dataset based on a first execution of a first version of a driver program;
storing a first build catalog entry comprising an identifier of the first version of the first dataset, an identifier of the first version of the second dataset, a first branch name, and an identifier of the first version of the driver program;
storing a second version of the first dataset that is derived from a second version of the second dataset based on a second execution of the first version of the driver program;
storing a second build catalog entry comprising an identifier of the second version of the first dataset, an identifier of the second version of the second dataset, a second branch name that is different from the first branch name, and an identifier of the first version of the driver program;
storing a first transaction entry in a database, the first transaction entry comprising a first transaction commit identifier of the first version of the first dataset;
wherein the first build catalog entry comprises the first transaction commit identifier;
storing a second transaction entry in the database, the second transaction entry comprising a second transaction commit identifier of the first version of the second dataset;
wherein the identifier of the first version of the second dataset in the first build catalog entry is the second transaction commit identifier;
storing a third transaction entry in the database, the third transaction entry comprising a third transaction commit identifier of the second version of the second dataset;
wherein the identifier of the second version of the second dataset in the second build catalog entry is the third transaction commit identifier; and
causing display of a provenance graph in a graphical user interface based on the second build catalog entry, the provenance graph display including display of: a first node representing the second version of the first dataset, a second node representing the second version of the second dataset, and a first directed edge from the first node to the second node.

6. The one or more non-transitory computer-readable media of claim 5, wherein the operations further comprise storing the first version of the first dataset in a distributed file system.

7. The one or more non-transitory computer-readable media of claim 5, wherein the identifier of the first version of the first dataset is an identifier assigned to a commit of a transaction in context of which the first version of the first dataset is stored.

8. The one or more non-transitory computer-readable media of claim 5, wherein the first version of the driver program, when executed to produce the first version of the first dataset, transforms data of the first version of the second dataset to produce data of the first version of the first dataset.
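The relationships recited in the independent claims (transaction commit identifiers, two build catalog entries on different branches sharing one driver version, and a provenance edge from the derived dataset to its source) can be illustrated with a small sketch. This is an illustrative model under stated assumptions, not the claimed system: the class and function names, the `txn-N` identifier format, the branch names `master` and `develop`, and the choice to give the second version of the first dataset its own commit are all hypothetical. The claims require display in a graphical user interface; the sketch only computes the nodes and edge that such a display would draw.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TransactionEntry:
    commit_id: str   # transaction commit identifier of a dataset version
    dataset: str     # which dataset the committed version belongs to


@dataclass(frozen=True)
class BuildCatalogEntry:
    output_commit_id: str   # identifies the derived (first) dataset version
    input_commit_id: str    # identifies the source (second) dataset version
    branch: str
    driver_version_id: str


class RevisionStore:
    """Hypothetical in-memory stand-in for the database and build catalog."""

    def __init__(self) -> None:
        self.transactions: Dict[str, TransactionEntry] = {}
        self.catalog: List[BuildCatalogEntry] = []
        self._seq = count(1)

    def commit(self, dataset: str) -> str:
        # Assumed identifier scheme: sequential "txn-N" strings.
        commit_id = f"txn-{next(self._seq)}"
        self.transactions[commit_id] = TransactionEntry(commit_id, dataset)
        return commit_id

    def record_build(self, output_commit_id: str, input_commit_id: str,
                     branch: str, driver_version_id: str) -> BuildCatalogEntry:
        # Only committed dataset versions may be referenced by the catalog.
        for commit_id in (output_commit_id, input_commit_id):
            if commit_id not in self.transactions:
                raise KeyError(f"unknown commit: {commit_id}")
        entry = BuildCatalogEntry(output_commit_id, input_commit_id,
                                  branch, driver_version_id)
        self.catalog.append(entry)
        return entry


def provenance_graph(entry: BuildCatalogEntry) -> Tuple[List[str], List[Tuple[str, str]]]:
    # First node: derived dataset version; second node: its source version.
    nodes = [entry.output_commit_id, entry.input_commit_id]
    # Directed edge from the first node to the second node.
    edges = [(entry.output_commit_id, entry.input_commit_id)]
    return nodes, edges


def build_example() -> Tuple[RevisionStore, BuildCatalogEntry, BuildCatalogEntry]:
    store = RevisionStore()
    out_v1 = store.commit("first_dataset")    # first transaction entry
    in_v1 = store.commit("second_dataset")    # second transaction entry
    in_v2 = store.commit("second_dataset")    # third transaction entry
    out_v2 = store.commit("first_dataset")    # assumption: also committed
    first = store.record_build(out_v1, in_v1, "master", "driver-v1")
    second = store.record_build(out_v2, in_v2, "develop", "driver-v1")
    return store, first, second
```

Calling `build_example()` and passing either returned entry to `provenance_graph` yields the two nodes and the single directed edge that claim 1 (first entry) or claim 5 (second entry) would display.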
Physics · mapped topic
Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477) · CPC title