Version control interface supporting time travel access of a data lake
US2024143815A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024143815-A1 |
| Application number | US-202217975919-A |
| Country | US |
| Kind code | A1 |
| Filing date | Oct 28, 2022 |
| Priority date | Oct 28, 2022 |
| Publication date | May 2, 2024 |
| Grant date | — |
Official abstract text for this publication.
Versioning of data objects for a project revised from a first version to a revised version is managed by producing a dataset that represents the data objects as a group. The data objects are scanned to identify metadata of the grouped data to be processed similarly within a current version of the lifecycle, and the identified metadata is stored in the dataset. Data objects changed from the first version to the revised version are identified, and the corresponding metadata for the changed data objects in the dataset is updated. A version control operation is then performed on the dataset to update all data objects referenced by the dataset from the first version to the revised version. A commit-map and commit-tree are stored in a repository, and version control operations including commit, checkout, merge, branch and merge-branch are performed on the dataset snapshot.
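The flow in the abstract (scan the objects, store their metadata in a dataset, find the changed objects, update their metadata, then move the whole dataset to the next version) can be illustrated with a minimal Python sketch. It is not the implementation disclosed in the publication. The names `Dataset`, `scan`, `produce_dataset`, `changed_objects`, and `apply_version` are hypothetical, and using size plus a SHA-256 digest as the "metadata" is an assumption made only so that changes are detectable.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """Hypothetical dataset: a named group of objects tracked by metadata only."""
    name: str
    version: str
    metadata: Dict[str, dict] = field(default_factory=dict)  # object key -> metadata


def scan(objects: Dict[str, bytes]) -> Dict[str, dict]:
    """Identify metadata for each object (assumed here: size and content digest)."""
    return {
        key: {"size": len(data), "digest": hashlib.sha256(data).hexdigest()}
        for key, data in objects.items()
    }


def produce_dataset(name: str, version: str, objects: Dict[str, bytes]) -> Dataset:
    """Scan the objects and store the identified metadata in a new dataset."""
    return Dataset(name=name, version=version, metadata=scan(objects))


def changed_objects(dataset: Dataset, objects: Dict[str, bytes]) -> List[str]:
    """Return keys whose metadata differs from the dataset (added, edited, removed)."""
    current = scan(objects)
    keys = set(dataset.metadata) | set(current)
    return sorted(k for k in keys if dataset.metadata.get(k) != current.get(k))


def apply_version(dataset: Dataset, objects: Dict[str, bytes], next_version: str) -> List[str]:
    """Update metadata for changed objects, then re-version the dataset as one unit."""
    current = scan(objects)
    changed = changed_objects(dataset, objects)
    for key in changed:
        if key in current:
            dataset.metadata[key] = current[key]
        else:
            del dataset.metadata[key]  # object no longer exists in the next version
    dataset.version = next_version
    return changed


objects_v1 = {"a.txt": b"alpha", "b.txt": b"beta"}
ds = produce_dataset("project", "v1", objects_v1)

objects_v2 = {"a.txt": b"alpha", "b.txt": b"beta2", "c.txt": b"gamma"}
changed = apply_version(ds, objects_v2, "v2")
```

In this sketch the version label belongs to the dataset rather than to individual objects, so a single operation moves every referenced object from one version to the next, which is the grouping idea the abstract describes.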
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method of managing different versions of data objects for a version control system (VCS) during a lifecycle of the data objects, comprising: producing a dataset representing the data objects as a group by scanning the data objects to identify metadata of the grouped data to be processed similarly within a current version of the lifecycle, and storing the identified metadata in the dataset; identifying data objects that themselves are subject to a change from the current version to a next version during the lifecycle; updating corresponding metadata for changed data objects in the dataset; and applying a version control operation on the dataset to update all data objects referenced by the dataset from the current version to the next version.

2. The method of claim 1 wherein the dataset is distributed across the plurality of storage devices comprise network attached storage (NAS), object storage, local storage, or cloud networks, the method further comprising; generating, by each provider of a storage device of the plurality of storage devices, a dataset snapshot as a read-only dataset component stored in memory local to the provider, wherein the dataset snapshot comprises a list of snapshot copies provided by each provider; and copying the dataset to a remote storage location using a dataset backup, wherein the remote storage location is different from the local storage location.

3. The method of claim 2 wherein the lifecycle of the data objects in the VCS comprises checking out data objects of a project to be modified, modifying the data objects to generate a revised version of the project from a first version, and committing the data objects of the revised version to a repository as a VCS datastore.

4. The method of claim 3 further comprising storing, in the VCS datastore, a commit-map and commit-tree of the next version of the project, wherein the commit map stores commit records for the data objects from the first version to the revised version, and wherein the commit-tree stores a timeline of the commit operations generating the commit records.

5. The method of claim 4 further comprising: assigning a snapshot-ID to each dataset snapshot for tracking a corresponding snapshot through the commit map and commit-tree; and performing one or more VCS operations on an identified dataset snapshot including at least one of a commit, checkout, merge, branch, or merge-branch operation.

6. The method of claim 5 further comprising defining a HEAD index that points to a commit operation that the dataset snapshot is based on, and wherein the HEAD index is null at a beginning of a commit-tree for the delete snapshot.

7. The method of claim 6 wherein, for the commit operation, the method further comprises: creating the dataset snapshot for storage on either the remote or local storage; creating a commit record; and adding a commit identifier in the commit-tree after a position of the HEAD index; and setting the HEAD index to be the commit identifier.

8. The method of claim 7 wherein, for the checkout operation, the method further comprises: retrieving snapshot-ID from the commit record; copying content of the dataset snapshot to the original dataset; and setting the HEAD index to be a checkout commit identifier.

9. The method of claim 8 wherein, for the merge operation, the method further comprises: retrieving the snapshot-ID from the commit record merging the content of the dataset snapshot with the original dataset; and performing the commit operation.

10. The method of claim 9 wherein, for the branch operation, the method further comprises: creating a new dataset snapshot from the original dataset; and creating a new checkout commit-ID to be stored in a new datastore.

11. The method of claim 10 wherein, for the merge-branch operation, the method further comprises: merging the original dataset into a target datastore; and committing the merge in the target datastore.

12. The method of claim 11 wherein the VCS manages changes to software programs, documents, web sites, and other content data embodying the data objects, and wherein the first version and revised version are each denoted by successive alphanumeric version character, and wherein each identifier of the snapshot-ID and commit-ID reference the version character.

13. The method of claim 3 wherein the data objects within each version of the project are encompassed by a respective dataset and are subject to same control rules in each stage of a lifecycle of the project as grouped data, wherein the control rules provide access only to authorized users or perform only authorized operations including data storage operations on the dataset referenced data objects based on a current stage of the lifecycle, and wherein the dataset is processed in the system as a single unit based on data content rather than data location.

14. The method of claim 13 wherein the dataset is produced by: gathering the identified metadata for storage in a data catalog; and executing a user entered query comprising metadata selectors as dataset tags for matching against the cataloged metadata to generate the dataset, wherein the metadata selectors comprise tags consisting of alphanumeric strings applied to respective data objects based on user-defined rules, and wherein the tags define at least one of a file type, name, location, creation time, or characteristic.

15. A computer-implemented method of managing different versions of data objects for a version control system (VCS) during a lifecycle of the data objects, comprising: identifying data objects that evolve through the different versions during the lifecycle; producing a dataset for the data objects data as a group by scanning the data objects to identify metadata of the grouped data to be re-versioned together throughout the lifecycle, and storing the identified metadata in the dataset; generating dataset snapshots as read-only dataset components for the dataset as it progresses along the lifecycle; copying the dataset to a remote storage location using a dataset backup; assigning a snapshot-ID to each dataset snapshot for tracking a corresponding snapshot through the commit map and commit-tree; and performing one or more VCS operations on an identified dataset snapshot including at least one of a commit, checkout, merge, branch, or merge-branch operation.

16. The method of claim 15 further comprising storing, in the VCS datastore, a commit-map and commit-tree of the next version of the project, wherein the commit map stores commit records for the data objects from the first version to the revised version, and wherein the commit-tree stores a timeline of the commit operations generating the commit records.

17. The method of claim 16 wherein the dataset is distributed across the plurality of storage devices comprise network attached storage (NAS), object storage, local storage, or cloud networks, the method further comprising generating by each provider of a storage device of the plurality of storage devices, a dataset snapshot as a read-only dataset component stored in memory local to the provider, wherein the dataset snapshot comprises a list of snapshot copies provided by each provider.

18. The method of claim 17 wherein the VCS manages changes to software programs, documents, web sites, and other content data embodying the data objects, and wherein the first version and revised version are each denoted by successive alphanumer
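To make the commit and checkout steps recited in claims 6 to 8 easier to follow, here is a small Python sketch of a datastore holding a commit-map, a commit-tree, and a HEAD index. It is one possible reading of the claim language, not the implementation disclosed in the publication. The class and field names (`Datastore`, `CommitRecord`, `snapshots`) and the ID formats are hypothetical, and the commit-tree is simplified to a flat list that acts as a timeline.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CommitRecord:
    """Hypothetical commit record linking a commit to its dataset snapshot."""
    commit_id: str
    snapshot_id: str
    message: str


@dataclass
class Datastore:
    dataset: Dict[str, dict] = field(default_factory=dict)                # original (working) dataset
    snapshots: Dict[str, Dict[str, dict]] = field(default_factory=dict)   # snapshot-ID -> frozen copy
    commit_map: Dict[str, CommitRecord] = field(default_factory=dict)     # commit-ID -> commit record
    commit_tree: List[str] = field(default_factory=list)                  # assumed: flat timeline of commit-IDs
    head: Optional[str] = None                                            # HEAD index, null before first commit

    def commit(self, message: str) -> str:
        # Create the dataset snapshot (a deep copy stands in for a read-only snapshot).
        snapshot_id = f"snap-{len(self.snapshots) + 1}"
        self.snapshots[snapshot_id] = copy.deepcopy(self.dataset)
        # Create the commit record.
        commit_id = f"c{len(self.commit_map) + 1}"
        self.commit_map[commit_id] = CommitRecord(commit_id, snapshot_id, message)
        # Add the commit identifier after the position of the HEAD index.
        position = 0 if self.head is None else self.commit_tree.index(self.head) + 1
        self.commit_tree.insert(position, commit_id)
        # Set the HEAD index to the new commit identifier.
        self.head = commit_id
        return commit_id

    def checkout(self, commit_id: str) -> None:
        # Retrieve the snapshot-ID from the commit record.
        record = self.commit_map[commit_id]
        # Copy the snapshot content back into the original dataset.
        self.dataset = copy.deepcopy(self.snapshots[record.snapshot_id])
        # Set the HEAD index to the checked-out commit identifier.
        self.head = commit_id


store = Datastore()
store.dataset["b.txt"] = {"digest": "v1"}
first = store.commit("first version")

store.dataset["b.txt"] = {"digest": "v2"}
second = store.commit("revised version")

store.checkout(first)  # "time travel" back to the first version
```

After the final checkout the working dataset again holds the first version's metadata while both commits remain in the timeline. A real commit-tree would need parent links to represent the branch and merge-branch operations of claims 10 and 11, which this flat list does not attempt.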
to a system of files or objects, e.g. local or distributed file system or database · CPC title
characterised by the use of retention policies (retention policies for HSM systems G06F16/185) · CPC title
Versioning file systems, temporal file systems, e.g. file system supporting different historic versions of files · CPC title
Access rights, e.g. capability lists, access control lists, access tables, access matrices · CPC title
Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477) · CPC title