Data catalog for dataset lifecycle management system for content-based data protection

US12321240B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12321240-B2
Application number  US-202217974722-A
Country             US
Kind code           B2
Filing date         Oct 27, 2022
Priority date       Oct 27, 2022
Publication date    Jun 3, 2025
Grant date          Jun 3, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Providing content based data protection for data stored in a large-scale data storage system by scanning data stored in one or more databases for discovery of metadata, and extracting the discovered metadata, for storage in a data catalog, the data catalog having a scanning function performing the scanning step, and comprising a database storing the metadata in one or more tables. A protection policy is defined to commonly protect content data referenced by metadata in the data catalog, and applied to the referenced content data to perform a data protection operation on the content data. Datasets stored in the catalog are generated by running queries on the catalog, where a query comprises metadata selectors as tags applied to the catalog, where the tags define at least one of a file type, name, location, creation time, or file characteristic.
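
The tag-based query described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `CatalogEntry` record, the tag strings, and the `query_dataset` helper are illustrative, the catalog is an in-memory list rather than a database, and the assumption that a query must match all of its tags is ours.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical metadata record for one file or object in the catalog."""
    path: str
    tags: frozenset = field(default_factory=frozenset)


def query_dataset(catalog, required_tags):
    """Return the entries that carry every tag in required_tags (assumed AND semantics)."""
    wanted = frozenset(required_tags)
    return [entry for entry in catalog if wanted <= entry.tags]


# Illustrative catalog contents; paths and tags are made up.
catalog = [
    CatalogEntry("/nas1/reports/q3.pdf", frozenset({"type:pdf", "dept:finance"})),
    CatalogEntry("s3://bucket/scan.dcm", frozenset({"type:dicom", "dept:radiology"})),
    CatalogEntry("/nas2/archive/q2.pdf", frozenset({"type:pdf", "dept:finance"})),
]

finance_pdfs = query_dataset(catalog, {"type:pdf", "dept:finance"})
paths = [entry.path for entry in finance_pdfs]
```

In this sketch the resulting dataset holds entries from two different storage locations, which is the sense in which selection is driven by content metadata rather than by where the data lives.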

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of providing content based data protection, comprising: scanning data stored in one or more databases for discovery of metadata; extracting the discovered metadata; storing the metadata in a data catalog, the data catalog having a scanning function performing the scanning step, and comprising a database storing the metadata in one or more tables; defining protection policies to commonly protect content data referenced by metadata in the data catalog, wherein the content data comprises data objects having disparate file format and protected by different protection policies; iteratively processing the dataset to tag the data objects according to a native file format; attaching multiple tags to the dataset to indicate that the data objects of the dataset are of different file types according to the disparate file formats; merging the protection policies to protect the dataset under a merged protection policy utilizing a most restrictive policy of the different protection policies; producing, from the data catalog, a change file list storing names of files changed from a first scan period to a next scan period for use by the protection policy; and applying the merged protection policy to the referenced content data to perform a data protection operation on the content data to provide content-based data protection rather than location-based data protection.

2. The method of claim 1 further comprising compiling the metadata into a single dataset prior to storing in the data catalog, wherein the dataset automatically tracks data added, removed or relocated to content data protected by the defined protection policy.

3. The method of claim 2 wherein the dataset is organized into collection information and per file and object information, and further wherein collection information comprises a dataset creation time, a query, role-based access control (RBAC) for the dataset, and first free-form metadata, and wherein the per file and object information comprises location of data of the dataset, unstructured metadata information, and second free-form metadata.

4. The method of claim 3 wherein the dataset is one of a static dataset or a dynamic dataset, wherein the static dataset comprises a fixed amount of data set at a time of creation, and the dynamic dataset comprises an amount of data that changes over time.

5. The method of claim 4 further comprising interfacing both the static dataset and dynamic dataset to the content data through a catalog interface to form a static database catalog and a dynamic database catalog.

6. The method of claim 5 wherein the static database catalog is used to create and store persistent datasets that contain data that is not modifiable during its lifecycle.

7. The method of claim 6 wherein the catalog comprises a user interface displaying to the user data usage trends, storage device usage, or storage device health, and further providing a mechanism through which a user can perform searches for files of the content data.

8. The method of claim 2 wherein the dataset comprises a logical collection of metadata for unstructured files and objects that are grouped together by one or more filters from a data query performed on the data catalog.

9. The method of claim 8 wherein the dataset represents a subset of data that a user categorizes for specific needs, wherein actions performed on the dataset will affect only the corresponding content data referenced by the metadata.

10. The method of claim 9 wherein the dataset spans multiple storage device types and multiple operating environments including edge networks, core networks and public or cloud networks.

11. The method of claim 1 wherein the data protection operation comprises at least one of: backing up data from operating memory to storage memory, restoring data from the storage to the operating memory, moving data among storage devices, and tiering data between different storage devices, and wherein the dataset automatically tracks data added, removed or relocated to content data protected by the defined protection policy.

12. A computer-implemented method of providing content-based data protection for data stored in a large-scale data storage system, comprising: accessing content data stored in the data storage system; deploying a data catalog that comprising a scanning function configured to discover metadata associated with the content data, and a database storing the discovered metadata; defining protection policies to commonly protect selected data referenced by metadata in the data catalog; iteratively processing the dataset to tag the data objects according to a native file format; attaching multiple tags to the dataset to indicate that the data objects of the dataset are of different file types according to the disparate file formats; merging the protection policies to protect the dataset under a merged protection policy utilizing a most restrictive policy of the different protection policies; running a query received from a user against the catalog to generate the dataset based on the multiple tags; producing, from the data catalog, a change file list storing names of files changed from a first scan period to a next scan period for use by the protection policy; and applying the merged protection policy to the dataset to perform a data protection application on the selected data so as to provide content-based data protection rather than location-based data protection.

13. The method of claim 12 further comprising: creating the dataset by grouping metadata for unstructured data objects that are grouped together by one or more filters, wherein the dataset spans multiple storage devices of different storage types; initiating the query that generates one or more filters; and defining the protection policy to protect the dataset as the single unit based on data content rather than data location, wherein the query comprises metadata selectors applied to the catalog.

14. The method of claim 13 wherein the metadata selectors comprise tags consisting of alphanumeric strings applied to respective data objects based on user-defined rules, and wherein the tags define at least one of a file type, name, location, creation time, or characteristic.

15. The method of claim 12 wherein the dataset is one of a static dataset or a dynamic dataset, wherein the static dataset comprises a fixed amount of data set at a time of creation, and the dynamic dataset comprises an amount of data that changes over time, and wherein the dataset is organized into collection information and per file and object information.

16. The method of claim 15 wherein collection information comprises a dataset creation time, the query, role-based access control (RBAC) for the dataset, and first free-form metadata, and wherein the per file and object information comprises location of data of the dataset, unstructured metadata information, and second free-form metadata.

17. The method of claim 12 wherein the defined protection policy comprises at least one of: backing up data from operating memory to storage memory, restoring data from the storage to the operating memory, moving data among memory, and tiering data between different storage memory.

18. The method of claim 12 wherein the dataset spans multiple storage device types and multiple operating environments including edge networks, core networks and public or cloud networks.

19. A hardware-embodied computer program product having stored thereon program code that when ex
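
Two steps recited in claim 1, merging policies under a most restrictive policy and producing a change file list between scans, can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Policy` fields, the rule that "most restrictive" means longest retention, shortest backup interval, and encryption if any policy requires it, and the use of per-file fingerprints to detect changes are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical protection policy; the field names are illustrative."""
    retention_days: int
    backup_interval_hours: int
    encrypt: bool


def merge_most_restrictive(policies):
    """Merge policies by taking the strictest value of each field (assumed ordering)."""
    policies = list(policies)
    if not policies:
        raise ValueError("at least one policy is required")
    return Policy(
        retention_days=max(p.retention_days for p in policies),
        backup_interval_hours=min(p.backup_interval_hours for p in policies),
        encrypt=any(p.encrypt for p in policies),
    )


def change_file_list(previous_scan, current_scan):
    """Return sorted names that are new or whose fingerprint differs between two scans.

    Each scan is assumed to be a dict mapping file name to a fingerprint string.
    """
    return sorted(
        name
        for name, fingerprint in current_scan.items()
        if previous_scan.get(name) != fingerprint
    )


# Illustrative policies for two file types held in one dataset.
pdf_policy = Policy(retention_days=365, backup_interval_hours=24, encrypt=False)
dicom_policy = Policy(retention_days=2555, backup_interval_hours=6, encrypt=True)
merged = merge_most_restrictive([pdf_policy, dicom_policy])

# Illustrative scan results: file name -> fingerprint.
first_scan = {"a.pdf": "h1", "b.dcm": "h2"}
next_scan = {"a.pdf": "h1", "b.dcm": "h3", "c.pdf": "h4"}
changed = change_file_list(first_scan, next_scan)
```

Here `merged` carries the stricter value of each field, and `changed` names only the modified and newly added files, so a protection job driven by it would not need to revisit unchanged data.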

Assignees

Dell Products Lp

Inventors

Classifications

  • Backup restoration techniques · CPC title

  • Database-specific techniques · CPC title

  • Backup scheduling policy · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12321240B2 cover?
Providing content based data protection for data stored in a large-scale data storage system by scanning data stored in one or more databases for discovery of metadata, and extracting the discovered metadata, for storage in a data catalog, the data catalog having a scanning function performing the scanning step, and comprising a database storing the metadata in one or more tables. A protection …
Who is the assignee on this patent?
Dell Products Lp
What technology area does this patent fall under?
Primary CPC classification G06F11/1469. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 3, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).