Data query processing system for content-based data protection and dataset lifecycle management

US12153638B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12153638-B2
Application number: US-202217975035-A
Country: US
Kind code: B2
Filing date: Oct 27, 2022
Priority date: Oct 27, 2022
Publication date: Nov 26, 2024
Grant date: Nov 26, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Providing content based data protection for data stored in a large-scale data storage system by creating a dataset by grouping metadata for unstructured data objects that are grouped together by one or more filters. The dataset can span multiple storage devices of different types, so that it defines a single data protection unit for the corresponding content data. A user initiated query input through a search engine interface generates the one or more filters, and a protection policy is defined that protects the dataset as the single unit based on data content rather than data location. Datasets are stored in a catalog, and are generated by running queries on the catalog, where a query comprises metadata selectors as tags applied to the catalog, where the tags define at least one of a file type, name, location, creation time, or file characteristic.
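The flow the abstract describes (query the catalog with metadata selectors, group the matching metadata into a dataset, then protect that dataset as one unit) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `CatalogEntry` fields, the `run_query` and `apply_policy` helpers, and the device and policy names are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata for one unstructured data object (hypothetical schema)."""
    object_id: str
    device: str      # storage device that holds the content data
    location: str
    file_type: str
    tags: frozenset = field(default_factory=frozenset)


def run_query(catalog, **selectors):
    """Return the entries that match every selector; the selectors act as filters."""
    def matches(entry):
        for key, wanted in selectors.items():
            if key == "tag":
                if wanted not in entry.tags:
                    return False
            elif getattr(entry, key) != wanted:
                return False
        return True

    return [entry for entry in catalog if matches(entry)]


def apply_policy(dataset, policy_name):
    """Stand-in for a protection operation: list what would be protected."""
    return [(policy_name, entry.device, entry.object_id) for entry in dataset]


# Assumed sample catalog spanning two different storage devices.
catalog = [
    CatalogEntry("a1", "nas-01", "/projects/x/report.pdf", "pdf", frozenset({"finance"})),
    CatalogEntry("b2", "object-store-02", "bucket/x/ledger.pdf", "pdf", frozenset({"finance"})),
    CatalogEntry("c3", "nas-01", "/projects/y/photo.jpg", "jpg", frozenset({"marketing"})),
]

# The dataset is selected by content metadata, not by where the data lives.
dataset = run_query(catalog, file_type="pdf", tag="finance")
plan = apply_policy(dataset, "daily-backup")
```

In this sketch the resulting dataset contains objects from both devices, which is the point of protecting by content rather than by location; a real system would hand the dataset to a backup server instead of returning a list of tuples.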

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of providing content-based data protection for data stored in a large-scale data storage system, comprising: receiving a query through an interface to a computerized search engine accessing data in the data storage system; defining a protection policy to protect selected data stored in different storage devices and network environments through a deduplication backup program executed by a backup server in the data storage system, the selected data having corresponding metadata, wherein the metadata comprises information describing the selected data through one or more characteristics to establish a unique data identifier for corresponding content data; generating a response to the query comprising a dataset produced upon processing the query, wherein the dataset automatically tracks data added, removed or relocated to content data protected by the defined protection policy; grouping, to create the dataset, metadata for unstructured data objects spanning multiple storage devices of different storage types by one or more filters, wherein the protection policy protects the selected data as a single unit based on data content rather than data location; and applying the defined protection policy to the dataset to perform a data protection operation by the backup server on the selected data.

2. The method of claim 1 wherein the dataset represents a subset of data that a user categorizes for specific purposes, wherein actions performed on the dataset will affect only the subset of data.

3. The method of claim 2 wherein the data protection operation comprises one of: backing up data from operating memory to storage memory, restoring data from the storage to the operating memory, moving data among storage devices, and tiering data between different storage devices.

4. The method of claim 1 further comprising tagging the selected data with a defined metadata tag.

5. The method of claim 4 further comprising: generating the one or more filters upon entry of the query.

6. The method of claim 5 wherein the query comprises metadata selectors applied to a catalog.

7. The method of claim 6 wherein the metadata selectors comprise tags consisting of alphanumeric strings applied to respective data objects based on user-defined rules, and wherein the tags define at least one of a file type, name, location, creation time, or characteristic.

8. The method of claim 7 wherein the dataset is one of a static dataset or a dynamic dataset, wherein the static dataset comprises a fixed amount of data set at a time of creation, and the dynamic dataset comprises an amount of data that changes over time, and wherein the dataset is organized into collection information and per file and object information.

9. The method of claim 8 wherein collection information comprises a dataset creation time, the query, role-based access control (RBAC) for the dataset, and first free-form metadata, and wherein the per file and object information comprises location of data of the dataset, unstructured metadata information, and second free-form metadata.

10. The method of claim 1 wherein the dataset spans multiple storage device types and multiple operating environments including edge networks, core networks and public or cloud networks.

11. A computer-implemented method of providing content-based data protection for data stored in a large-scale data storage system, comprising: defining a protection policy to protect selected data stored in different storage devices or network environments through a deduplication backup program executed by a backup server in the data storage system, the selected data having corresponding metadata, wherein the metadata comprises information describing the selected data through one or more characteristics to establish a unique data identifier for corresponding content data; storing the metadata in a catalog; executing, through an interface to a computerized search engine, a user entered query against the catalog to generate a dataset; generating a response to the query that comprises the dataset; grouping, to create the dataset, metadata for unstructured data objects spanning multiple storage devices of different storage types by one or more filters, wherein the protection policy protects the selected data as a single unit based on data content rather than data location; and applying the defined protection policy to the dataset to protect or otherwise operate on the selected data by the backup server.

12. The method of claim 11 further comprising tagging the selected data with a defined metadata tag.

13. The method of claim 12 further comprising: generating the one or more filters upon entry of the query.

14. The method of claim 13 wherein the query comprises metadata selectors applied to the catalog.

15. The method of claim 14 wherein the metadata selectors comprise tags consisting of alphanumeric strings applied to respective data objects based on user-defined rules, and wherein the tags define at least one of a file type, name, location, creation time, or characteristic.

16. The method of claim 15 wherein the dataset is one of a static dataset or a dynamic dataset, wherein the static dataset comprises a fixed amount of data set at a time of creation, and the dynamic dataset comprises an amount of data that changes over time, and wherein the dataset is organized into collection information and per file and object information.

17. The method of claim 16 wherein collection information comprises a dataset creation time, the query, role-based access control (RBAC) for the dataset, and first free-form metadata, and wherein the per file and object information comprises location of data of the dataset, unstructured metadata information, and second free-form metadata.

18. The method of claim 17 wherein the dataset spans multiple storage device types and multiple operating environments including edge networks, core networks and public or cloud networks.

19. The method of claim 11 wherein the defined protection policy comprises at least one of: backing up data from operating memory to storage memory, restoring data from the storage to the operating memory, moving data among memory, and tiering data between different storage memory.

20. A system for providing content-based data protection for data stored in a large-scale data storage system, comprising: a computerized search engine receiving a query through an interface to access data in the data storage system; a backup server in the data storage system executing a deduplication backup program using a defined a protection policy to protect selected data stored in different storage devices or network environments, the selected data having corresponding metadata, wherein the metadata comprises information describing the selected data through one or more characteristics to establish a unique data identifier for corresponding content data; a search engine component generating a response to the query comprising a dataset produced upon processing the query, wherein the dataset automatically tracks data added, removed or relocated to content data protected by the defined protection policy; a component creating the dataset by grouping metadata for unstructured data objects spanning multiple storage devices of different storage types by one or more filters, wherein the protection policy protects the selected data as a single unit based on data content rather than data location; and a backup server component applying the defin
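Claims 8, 9, 16 and 17 describe a dataset as either static or dynamic and as organized into collection information and per file and object information. The Python sketch below is one possible reading of that structure, not the patented implementation. The class and field names are hypothetical, and treating a dynamic dataset as one that re-runs its stored query is an assumption; the claims only say that its amount of data changes over time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FileInfo:
    """Per file and object information (hypothetical field names)."""
    location: str
    unstructured_metadata: dict
    free_form: dict = field(default_factory=dict)   # second free-form metadata


@dataclass
class Dataset:
    """Collection information plus the per-file records captured at creation."""
    query: dict
    dynamic: bool
    rbac: dict
    created: datetime
    files: list
    free_form: dict = field(default_factory=dict)   # first free-form metadata


def evaluate(catalog, query):
    """Select catalog records whose metadata matches every query selector."""
    return [
        f for f in catalog
        if all(f.unstructured_metadata.get(k) == v for k, v in query.items())
    ]


def create_dataset(catalog, query, dynamic, rbac):
    return Dataset(
        query=dict(query),
        dynamic=dynamic,
        rbac=rbac,
        created=datetime.now(timezone.utc),
        files=evaluate(catalog, query),
    )


def members(dataset, catalog):
    """Static: membership fixed at creation. Dynamic (assumed): re-run the query."""
    if dataset.dynamic:
        return evaluate(catalog, dataset.query)
    return list(dataset.files)


# Assumed sample data: one log file exists when the datasets are created.
catalog = [FileInfo("nas-01:/logs/a.log", {"file_type": "log"})]
rbac = {"admin": ["read", "protect"]}
static_ds = create_dataset(catalog, {"file_type": "log"}, dynamic=False, rbac=rbac)
dynamic_ds = create_dataset(catalog, {"file_type": "log"}, dynamic=True, rbac=rbac)

# A second log file is added later on a different device.
catalog.append(FileInfo("cloud-02:bucket/logs/b.log", {"file_type": "log"}))
static_count = len(members(static_ds, catalog))
dynamic_count = len(members(dynamic_ds, catalog))
```

After the second file is added, the static dataset in this sketch still holds one member while the dynamic dataset resolves to two, which mirrors the fixed versus changing amount of data described in the claims.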

Assignees

  • Dell Products Lp

Classifications

  • G06F16/953 (Primary)

    Querying, e.g. by the use of web search engines · CPC title

  • to a system of files or objects, e.g. local or distributed file system or database · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12153638B2 cover?
Providing content based data protection for data stored in a large-scale data storage system by creating a dataset by grouping metadata for unstructured data objects that are grouped together by one or more filters. The dataset can span multiple storage devices of different types, so that it defines a single data protection unit for the corresponding content data. A user initiated query input t…
Who is the assignee on this patent?
Dell Products Lp
What technology area does this patent fall under?
Primary CPC classification G06F16/953. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 26, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).