Distributed content indexing architecture with separately stored file previews

US11036592B2 · US · B2

Patent metadata
Publication number: US-11036592-B2
Application number: US-201816130874-A
Country: US
Kind code: B2
Filing date: Sep 13, 2018
Priority date: Sep 14, 2017
Publication date: Jun 15, 2021
Grant date: Jun 15, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An improved content indexing (CI) system is disclosed herein. For example, the improved CI system may include a distributed architecture of client computing devices, media agents, a single backup and CI database, and a pool of servers. After a file backup occurs, the backup and CI database may include file metadata indices and other information associated with backed up files. Servers in the pool of servers may, in parallel, query the backup and CI database for a list of files assigned to the respective server that have not been content indexed. The servers may then request a media agent to restore the assigned files from secondary storage and provide the restored files to the servers. The servers may then content index the received restored files. Once the content indexing is complete, the servers can send the content index information to the backup and CI database for storage.
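The workflow in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not Commvault's implementation. `FileEntry`, `BackupCIDatabase`, `MediaAgent` and `run_ci_server` are hypothetical names. The shared database and the secondary storage are in-memory dictionaries, and "content indexing" is reduced to collecting lowercase word tokens. Each server thread asks the shared database only for its own unindexed files, so the servers run in parallel without coordinating with one another. The lock only guards the shared dictionary.

```python
import re
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class FileEntry:
    """Hypothetical row in the shared backup and CI database."""
    path: str
    assigned_server: str
    keywords: Optional[Set[str]] = None  # None means "not yet content indexed"


class BackupCIDatabase:
    """In-memory stand-in for the single backup and CI database (assumption)."""

    def __init__(self, entries: List[FileEntry]):
        self._entries: Dict[str, FileEntry] = {e.path: e for e in entries}
        self._lock = threading.Lock()

    def pending_for(self, server_id: str) -> List[str]:
        # Files assigned to this server that have not been content indexed yet.
        with self._lock:
            return [e.path for e in self._entries.values()
                    if e.assigned_server == server_id and e.keywords is None]

    def store_index(self, path: str, keywords: Set[str]) -> None:
        with self._lock:
            self._entries[path].keywords = keywords

    def keywords_for(self, path: str) -> Optional[Set[str]]:
        with self._lock:
            return self._entries[path].keywords


class MediaAgent:
    """Hypothetical media agent that restores files from secondary storage."""

    def __init__(self, secondary_storage: Dict[str, str]):
        self._storage = secondary_storage  # path -> backed-up file contents

    def restore(self, path: str) -> str:
        return self._storage[path]


def run_ci_server(server_id: str, db: BackupCIDatabase, agent: MediaAgent) -> int:
    """One server in the pool: query, restore, index, then write back."""
    indexed = 0
    for path in db.pending_for(server_id):
        text = agent.restore(path)  # restored copy received from the media agent
        keywords = set(re.findall(r"[a-z0-9]+", text.lower()))
        db.store_index(path, keywords)  # send the index information back
        indexed += 1
    return indexed


# Usage: two servers index their assigned files in parallel.
storage = {"/a.txt": "Quarterly report", "/b.txt": "Backup policy", "/c.txt": "Index me"}
db = BackupCIDatabase([FileEntry("/a.txt", "s1"), FileEntry("/b.txt", "s2"),
                       FileEntry("/c.txt", "s1")])
agent = MediaAgent(storage)
with ThreadPoolExecutor(max_workers=2) as pool:
    counts = list(pool.map(lambda s: run_ci_server(s, db, agent), ["s1", "s2"]))
```

Here server `s1` indexes two files and `s2` indexes one. Afterwards no server sees any pending work.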

First claim

Opening claim text (preview).

What is claimed is:

1. A networked information management system for separately storing previews, the networked information management system comprising: a preview database; a backup and content indexing database different than the preview database; and a content indexing service having one or more first hardware processors, wherein the content indexing service is configured with first computer-executable instructions that, when executed, cause the content indexing service to: receive a restored version of a secondary copy, wherein the secondary copy corresponds to a first data file; parse the restored version of the secondary copy; extract one or more keywords corresponding the first data file based on the parsing of the restored version of the secondary copy; generate a preview of the restored version of the secondary copy; store the generated preview of the restored version of the secondary copy in the preview database; and store, in the backup and content indexing database, the one or more extracted keywords and a path to a storage location of the generated preview in the preview database.

2. The networked information management system of claim 1, wherein the preview database comprises a link to a duplicate preview at a location corresponding to the path to the storage location of the generated preview.

3. The networked information management system of claim 1, wherein the first computer-executable instructions, when executed, further cause the content indexing service to identify the path to the storage location of the generated preview in the preview database subsequent to storing the generated preview in the preview database.

4. The networked information management system of claim 1, wherein the first computer-executable instructions, when executed, further cause the content indexing service to process an instruction to content index the first data file.

5. The networked information management system of claim 4, wherein the first computer-executable instructions, when executed, further cause the content indexing service to parse the restored version of the secondary copy in response to reception of the instruction to content index the first data file.

6. The networked information management system of claim 4, wherein the first computer-executable instructions, when executed, further cause the content indexing service to process an instruction to content index the first data file received from a controller content indexing proxy.

7. The networked information management system of claim 6, wherein the first computer-executable instructions, when executed, further cause the content indexing service to receive the restored version of the secondary copy as a result of the controller content indexing proxy instructing a first computing device having a media agent to restore the first data file.

8. The networked information management system of claim 1, wherein the first computer-executable instructions, when executed, further cause the content indexing service to store the one or more extracted keywords in the backup and content indexing database in an entry associated with the first data file.

9. The networked information management system of claim 1, wherein storage of the one or more extracted keywords in the backup and content indexing database results in an indication, in the backup and content indexing database, that the first data file is content indexed.

10. The networked information management system of claim 1, wherein the restored version of the secondary copy is in a markup language format.

11. A computer-implemented method for separately storing previews, the networked information management system comprising: receiving a restored version of a secondary copy, wherein the secondary copy corresponds to a first data file; parsing the restored version of the secondary copy; extracting one or more keywords corresponding the first data file based on the parsing of the restored version of the secondary copy; generating a preview of the restored version of the secondary copy; storing the generated preview of the restored version of the secondary copy in a preview database; and storing, in a backup and content indexing database different than the preview database, the one or more extracted keywords and a path to a storage location of the generated preview in the preview database.

12. The computer-implemented method of claim 11, wherein the preview database comprises a link to a duplicate preview at a location corresponding to the path to the storage location of the generated preview.

13. The computer-implemented method of claim 11, further comprising identifying the path to the storage location of the generated preview in the preview database subsequent to storing the generated preview in the preview database.

14. The computer-implemented method of claim 11, further comprising receiving an instruction to content index the first data file.

15. The computer-implemented method of claim 14, wherein parsing the restored version of the secondary copy further comprises parsing the restored version of the secondary copy in response to reception of the instruction to content index the first data file.

16. The computer-implemented method of claim 14, wherein receiving an instruction to content index the first data file further comprises: receiving an instruction to content index the first data file from a first controller content indexing proxy at the direction of a master content indexing proxy; and receiving an instruction to content index a second data file from a second controller content indexing proxy at the direction of the master content indexing proxy.

17. The computer-implemented method of claim 16, wherein receiving the restored version of the secondary copy further comprises receiving the restored version of the secondary copy as a result of the first controller content indexing proxy instructing a first computing device having a media agent to restore the first data file.

18. The computer-implemented method of claim 11, wherein storing the one or more extracted keywords further comprises storing the one or more extracted keywords in the backup and content indexing database in an entry associated with the first data file.

19. The computer-implemented method of claim 11, wherein storage of the one or more extracted keywords in the backup and content indexing database results in an indication, in the backup and content indexing database, that the first data file is content indexed.

20. The computer-implemented method of claim 11, wherein the restored version of the secondary copy is in an independent format.
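The steps of claim 1 can be illustrated with a short Python sketch, together with the duplicate-preview link of claim 2, the "indexed" indication of claim 9 and the markup input of claim 10. This is one possible reading of the claim language, not the patentee's code. Every name is hypothetical. The sketch makes several assumptions:

- The restored copy is HTML-like markup, parsed with the standard-library `html.parser.HTMLParser`.
- Keywords are lowercase word tokens.
- The preview is the first 40 characters of the extracted text.
- `PreviewDatabase` and a plain dictionary stand in for the two separate databases.

A duplicate preview is detected by its SHA-256 digest and stored as a link to the existing copy.

```python
import hashlib
import re
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class _TextExtractor(HTMLParser):
    """Collects the text content of a markup document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def parse_restored_copy(markup: str) -> str:
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    # Join the text chunks and collapse runs of whitespace.
    return " ".join(" ".join(parser.chunks).split())


class PreviewDatabase:
    """Hypothetical preview store, kept separate from the CI database."""

    def __init__(self):
        self._slots: Dict[str, Tuple[str, str]] = {}  # path -> ("data"|"link", value)
        self._by_digest: Dict[str, str] = {}  # preview digest -> path holding the data

    def store(self, preview: str) -> str:
        path = f"previews/{len(self._slots)}"
        digest = hashlib.sha256(preview.encode("utf-8")).hexdigest()
        if digest in self._by_digest:
            # Duplicate preview: store a link to the existing copy instead.
            self._slots[path] = ("link", self._by_digest[digest])
        else:
            self._slots[path] = ("data", preview)
            self._by_digest[digest] = path
        return path

    def read(self, path: str) -> str:
        kind, value = self._slots[path]
        return self.read(value) if kind == "link" else value


def content_index(file_id: str, restored_markup: str, preview_db: PreviewDatabase,
                  ci_db: Dict[str, dict], preview_chars: int = 40) -> str:
    text = parse_restored_copy(restored_markup)
    keywords = sorted(set(re.findall(r"[a-z0-9]+", text.lower())))
    preview_path = preview_db.store(text[:preview_chars])
    # The CI entry holds keywords plus the preview's path, not the preview itself;
    # the presence of this entry doubles as the "content indexed" indication.
    ci_db[file_id] = {"keywords": keywords, "preview_path": preview_path}
    return preview_path


# Usage: two files whose previews are identical.
previews = PreviewDatabase()
ci: Dict[str, dict] = {}
p1 = content_index("file-1", "<html><body><h1>Q3 Report</h1><p>Revenue grew</p></body></html>",
                   previews, ci)
p2 = content_index("file-2", "<p>Q3 Report</p><p>Revenue grew</p>", previews, ci)
```

Keeping only a path in the CI database keeps keyword entries small. It also lets the preview store deduplicate on its own, which is the separation the claims describe.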

Assignees

Commvault Systems Inc

Inventors

Classifications

  • G06F16/182 (primary)

    Distributed file systems · CPC title

  • for networked environments · CPC title

  • Using snapshots, i.e. a logical point-in-time copy of the data · CPC title

  • Redundant storage or storage space (G06F11/2056 takes precedence) · CPC title

  • Details of searching files based on file metadata · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11036592B2 cover?
An improved content indexing (CI) system is disclosed herein. For example, the improved CI system may include a distributed architecture of client computing devices, media agents, a single backup and CI database, and a pool of servers. After a file backup occurs, the backup and CI database may include file metadata indices and other information associated with backed up files. Servers in the po…
Who is the assignee on this patent?
Commvault Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/182. Mapped technology areas include Physics.
When was this patent published?
Published Jun 15, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).