Distributed framework for data proximity-based task splitting in a content indexing system

US11086834B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11086834-B2
Application number: US-201816130823-A
Country: US
Kind code: B2
Filing date: Sep 13, 2018
Priority date: Sep 14, 2017
Publication date: Aug 10, 2021
Grant date: Aug 10, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An improved content indexing (CI) system is disclosed herein. For example, the improved CI system may include a distributed architecture of client computing devices, media agents, a single backup and CI database, and a pool of servers. After a file backup occurs, the backup and CI database may include file metadata indices and other information associated with backed up files. Servers in the pool of servers may, in parallel, query the backup and CI database for a list of files assigned to the respective server that have not been content indexed. The servers may then request a media agent to restore the assigned files from secondary storage and provide the restored files to the servers. The servers may then content index the received restored files. Once the content indexing is complete, the servers can send the content index information to the backup and CI database for storage.
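The workflow in the abstract (query for assigned, un-indexed files; restore through a media agent; content index; store the result) can be sketched roughly as below. This is a minimal illustration, not the patented implementation: `BackupAndCIDatabase`, `MediaAgent`, `run_server`, and `run_pool` are hypothetical names, the database and secondary storage are in-memory stand-ins, and "content indexing" is reduced to extracting a sorted set of lowercase terms.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
import threading


@dataclass
class BackupAndCIDatabase:
    """Hypothetical in-memory stand-in for the single backup and CI database."""
    assignments: dict                                   # file id -> server name
    content_index: dict = field(default_factory=dict)   # file id -> list of terms
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def unindexed_files_for(self, server):
        # Files assigned to this server that have no content index entry yet.
        with self._lock:
            return [f for f, s in self.assignments.items()
                    if s == server and f not in self.content_index]

    def store_index(self, file_id, terms):
        with self._lock:
            self.content_index[file_id] = terms


class MediaAgent:
    """Hypothetical media agent that restores files from secondary storage."""

    def __init__(self, secondary_storage):
        self.secondary_storage = secondary_storage      # file id -> file text

    def restore(self, file_ids):
        return {f: self.secondary_storage[f] for f in file_ids}


def run_server(server, db, media_agent):
    """One server: query, request restore, content index, send results back."""
    pending = db.unindexed_files_for(server)
    restored = media_agent.restore(pending)
    for file_id, text in restored.items():
        # Toy content index (an assumption): unique lowercase terms, sorted.
        db.store_index(file_id, sorted(set(text.lower().split())))
    return len(pending)


def run_pool(servers, db, media_agent):
    """Run every server in the pool in parallel; return files indexed per server."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        counts = pool.map(lambda s: run_server(s, db, media_agent), servers)
        return dict(zip(servers, counts))
```

Because each server asks only for files that are assigned to it and not yet indexed, running the pool a second time in this sketch finds nothing left to do.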

First claim

Opening claim text (preview).

What is claimed is:

1. A networked information management system for content indexing data, the networked information management system comprising: a master content indexing proxy having one or more first hardware processors, wherein the master content indexing proxy is configured with first computer-executable instructions that, when executed, cause the master content indexing proxy to: transmit a query for a total amount of data to content index; receive an indication of the total amount of data to content index; determine a total number of controller content indexing proxies that are available to perform content indexing operations; determine, based on the total number of controller content indexing proxies that are available to perform content indexing operations, that a first controller content indexing proxy is available to perform content indexing operations, wherein the first controller content indexing proxy is executed by a first computing device that executes a media agent, and wherein the media agent manages at least a subset of the total amount of data to content index; assign the subset of the total amount of data to content index to the first controller content indexing proxy such that the media agent restores secondary copies corresponding to the subset of the total amount of data and provides the restored secondary copies to the first controller content indexing proxy without transmitting the restored secondary copies over an external network; and transmit an instruction to the first controller content indexing proxy indicating that the subset of the total amount of data to content index is assigned to the first controller content indexing proxy; and an indexing storage system in communication with the master content indexing proxy, wherein the indexing storage system has one or more second hardware processors, wherein the indexing storage system is configured with second computer-executable instructions that, when executed, cause the indexing storage system to transmit the indication of the total amount of data to content index to the master content indexing proxy.

2. The networked information management system of claim 1, wherein the first computer-executable instructions, when executed, further cause the master content indexing proxy to track progress of content indexing performed by the first controller content indexing proxy.

3. The networked information management system of claim 2, wherein the first computer-executable instructions, when executed, further cause the master content indexing proxy to transmit a notification indicating the tracked progress.

4. The networked information management system of claim 2, wherein the tracked progress comprises one of a percentage of the subset of the total amount of data assigned to the first controller content indexing proxy that has yet to be content indexed, an amount of the subset of the total amount of data assigned to the first controller content indexing proxy that has yet to be content indexed, or a time remaining until the subset of the total amount of data assigned to the first available controller content indexing proxy is content indexed.

5. The networked information management system of claim 2, wherein the first computer-executable instructions, when executed, further cause the master content indexing proxy to: determine that the first controller content indexing proxy is operating at a performance level below a threshold value based on the tracked progress; and assign at least some of the subset of the total amount of data assigned to the first controller content indexing proxy to another controller content indexing proxy.

6. The networked information management system of claim 1, wherein the first computer-executable instructions, when executed, further cause the master content indexing proxy to assign one of a first archive file, a portion of a second archive file, or individual primary data to the first controller content indexing proxy.

7. The networked information management system of claim 1, wherein a first worker thread and a second worker thread execute on the first controller content indexing proxy.

8. The networked information management system of claim 7, wherein the first computer-executable instructions, when executed, further cause the master content indexing proxy to: assign a first portion of the subset of the total amount of data to the first worker thread; and assign a second portion of the subset of the total amount of data to the second worker thread.

9. The networked information management system of claim 1, wherein the first computer-executable instructions, when executed, further cause the master content indexing proxy to determine a total amount of data to content index for a second set of content indexing operations while the total number of controller content indexing proxies that are available to perform the content indexing operations is determined.

10. The networked information management system of claim 1, wherein the subset of the total amount of data to content index comprises at least one of a total number of archive files that include secondary copies that correspond with primary data to be context indexed or a number of secondary copies that are associated with each archive file that correspond with primary data to be context indexed.

11. A computer-implemented method for content indexing data, the computer-implemented method comprising: transmitting a query for a total amount of data to content index; receiving an indication of the total amount of data to content index; determining a total number of controller content indexing proxies that are available to perform content indexing operations; determining, based on the total number of controller content indexing proxies that are available to perform content indexing operations, that a first controller content indexing proxy is available to perform content indexing operations, wherein the first controller content indexing proxy is executed by a first computing device that executes a media agent, and wherein the media agent manages at least a subset of the total amount of data to content index; assigning the subset of the total amount of data to content index to the first controller content indexing proxy such that the media agent restores secondary copies corresponding to the subset of the total amount of data and provides the restored secondary copies to the first controller content indexing proxy for use in content indexing without transmitting the restored secondary copies over an external network; and transmitting an instruction to the first controller content indexing proxy indicating that the subset of the total amount of data to content index is assigned to the first controller content indexing proxy.

12. The computer-implemented method of claim 11, further comprising tracking progress of content indexing performed by the first controller content indexing proxy.

13. The computer-implemented method of claim 12, further comprising transmitting a notification indicating the tracked progress.

14. The computer-implemented method of claim 12, wherein the tracked progress comprises one of a percentage of the subset of the total amount of data assigned to the first controller content indexing proxy that has yet to be content indexed, an amount of the subset of the total amount of data assigned to the first controller content indexing proxy that has yet to be content indexed, or a time remaining until the subset of the total amount of data assigned to the first available controller content indexing proxy is content indexed.

15. The comp
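As a reading aid for claims 1, 4, and 5, the sketch below shows one way a master proxy might assign archive files to the controller proxy that shares a host with the managing media agent, report the percentage still to be indexed, and move work away from a slow proxy. It is an illustration under assumptions, not the claimed implementation: `ControllerProxy`, `assign_by_proximity`, and `rebalance` are hypothetical names, the least-loaded fallback and the "move half of the remaining work" policy are invented for the example, and throughput (items per second) stands in for the unspecified "performance level".

```python
from dataclasses import dataclass, field


@dataclass
class ControllerProxy:
    """Hypothetical controller content indexing proxy."""
    name: str
    host: str                                      # computing device running the proxy
    available: bool = True
    assigned: list = field(default_factory=list)   # archive file ids, processed in order
    indexed: int = 0                               # how many assigned items are done

    def percent_remaining(self):
        # One of the progress measures listed in claim 4.
        if not self.assigned:
            return 0.0
        return 100.0 * (len(self.assigned) - self.indexed) / len(self.assigned)


def assign_by_proximity(archive_to_agent_host, proxies):
    """Assign each archive file to an available proxy on its media agent's host.

    archive_to_agent_host maps an archive file id to the host of the media
    agent that manages it, so restored copies need not cross an external network.
    """
    available = [p for p in proxies if p.available]
    if not available:
        raise RuntimeError("no controller proxies available")
    by_host = {p.host: p for p in available}
    for archive, host in archive_to_agent_host.items():
        proxy = by_host.get(host)
        if proxy is None:
            # Assumed fallback (not from the claims): least-loaded available proxy.
            proxy = min(available, key=lambda p: len(p.assigned))
        proxy.assigned.append(archive)
    return {p.name: list(p.assigned) for p in available}


def rebalance(slow, proxies, elapsed_seconds, min_rate):
    """Move part of a slow proxy's remaining work elsewhere (claim 5, assumed policy)."""
    if elapsed_seconds <= 0 or slow.indexed / elapsed_seconds >= min_rate:
        return []
    targets = [p for p in proxies if p.available and p is not slow]
    remaining = slow.assigned[slow.indexed:]
    moved = remaining[len(remaining) // 2:]        # later half of the unfinished items
    if not targets or not moved:
        return []
    target = min(targets, key=lambda p: len(p.assigned) - p.indexed)
    slow.assigned = slow.assigned[:len(slow.assigned) - len(moved)]
    target.assigned.extend(moved)
    return moved
```

Note that in this sketch any reassigned item may land on a proxy that is not co-located with the managing media agent, so rebalancing trades locality for throughput.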

Assignees

Commvault Systems Inc

Inventors

Classifications

  • G06F16/27 · Primary

    Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • Query processing · CPC title

  • Using snapshots, i.e. a logical point-in-time copy of the data · CPC title

  • Indexing structures · CPC title

  • Redundant storage or storage space (G06F11/2056 takes precedence) · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11086834B2 cover?
An improved content indexing (CI) system is disclosed herein. For example, the improved CI system may include a distributed architecture of client computing devices, media agents, a single backup and CI database, and a pool of servers. After a file backup occurs, the backup and CI database may include file metadata indices and other information associated with backed up files. Servers in the po…
Who is the assignee on this patent?
Commvault Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/27. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 10, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).