Distributed framework for task splitting and task assignments in a content indexing system

US10846180B2 · US · B2

Patent metadata
Publication number: US-10846180-B2
Application number: US-201816130849-A
Country: US
Kind code: B2
Filing date: Sep 13, 2018
Priority date: Sep 14, 2017
Publication date: Nov 24, 2020
Grant date: Nov 24, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An improved content indexing (CI) system is disclosed herein. For example, the improved CI system may include a distributed architecture of client computing devices, media agents, a single backup and CI database, and a pool of servers. After a file backup occurs, the backup and CI database may include file metadata indices and other information associated with backed up files. Servers in the pool of servers may, in parallel, query the backup and CI database for a list of files assigned to the respective server that have not been content indexed. The servers may then request a media agent to restore the assigned files from secondary storage and provide the restored files to the servers. The servers may then content index the received restored files. Once the content indexing is complete, the servers can send the content index information to the backup and CI database for storage.
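The workflow in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the class and function names (`BackupAndCIDatabase`, `MediaAgent`, `run_server`), the in-memory dictionaries standing in for the database and secondary storage, and the whitespace tokenizer used as the "content index" are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class BackupAndCIDatabase:
    """Hypothetical in-memory stand-in for the single backup and CI database."""

    def __init__(self, assignments):
        self._lock = Lock()
        self._assigned = dict(assignments)  # file name -> assigned server id
        self._content_index = {}            # file name -> sorted list of terms

    def unindexed_files_for(self, server_id):
        # Files assigned to this server that have no content index yet.
        with self._lock:
            return sorted(
                name for name, server in self._assigned.items()
                if server == server_id and name not in self._content_index
            )

    def store_content_index(self, file_name, terms):
        with self._lock:
            self._content_index[file_name] = terms

    def content_index(self):
        with self._lock:
            return dict(self._content_index)


class MediaAgent:
    """Hypothetical media agent that restores files from secondary storage."""

    def __init__(self, secondary_storage):
        self._secondary_storage = secondary_storage

    def restore(self, file_names):
        return {name: self._secondary_storage[name] for name in file_names}


def run_server(server_id, database, media_agent):
    # 1. Query the database for assigned files that are not content indexed.
    pending = database.unindexed_files_for(server_id)
    # 2. Ask the media agent to restore those files.
    restored = media_agent.restore(pending)
    # 3. Content index each restored file (assumed here: unique lowercase words).
    for name, text in restored.items():
        terms = sorted(set(text.lower().split()))
        # 4. Send the content index information back to the database.
        database.store_content_index(name, terms)
    return len(pending)


secondary_storage = {
    "a.txt": "Quarterly report draft",
    "b.txt": "meeting notes",
    "c.txt": "Report notes",
}
database = BackupAndCIDatabase(
    {"a.txt": "server-1", "b.txt": "server-2", "c.txt": "server-1"}
)
media_agent = MediaAgent(secondary_storage)

# The servers in the pool work in parallel, each on its own assigned files.
with ThreadPoolExecutor(max_workers=2) as pool:
    counts = list(
        pool.map(lambda s: run_server(s, database, media_agent),
                 ["server-1", "server-2"])
    )
```

Here each server pulls its own work list from the shared database, so no server needs to know what the others are doing; the lock only protects the shared dictionaries in this toy stand-in.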

First claim

Opening claim text (preview).

What is claimed is:

1. A networked information management system for content indexing data, the networked information management system comprising: a master content indexing proxy having one or more first hardware processors, wherein the master content indexing proxy is configured with first computer-executable instructions that, when executed, cause the master content indexing proxy to: transmit a query for a total amount of data to content index; receive an indication of the total amount of data to content index; determine a total number of controller content indexing proxies that are available to perform content indexing operations; for each available controller content indexing proxy, determine a total number of worker threads executing on the respective available controller content indexing proxy that are available to perform content indexing operations, assign a portion of the total amount of data to content index to the respective available controller content indexing proxy based on at least one of the total amount of data to content index, the total number of available controller content indexing proxies, or the total number of available worker threads executing on the respective available controller content indexing proxy, and transmit an instruction to the respective available controller content indexing proxy indicating the portion of the total amount of data to content index assigned to the respective available controller content indexing proxy; and track progress of content indexing performed by a first available controller content indexing proxy; and an indexing storage system in communication with the master content indexing proxy, wherein the indexing storage system has one or more second hardware processors, wherein the indexing storage system is configured with second computer-executable instructions that, when executed, cause the indexing storage system to transmit the indication of the total amount of data to content index to the master content indexing proxy.

2. The networked information management system of claim 1, wherein the first computer-executable instructions, when executed, further cause the master content indexing proxy to transmit a notification indicating the tracked progress.

3. The networked information management system of claim 1, wherein the tracked progress comprises one of a percentage of data assigned to the first available controller content indexing proxy that has yet to be content indexed, an amount of data assigned to the first available controller content indexing proxy that has yet to be content indexed, or a time remaining until the data assigned to the first available controller content indexing proxy is content indexed.

4. The networked information management system of claim 1, wherein the first computer-executable instructions, when executed, further cause the master content indexing proxy to: determine that the first available controller content indexing proxy is operating at a performance level below a threshold value based on the tracked progress; and assign at least some of the content indexing tasks assigned to the first available controller content indexing proxy to another available controller content indexing proxy.

5. The networked information management system of claim 1, wherein the first computer-executable instructions, when executed, further cause the master content indexing proxy to assign one of a first archive file, a portion of a second archive file, or individual primary data to a first available controller content indexing proxy.

6. The networked information management system of claim 1, wherein a first worker thread and a second worker thread execute on a first available controller content indexing proxy.

7. The networked information management system of claim 6, wherein the first computer-executable instructions, when executed, further cause the master content indexing proxy to: assign a first archive file to the first worker thread; and assign a second archive file to the second worker thread.

8. The networked information management system of claim 1, wherein the first computer-executable instructions, when executed, further cause the master content indexing proxy to determine a total amount of data to content index for a second set of content indexing operations while the total number of controller content indexing proxies that are available to perform the content indexing operations is determined.

9. The networked information management system of claim 1, wherein the total amount of data to content index comprises at least one of a total number of archive files that include secondary copies that correspond with primary data to be context indexed or a number of secondary copies that are associated with each archive file that correspond with primary data to be context indexed.

10. A computer-implemented method for content indexing data, the computer-implemented method comprising: transmitting a query for a total amount of data to content index; receiving an indication of the total amount of data to content index; determining a total number of controller content indexing proxies that are available to perform content indexing operations; for each available controller content indexing proxy, determining a total number of worker threads executing on the respective available controller content indexing proxy that are available to perform content indexing operations, assigning a portion of the total amount of data to content index to the respective available controller content indexing proxy based on at least one of the total amount of data to content index, the total number of available controller content indexing proxies, or the total number of available worker threads executing on the respective available controller content indexing proxy, and transmitting an instruction to the respective available controller content indexing proxy indicating the portion of the total amount of data to content index assigned to the respective available controller content indexing proxy; and tracking progress of content indexing performed by a first available controller content indexing proxy.

11. The computer-implemented method of claim 10, further comprising transmitting a notification indicating the tracked progress.

12. The computer-implemented method of claim 10, wherein the tracked progress comprises one of a percentage of data assigned to the first available controller content indexing proxy that has yet to be content indexed, an amount of data assigned to the first available controller content indexing proxy that has yet to be content indexed, or a time remaining until the data assigned to the first available controller content indexing proxy is content indexed.

13. The computer-implemented method of claim 10, further comprising: determining that the first available controller content indexing proxy is operating at a performance level below a threshold value based on the tracked progress; and assigning at least some of the content indexing tasks assigned to the first available controller content indexing proxy to another available controller content indexing proxy.

14. The computer-implemented method of claim 10, wherein assigning a portion of the total amount of data to content index to the respective available controller content indexing proxy further comprises assigning one of a first archive file, a portion of a second archive file, or individual primary data to a first available controller content indexing proxy.

15. The computer-implemented method of claim 10, wherein a first worker thread and a second worker thread
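
The assignment and reassignment steps recited in claims 1, 4, 10, and 13 can be illustrated with a small Python sketch. The claims only require that each portion be based on "at least one of" the total amount of data, the number of available controller proxies, or the number of available worker threads; the proportional-to-threads policy below, the "move half of the remaining work" rebalancing rule, the files-per-second rate, and every name in the code (`ControllerProxy`, `assign_portions`, `rebalance`) are assumptions chosen for illustration, not the patent's method.

```python
from dataclasses import dataclass


@dataclass
class ControllerProxy:
    """Hypothetical record for one controller content indexing proxy."""
    name: str
    available_worker_threads: int


def assign_portions(total_archive_files, controllers):
    """Split the total across available controllers in proportion to
    their available worker threads (one assumed policy among many)."""
    available = [c for c in controllers if c.available_worker_threads > 0]
    total_threads = sum(c.available_worker_threads for c in available)
    if total_threads == 0:
        return {}
    # Floor of each controller's proportional share.
    portions = {
        c.name: total_archive_files * c.available_worker_threads // total_threads
        for c in available
    }
    # Hand out what rounding left over, one file each, most threads first.
    leftover = total_archive_files - sum(portions.values())
    by_threads = sorted(available, key=lambda c: -c.available_worker_threads)
    for c in by_threads[:leftover]:
        portions[c.name] += 1
    return portions


def rebalance(remaining, rates, threshold):
    """Move half of each slow controller's remaining files to the
    least-loaded controller whose observed rate meets the threshold.

    remaining: name -> files not yet content indexed (tracked progress)
    rates:     name -> observed files per second (assumed metric)
    """
    slow = [n for n in remaining if rates[n] < threshold]
    fast = [n for n in remaining if rates[n] >= threshold]
    updated = dict(remaining)
    if not fast:
        return updated
    for name in slow:
        moved = updated[name] // 2
        target = min(fast, key=lambda f: updated[f])
        updated[name] -= moved
        updated[target] += moved
    return updated


controllers = [
    ControllerProxy("proxy-a", available_worker_threads=4),
    ControllerProxy("proxy-b", available_worker_threads=2),
    ControllerProxy("proxy-c", available_worker_threads=0),  # not available
]
portions = assign_portions(10, controllers)

# Later, tracked progress shows proxy-b falling below the threshold rate.
rebalanced = rebalance(
    remaining={"proxy-a": 2, "proxy-b": 6},
    rates={"proxy-a": 5.0, "proxy-b": 0.5},
    threshold=1.0,
)
```

With these inputs, `portions` gives seven archive files to `proxy-a` and three to `proxy-b`, and `rebalanced` moves three of `proxy-b`'s six remaining files to `proxy-a`. The total amount of work is preserved in both steps, which is the property a real scheduler would also need to maintain.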

Assignees

Commvault Systems Inc

Inventors

Classifications

  • Indexing structures

  • Using snapshots, i.e. a logical point-in-time copy of the data

  • where the redundant components share neither address space nor persistent storage

  • maintaining the standby controller/processing unit updated (initialisation or re-synchronisation thereof G06F11/1658 and subgroups)

  • eliminating a faulty processor or activating a spare

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10846180B2 cover?
An improved content indexing (CI) system is disclosed herein. For example, the improved CI system may include a distributed architecture of client computing devices, media agents, a single backup and CI database, and a pool of servers. After a file backup occurs, the backup and CI database may include file metadata indices and other information associated with backed up files. Servers in the po…
Who is the assignee on this patent?
Commvault Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2228. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 24, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).