Distributed architecture for content indexing emails

US10846266B2 · US · B2

Patent metadata
Field                Value
Publication number   US-10846266-B2
Application number   US-201816130873-A
Country              US
Kind code            B2
Filing date          Sep 13, 2018
Priority date        Sep 14, 2017
Publication date     Nov 24, 2020
Grant date           Nov 24, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An improved content indexing (CI) system is disclosed herein. For example, the improved CI system may include a distributed architecture of client computing devices, media agents, a single backup and CI database, and a pool of servers. After a file backup occurs, the backup and CI database may include file metadata indices and other information associated with backed up files. Servers in the pool of servers may, in parallel, query the backup and CI database for a list of files assigned to the respective server that have not been content indexed. The servers may then request a media agent to restore the assigned files from secondary storage and provide the restored files to the servers. The servers may then content index the received restored files. Once the content indexing is complete, the servers can send the content index information to the backup and CI database for storage.
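
The workflow in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the class names `BackupAndCIDatabase` and `MediaAgent`, the in-memory dictionaries that stand in for the database and secondary storage, and the trivial tokenizer used as "content indexing". A thread pool plays the role of the pool of servers working in parallel.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BackupAndCIDatabase:
    """Hypothetical in-memory stand-in for the single backup and CI database."""

    def __init__(self, files):
        # files maps file_id -> {"server": assigned server, "location": storage key}
        self.files = files
        self.content_index = {}  # file_id -> sorted list of unique tokens
        self._lock = threading.Lock()

    def unindexed_files_for(self, server):
        """List the files assigned to `server` that have no content index yet."""
        return [fid for fid, meta in self.files.items()
                if meta["server"] == server and fid not in self.content_index]

    def store_index(self, file_id, tokens):
        """Store content index information sent back by a server."""
        with self._lock:
            self.content_index[file_id] = tokens


class MediaAgent:
    """Hypothetical media agent that restores files from secondary storage."""

    def __init__(self, secondary_storage):
        self.secondary_storage = secondary_storage  # location -> file text

    def restore(self, location):
        return self.secondary_storage[location]


def run_server(server, db, agent):
    """One server in the pool: query, restore, content index, report back."""
    indexed = []
    for fid in db.unindexed_files_for(server):
        text = agent.restore(db.files[fid]["location"])
        tokens = sorted(set(text.lower().split()))  # toy stand-in for indexing
        db.store_index(fid, tokens)
        indexed.append(fid)
    return indexed


def run_pool(servers, db, agent):
    """Run every server's indexing pass in parallel, one worker per server."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        results = pool.map(lambda s: run_server(s, db, agent), servers)
        return dict(zip(servers, results))
```

Calling `run_pool(["s1", "s2"], db, agent)` returns, per server, the file IDs that server indexed. A second call finds nothing left to do, because the database already holds an index entry for each file.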

First claim

Opening claim text (preview).

What is claimed is:

1. A networked information management system for content indexing emails, the networked information management system comprising: a content indexing proxy having one or more first hardware processors, wherein the content indexing proxy is configured with first computer-executable instructions that, when executed, cause the content indexing proxy to: receive, by a first thread executing on the content indexing proxy, identification of emails assigned to the content indexing proxy by a master content indexing proxy, wherein the identified emails are each associated with an email page in a plurality of email pages, and wherein an email page in the plurality of email pages comprises multiple emails; and for each email page in the plurality of email pages, transmit, by the first thread to an indexing storage system, a query for secondary copy location data corresponding to the emails associated with the respective email page, receive, by the first thread, the secondary copy location data, transmit, by a second thread executing on the content indexing proxy, an instruction to a first computing device that executes a media agent to restore secondary copies stored at locations indicated by the secondary copy location data, receive, by a third thread executing on the content indexing proxy, an acknowledgment from the first computing device that a restoration of the secondary copies is complete, and transmit, by a fourth thread executing on the content indexing proxy, a request to content index the restored secondary copies; and one or more computing devices in communication with the content indexing proxy, wherein the one or more computing devices each have one or more second hardware processors, wherein the one or more computing devices are configured with second computer-executable instructions that, when executed, cause the one or more computing devices to content index the restored secondary copies.

2. The networked information management system of claim 1, wherein the first computer-executable instructions, when executed, further cause the content indexing proxy to simultaneously transmit an instruction to the first computing device to restore secondary copies of emails associated with a first email page in the plurality of email pages and transmit a query for secondary copy location data corresponding to emails associated with a second email page in the plurality of email pages.

3. The networked information management system of claim 1, wherein the first computer-executable instructions, when executed, further cause the content indexing proxy to: for an attachment file associated with a first email in a first email page in the plurality of email pages, transmit, by the first thread to the indexing storage system, a query for secondary copy location data corresponding to the attachment file; receive, by the first thread, the secondary copy location data corresponding to the attachment file; transmit, by the second thread, an instruction to the first computing device to restore a secondary copy of the attachment file stored at a location indicated by the secondary copy location data corresponding to the attachment file; receive, by the third thread, an acknowledgment from the first computing device that a restoration of the secondary copy of the attachment file is complete; and transmit, by the fourth thread, a request to content index the restored secondary copy of the attachment file.

4. The networked information management system of claim 3, wherein the secondary copy of the attachment file is stored separately from a secondary copy of the first email in a secondary storage device.

5. The networked information management system of claim 1, wherein the secondary copy location data comprises at least one of logical paths to secondary copies stored in a secondary storage device or offsets indicating where the secondary copies are stored in the secondary storage device.

6. The networked information management system of claim 1, wherein the emails assigned to the content indexing proxy are emails that have not yet been content indexed.

7. A networked information management system for content indexing emails, the networked information management system comprising: a content indexing proxy having one or more first hardware processors, wherein the content indexing proxy is configured with first computer-executable instructions that, when executed, cause the content indexing proxy to: receive, by a first thread executing on the content indexing proxy, identification of emails assigned to the content indexing proxy by a master content indexing proxy, wherein the identified emails are each associated with an email page in a plurality of email pages; and for each email page in the plurality of email pages, transmit, by the first thread to an indexing storage system, a query for secondary copy location data corresponding to the emails associated with the respective email page, receive, by the first thread, the secondary copy location data, transmit, by a second thread executing on the content indexing proxy, an instruction to a first computing device that executes a media agent to restore secondary copies stored at locations indicated by the secondary copy location data, receive, by a third thread executing on the content indexing proxy, an acknowledgment from the first computing device that a restoration of the secondary copies is complete, and transmit, by a fourth thread executing on the content indexing proxy, a request to content index the restored secondary copies; and one or more computing devices in communication with the content indexing proxy, wherein the one or more computing devices each have one or more second hardware processors, wherein the one or more computing devices are configured with second computer-executable instructions that, when executed: cause the one or more computing devices to content index the restored secondary copies; and extract one or more keywords and generate one or more previews using the restored secondary copies.

8. The networked information management system of claim 7, wherein the second computer-executable instructions, when executed, further cause the one or more computing devices to store the one or more keywords and the one or more previews in different databases.

9. The networked information management system of claim 7, wherein the second computer-executable instructions, when executed, further cause the one or more computing devices to store the one or more keywords and a path to a storage location of the one or more previews in a backup and content indexing database.

10. The networked information management system of claim 1, wherein the restored secondary copies are in a markup language format.

11. A computer-implemented method for content indexing emails, the computer-implemented method comprising: receiving, by a first thread executing on a content indexing proxy, identification of emails assigned to the content indexing proxy by a master content indexing proxy, wherein the identified emails are each associated with an email page in a plurality of email pages, and wherein an email page in the plurality of email pages comprises multiple emails; and for each email page in the plurality of email pages, transmitting, by the first thread to an indexing storage system, a query for secondary copy location data corresponding to the emails associated with the respective email page, receiving, by the first thread, the secondary copy location data, transmitting, by a second thread executing on the content indexing proxy, an instruction to a first computing device that executes a media agent to restore secondary
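
The four-thread flow recited in claim 1, and the overlap between pages described in claim 2, can be sketched as a queue-connected pipeline. This is only an illustrative sketch, not the patented implementation. The function name `run_proxy` and the three callables are hypothetical: `lookup_locations` stands in for the indexing storage system, `restore` for the media agent, and `content_index` for the indexing devices. A single-worker executor stands in for the media agent so that a completed future plays the role of the restoration acknowledgment. Error handling is omitted, so a failing callable would stall the pipeline.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of the stream of email pages


def run_proxy(email_pages, lookup_locations, restore, content_index):
    """Push each email page through four pipelined threads.

    email_pages is a list of pages, each page a list of email IDs.
    Returns the content_index result for each page, in page order.
    """
    to_restore = queue.Queue()    # location data awaiting a restore instruction
    pending_acks = queue.Queue()  # restores in flight, awaiting acknowledgment
    to_index = queue.Queue()      # restored copies awaiting content indexing
    indexed = []

    with ThreadPoolExecutor(max_workers=1) as media_agent:
        def first_thread():
            # Query secondary copy location data, one email page at a time.
            for page in email_pages:
                to_restore.put(lookup_locations(page))
            to_restore.put(_DONE)

        def second_thread():
            # Instruct the media agent to restore copies at those locations.
            while (locations := to_restore.get()) is not _DONE:
                pending_acks.put(media_agent.submit(restore, locations))
            pending_acks.put(_DONE)

        def third_thread():
            # Wait for each acknowledgment that a restoration is complete.
            while (ack := pending_acks.get()) is not _DONE:
                to_index.put(ack.result())
            to_index.put(_DONE)

        def fourth_thread():
            # Request content indexing of the restored secondary copies.
            while (copies := to_index.get()) is not _DONE:
                indexed.append(content_index(copies))

        threads = [threading.Thread(target=fn) for fn in
                   (first_thread, second_thread, third_thread, fourth_thread)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return indexed
```

Because each stage hands its output to the next through a queue, the first thread can already be querying location data for a second page while the restore for the first page is still in flight, which is the overlap claim 2 describes.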

Assignees

Commvault Systems Inc

Inventors

Classifications

  • G06Q10/107 · Primary

    Computer-aided management of electronic mailing [e-mailing] · CPC title

  • Handling conversation history, e.g. grouping of messages in sessions or threads · CPC title

  • Storing data temporarily at an intermediate stage, e.g. caching · CPC title

  • using de-duplication of the data · CPC title

  • Backup restoration techniques · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10846266B2 cover?
An improved content indexing (CI) system is disclosed herein. For example, the improved CI system may include a distributed architecture of client computing devices, media agents, a single backup and CI database, and a pool of servers. After a file backup occurs, the backup and CI database may include file metadata indices and other information associated with backed up files. Servers in the po…
Who is the assignee on this patent?
Commvault Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06Q10/107. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 24, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).