Systems and methods for efficiently extracting contents of container files

US9922033B1 · US · B1

Patent metadata
Field: Value
Publication number: US-9922033-B1
Application number: US-201514754734-A
Country: US
Kind code: B1
Filing date: Jun 30, 2015
Priority date: Jun 30, 2015
Publication date: Mar 20, 2018
Grant date: Mar 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosed computer-implemented method for efficiently extracting contents of container files may include (1) receiving a container file that includes (a) an additional container file that includes (i) a constituent file and (ii) metadata of the constituent file and (b) metadata of the additional container file, (2) creating, before the constituent file is extracted from the additional container file, a content hierarchy for the container file that includes (a) the metadata of the constituent file, (b) hierarchical metadata that indicates that the container file includes the additional container file, and (c) additional hierarchical metadata that indicates that the additional container file includes the constituent file, (3) querying, after the content hierarchy is created, the content hierarchy to locate the constituent file within the additional container file, (4) extracting the constituent file, and (5) performing an action on the constituent file. Various other methods, systems, and computer-readable media are also disclosed.
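The steps in the abstract can be illustrated with a minimal sketch, assuming ZIP archives as the container format and Python's standard `zipfile` module. The names `Node`, `build_hierarchy`, `find_path`, `extract`, and `make_zip` are hypothetical and do not come from the patent. The sketch reads only directory listings to build the hierarchy, then uses the hierarchy to extract a single constituent file. Note that it loads a nested archive's bytes to reach its listing, which a real system might avoid.

```python
import io
import zipfile
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in the content hierarchy (metadata only, no file contents)."""
    name: str
    size: int
    is_container: bool
    children: list = field(default_factory=list)


def build_hierarchy(data: bytes, name: str) -> Node:
    """Walk a ZIP's directory listing, recursing into nested ZIPs."""
    root = Node(name=name, size=len(data), is_container=True)
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.filename.lower().endswith(".zip"):
                # Assumption: nested containers are recognized by extension.
                # The nested archive is read only to reach its own listing.
                child = build_hierarchy(zf.read(info), info.filename)
            else:
                child = Node(info.filename, info.file_size, False)
            root.children.append(child)
    return root


def find_path(node: Node, target: str, path=()):
    """Query the hierarchy: return the chain of names leading to target."""
    path = path + (node.name,)
    if node.name == target and not node.is_container:
        return path
    for child in node.children:
        found = find_path(child, target, path)
        if found:
            return found
    return None


def extract(data: bytes, path) -> bytes:
    """Extract only the file at the end of path, opening each container on the way."""
    for part in path[1:]:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            data = zf.read(part)
    return data


def make_zip(files: dict) -> bytes:
    """Helper for the demo: build a ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for fname, content in files.items():
            zf.writestr(fname, content)
    return buf.getvalue()


# Demo data: a container file holding an additional container file.
inner = make_zip({"report.txt": b"quarterly numbers"})
outer = make_zip({"inner.zip": inner, "notes.txt": b"unrelated"})

tree = build_hierarchy(outer, "outer.zip")   # step (2): create the hierarchy
path = find_path(tree, "report.txt")         # step (3): query it
content = extract(outer, path)               # step (4): extract one file
```

In this sketch `notes.txt` is never extracted: the query result tells `extract` exactly which containers to open, which is the efficiency the abstract describes.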

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for efficiently extracting contents of container files, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising: receiving, at a first stage of a file-archiving system, an unnested container file containing a first constituent file and a second constituent file, wherein: the first constituent file is a nested container file that contains the second constituent file; and the file-archiving system is configured to perform a time-consuming file-indexing operation; enabling high file throughput at the first stage of the file-archiving system by refraining, at the first stage of the file-archiving system, from performing the time-consuming file-indexing operation; enabling a second stage of the file-archiving system to perform the time-consuming file-indexing operation on the second constituent file by creating, at the first stage of the file-archiving system, a content hierarchy for the unnested container file that comprises: metadata of the second constituent file; first hierarchical metadata that indicates that the unnested container file contains the nested container file; and second hierarchical metadata that indicates that the nested container file contains the second constituent file; using, at the second stage of the file-archiving system, the content hierarchy to locate the second constituent file within the nested container file; extracting, at the second stage of the file-archiving system, the second constituent file from the nested container file; performing, at the second stage of the file-archiving system, the time-consuming file-indexing operation on the second constituent file.

2. The computer-implemented method of claim 1, wherein: the file-archiving system comprises: at least one computing node that comprises hardware resources that are optimized to parse container-file metadata; at least one additional computing node that comprises additional hardware resources that are optimized to perform the time-consuming file-indexing operation; the first stage of the file-archiving system is performed by the at least one computing node; the second stage of the file-archiving system is performed by the at least one additional computing node.

3. The computer-implemented method of claim 1, wherein the file-indexing operation comprises a time-consuming content-conversion operation.

4. The computer-implemented method of claim 1, wherein: the file-archiving system comprises an email archiving system; the unnested container file comprises an email; the nested container file comprises an attachment of the email.

5. The computer-implemented method of claim 1, wherein: the second hierarchical metadata indicates that the second constituent file is at a hierarchical level within the unnested container file; using the content hierarchy to locate the second constituent file within the nested container file comprises using the content hierarchy to locate one or more files at the hierarchical level within the unnested container file.

6. The computer-implemented method of claim 1, wherein: the metadata of the second constituent file comprises a file type of the second constituent file; using the content hierarchy to locate the second constituent file within the nested container file comprises using the content hierarchy to locate one or more files of the file type of the second constituent file.

7. The computer-implemented method of claim 1, wherein: the metadata of the second constituent file comprises a size of the second constituent file; using the content hierarchy to locate the second constituent file within the nested container file comprises using the content hierarchy to locate one or more files that are of the size.

8. The computer-implemented method of claim 1, wherein performing the time-consuming file-indexing operation on the second constituent file comprises converting the second constituent file to a text-based representation of the second constituent file.

9. The computer-implemented method of claim 8, wherein performing the time-consuming file-indexing operation on the second constituent file comprises using the text-based representation of the second constituent file to index the second constituent file.

10. The computer-implemented method of claim 1, further comprising: after the content hierarchy has been created, receiving a request for the metadata of the second constituent file; locating the metadata of the second constituent file stored within the content hierarchy; responding to the request with the metadata of the second constituent file stored within the content hierarchy.

11. The computer-implemented method of claim 1, wherein: the unnested container file comprises a third constituent file; the step of extracting the second constituent file from the nested container file is performed without extracting the third constituent file from the unnested container file.

12. The computer-implemented method of claim 1, wherein: the content hierarchy comprises hierarchical metadata for each file contained within the unnested container file; the content hierarchy is completely created at the first stage of the file-archiving system before any file is extracted at the second stage of the file-archiving system.

13. The computer-implemented method of claim 1, wherein no files are extracted from the unnested container file at the first stage of the file-archiving system.

14. The computer-implemented method of claim 1, wherein: the unnested container file comprises an additional nested container file; the additional nested container file contains the nested container file; the first hierarchical metadata indicates that the unnested container file contains the additional nested container file; the content hierarchy comprises third hierarchical metadata that indicates that the additional nested container file contains the nested container file.

15. The computer-implemented method of claim 1, wherein: the unnested container file contains metadata of the nested container file that is separate and distinct from the nested container file; the nested container file contains the metadata of the second constituent file that is separate and distinct from the second constituent file.

16. A system for efficiently extracting contents of container files, the system comprising: a file-receiving module, stored in memory, that receives, at a first stage of a file-archiving system, an unnested container file containing a first constituent file and a second constituent file, wherein: the first constituent file is a nested container file that contains the second constituent file; and the file-archiving system is configured to perform a time-consuming file-indexing operation; a creating module, stored in memory, that: enables high file throughput at the first stage of the file-archiving system by refraining, at the first stage of the file-archiving system, from performing the time-consuming file-indexing operation; and enables a second stage of the file-archiving system to perform the time-consuming file-indexing operation on the second constituent file by creating, at the first stage of the file-archiving system and before the second constituent file is extracted from the nested container file, a content hierarchy for the unnested container file that comprises: metadata of the second constituent file; first hierarchical metadata that indicates that the unnested container file comprises the nested container file
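
Claims 4 through 7 describe locating files through the content hierarchy by hierarchical level, file type, or size. A rough sketch of such queries follows, assuming the hierarchy is stored as a flat list of metadata records with parent links. The `Entry` record, its fields, the sample email data, and the functions `query` and `containers_to_open` are all hypothetical illustrations, not structures defined by the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Flat hierarchy record: file metadata plus a link to its container."""
    path: str        # full path inside the unnested container
    parent: str      # path of the containing file, "" for the top level
    level: int       # 0 = unnested container, 1 = its direct contents, ...
    file_type: str
    size: int


# Hypothetical hierarchy produced at the first stage for an email
# whose attachment (docs.zip) is itself a container.
hierarchy = [
    Entry("mail.eml", "", 0, "eml", 9000),
    Entry("mail.eml/docs.zip", "mail.eml", 1, "zip", 8000),
    Entry("mail.eml/docs.zip/a.pdf", "mail.eml/docs.zip", 2, "pdf", 5000),
    Entry("mail.eml/docs.zip/b.txt", "mail.eml/docs.zip", 2, "txt", 120),
    Entry("mail.eml/logo.png", "mail.eml", 1, "png", 700),
]


def query(entries, level=None, file_type=None, size=None):
    """Return entries matching every criterion that is not None."""
    return [
        e for e in entries
        if (level is None or e.level == level)
        and (file_type is None or e.file_type == file_type)
        and (size is None or e.size == size)
    ]


def containers_to_open(entries, target):
    """List the containers to open, outermost first, to reach target."""
    by_path = {e.path: e for e in entries}
    chain = []
    parent = by_path[target].parent
    while parent:
        chain.append(parent)
        parent = by_path[parent].parent
    return chain[::-1]


# Second stage: pick the files worth indexing without touching file contents.
pdfs = query(hierarchy, file_type="pdf")
level_two = query(hierarchy, level=2)
chain = containers_to_open(hierarchy, pdfs[0].path)
```

Because every query runs against metadata alone, a second stage could decide which files need the expensive indexing operation, and which containers to open to reach them, before extracting anything.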

Assignees

Inventors

Classifications

  • G06F16/113 · Primary

    Details of archiving (lifecycle management in storage systems G06F3/0649; point-in-time backing up or restoration of persistent data G06F11/1446) · CPC title

  • Hierarchical databases, e.g. IMS, LDAP data stores or Lotus Notes · CPC title

  • Indexing; Web crawling techniques · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9922033B1 cover?
The disclosed computer-implemented method for efficiently extracting contents of container files may include (1) receiving a container file that includes (a) an additional container file that includes (i) a constituent file and (ii) metadata of the constituent file and (b) metadata of the additional container file, (2) creating, before the constituent file is extracted from the additional conta…
Who is the assignee on this patent?
Veritas Technologies LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/113. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 20, 2018 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).