Systems and methods for facilitating analytics on data sets stored in remote monolithic files

US9864790B1 · US · B1

Patent metadata
Field               Value
Publication number  US-9864790-B1
Application number  US-201414580079-A
Country             US
Kind code           B1
Filing date         Dec 22, 2014
Priority date       Dec 22, 2014
Publication date    Jan 9, 2018
Grant date          Jan 9, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection: read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosed computer-implemented method for facilitating analytics on data sets stored in remote monolithic files may include (1) identifying, within a secondary storage system, a secondary copy of a data set duplicated from a primary copy of the data set stored in a primary storage system, (2) generating a set of virtual objects that represent at least a portion of the secondary copy of the data set, (3) exposing the set of virtual objects to a remote analytics engine via a network such that the set of individual data objects appears to be stored locally on the remote analytics engine, and then (4) enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects by way of the set of virtual objects via the network. Various other methods, systems, and computer-readable media are also disclosed.
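The abstract's four steps can be illustrated with a minimal, in-process sketch. It assumes a hypothetical monolithic-file layout (an 8-byte length prefix, a JSON index mapping each object name to an offset and length, then the packed payload); the patent does not specify any format, and the names `build_monolithic`, `VirtualObject`, and `VirtualFileSystem` are illustrative only. Each virtual object is just a pointer (source file, offset, length), so reading one touches only that object's byte range and nothing is copied out of the secondary copy. The network layer that would expose this namespace to a remote analytics engine is omitted.

```python
import io
import json
import struct
from dataclasses import dataclass
from typing import BinaryIO, Dict, List

# HYPOTHETICAL monolithic-file layout, assumed for illustration only:
#   [8-byte big-endian index length][JSON index {name: [offset, length]}][payload]
# Offsets in the index are relative to the start of the payload.


def build_monolithic(objects: Dict[str, bytes]) -> bytes:
    """Pack several objects into one opaque blob (a stand-in for a backup image)."""
    index: Dict[str, List[int]] = {}
    payload = bytearray()
    for name, data in objects.items():
        index[name] = [len(payload), len(data)]
        payload += data
    header = json.dumps(index).encode("utf-8")
    return struct.pack(">Q", len(header)) + header + bytes(payload)


@dataclass(frozen=True)
class VirtualObject:
    """Pointer to one object inside a monolithic file; holds no object bytes."""
    source: BinaryIO
    offset: int  # absolute byte offset within the monolithic file
    length: int

    def read(self, start: int = 0, size: int = -1) -> bytes:
        # Clamp the request to this object's byte range.
        if size < 0 or start + size > self.length:
            size = self.length - start
        if size <= 0:
            return b""
        self.source.seek(self.offset + start)
        return self.source.read(size)


class VirtualFileSystem:
    """Flat namespace of virtual objects drawn from one or more monolithic files."""

    def __init__(self) -> None:
        self._objects: Dict[str, VirtualObject] = {}

    def mount(self, prefix: str, source: BinaryIO) -> None:
        # Extract per-object information from the file's index (cf. claim 3).
        source.seek(0)
        (index_len,) = struct.unpack(">Q", source.read(8))
        index = json.loads(source.read(index_len))
        payload_start = 8 + index_len
        for name, (offset, length) in index.items():
            path = f"{prefix}/{name}"
            self._objects[path] = VirtualObject(source, payload_start + offset, length)

    def listdir(self) -> List[str]:
        return sorted(self._objects)

    def read(self, path: str, start: int = 0, size: int = -1) -> bytes:
        return self._objects[path].read(start, size)


if __name__ == "__main__":
    # In-memory stand-ins for two backup images held in secondary storage.
    backup_a = io.BytesIO(build_monolithic({"sales.csv": b"id,amount\n1,10\n", "notes.txt": b"hello"}))
    backup_b = io.BytesIO(build_monolithic({"logs.txt": b"ok"}))
    vfs = VirtualFileSystem()
    vfs.mount("/backup-a", backup_a)
    vfs.mount("/backup-b", backup_b)
    print(vfs.listdir())  # ['/backup-a/notes.txt', '/backup-a/sales.csv', '/backup-b/logs.txt']
    print(vfs.read("/backup-a/notes.txt"))  # b'hello'
```

Mounting two monolithic files under different prefixes loosely mirrors the combined virtual file system of claim 4; a real deployment would also need a network protocol or file system plug-in to make these paths look local to the analytics cluster.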

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for facilitating analytics on data sets stored in remote monolithic files, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising: identifying, within a secondary storage system, a monolithic file that: includes a secondary copy of a data set duplicated from a primary copy of the data set stored in a primary storage system; stores the secondary copy of the data set as a singular data block; identifying a set of individual data objects within the secondary copy of the data set included in the monolithic file; generating a set of virtual objects that represent the set of individual data objects identified within the secondary copy of the data set included in the monolithic file; exposing the set of virtual objects to a remote analytics engine via a network such that the individual data objects appear to the remote analytics engine to be stored locally on the remote analytics engine, wherein the remote analytics engine comprises a computer cluster that: includes a plurality of nodes; implements a distributed file system that manages data across the plurality of nodes; is unable to natively open or read monolithic files that store data sets as singular data blocks; enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects identified within the secondary copy of the data set by way of the set of virtual objects exposed to the remote analytics engine via the network.

2. The method of claim 1, wherein generating the set of virtual objects that represent the set of individual data objects comprises: providing a user with a user interface that enables the user to select the set of individual data objects; detecting the user's selection of the set of individual data objects via the user interface; generating the set of virtual objects that represent the set of individual data objects in response to the user's selection.

3. The method of claim 1, wherein generating the set of virtual objects that represent the set of individual data objects comprises: extracting information about the set of individual data objects from the monolithic file identified within the secondary storage system; generating, based at least in part on the information extracted from the monolithic file, the set of virtual objects that represent the set of individual data objects.

4. The method of claim 1, further comprising: identifying another monolithic file that includes a secondary copy of another data set duplicated from a primary copy of the other data set stored in the primary storage system; identifying another set of individual data objects within the secondary copy of the other data set included in the other monolithic file; generating another set of virtual objects that represent the other set of individual data objects identified within the secondary copy of the other data set included in the other monolithic file; wherein: exposing the set of virtual objects to the remote analytics engine via the network comprises exposing the set of virtual objects and the other set of virtual objects as a virtual file system to the remote analytics engine via the network; enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects comprises enabling the remote analytics engine to perform the analytics job on the set of individual data objects and the other set of individual data objects by way of the virtual file system exposed to the remote analytics engine via the network.

5. The method of claim 1, wherein: exposing the set of virtual objects to the remote analytics engine via the network comprises exposing the set of virtual objects as a virtual file system to the remote analytics engine via the network; enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects comprises enabling the remote analytics engine to perform the analytics job on the set of individual data objects by way of the virtual file system exposed to the remote analytics engine via the network.

6. The method of claim 1, wherein: exposing the set of virtual objects to the remote analytics engine via the network comprises exposing the set of virtual objects to the remote analytics engine without moving or copying the set of individual data objects to the remote analytics engine via the network; enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects comprises enabling the remote analytics engine to perform the analytics job on the set of individual data objects without moving or copying the set of individual data objects to the remote analytics engine.

7. The method of claim 1, wherein: exposing the set of virtual objects to the remote analytics engine via the network comprises exposing the set of virtual objects to the remote analytics engine without moving or copying an equivalent set of individual data objects included in the primary copy of the data set to the remote analytics engine via the network; enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects comprises enabling the remote analytics engine to perform the analytics job on the set of individual data objects without moving or copying the equivalent set of individual data objects included in the primary copy of the data set to the remote analytics engine via the network.
8. The method of claim 1, wherein: exposing the set of virtual objects to the remote analytics engine via the network comprises exposing the set of virtual objects to the remote analytics engine via the network by way of a file system plug-in that interfaces with a file system of the remote analytics engine; enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects comprises enabling the remote analytics engine to perform the analytics job on the set of individual data objects via the network by way of the file system plug-in that interfaces with the file system of the remote analytics engine.

9. The method of claim 8, further comprising: receiving, by the file system plug-in, at least one request to perform an Input/Output (I/O) operation on the set of individual data objects; generating, based at least in part on the request to perform the I/O operation, a notification of an anticipated future I/O operation likely to be performed on the set of individual data objects in connection with the analytics job; forwarding the notification of the anticipated future I/O operation from the file system plug-in to the secondary storage system to facilitate prefetching at least a portion of the set of individual data objects in connection with the analytics job.

10. The method of claim 8, further comprising: receiving, by the file system plug-in, at least one request to perform a write operation on at least a portion of the set of individual data objects; generating, by the file system plug-in, a data representation of the write operation based at least in part on the request to perform the write operation; caching, by the file system plug-in, the data representation of the write operation in memory accessible to the file system plug-in; providing, after completion of the analytics job, the data representation of the write operation to the secondary storage system to facilitate updating the portion of the set of individual data objects.

11. The method of claim 1, further comprising: identifying another secondary copy of the data set within an
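Claims 9 and 10 describe two behaviors of the file system plug-in: turning an incoming read into a prefetch hint for the secondary storage system, and caching writes until the analytics job finishes. The following is a hedged, single-process sketch of those behaviors, not the patented implementation. `SecondaryStorage`, `FileSystemPlugin`, and `WriteRecord` are hypothetical names, and the sequential read-ahead rule used to predict the next I/O is an assumption, since the claims do not say how the anticipated operation is chosen.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class WriteRecord:
    """HYPOTHETICAL data representation of one write request (claim 10)."""
    path: str
    offset: int
    data: bytes


class SecondaryStorage:
    """In-memory stand-in for the secondary storage system (hypothetical API)."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = {path: bytearray(data) for path, data in objects.items()}
        self.prefetch_hints: List[Tuple[str, int, int]] = []

    def read(self, path: str, offset: int, size: int) -> bytes:
        return bytes(self.objects[path][offset:offset + size])

    def prefetch(self, path: str, offset: int, size: int) -> None:
        # A real system might warm a cache here; this stub only records the hint.
        self.prefetch_hints.append((path, offset, size))

    def apply_writes(self, records: List[WriteRecord]) -> None:
        # Replay cached writes in arrival order, growing objects if needed.
        for record in records:
            buf = self.objects[record.path]
            end = record.offset + len(record.data)
            if end > len(buf):
                buf.extend(bytes(end - len(buf)))
            buf[record.offset:end] = record.data


class FileSystemPlugin:
    """Sits between the analytics engine's file system and secondary storage."""

    def __init__(self, storage: SecondaryStorage) -> None:
        self.storage = storage
        self.write_cache: List[WriteRecord] = []

    def read(self, path: str, offset: int, size: int) -> bytes:
        data = self.storage.read(path, offset, size)
        # ASSUMED prediction rule (claim 9 does not specify one): analytics
        # scans are often sequential, so hint the next block of the same size.
        self.storage.prefetch(path, offset + size, size)
        return data

    def write(self, path: str, offset: int, data: bytes) -> None:
        # Cache a representation of the write; secondary storage is untouched for now.
        self.write_cache.append(WriteRecord(path, offset, bytes(data)))

    def finish_job(self) -> None:
        # After the analytics job completes, hand the cached writes to storage.
        self.storage.apply_writes(self.write_cache)
        self.write_cache = []


if __name__ == "__main__":
    storage = SecondaryStorage({"/backup-a/sales.csv": b"id,amount\n1,10\n"})
    plugin = FileSystemPlugin(storage)
    print(plugin.read("/backup-a/sales.csv", 0, 10))  # b'id,amount\n'
    plugin.write("/backup-a/sales.csv", 10, b"2")
    plugin.finish_job()
    print(storage.read("/backup-a/sales.csv", 10, 5))  # b'2,10\n'
```

This sketch does not serve cached writes back to reads issued during the same job; a fuller design would have to decide whether reads should see pending writes before they are handed to the secondary storage system.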

Assignees

Veritas Technologies LLC

Classifications

  • Physics · mapped topic

  • Selection of displayed objects or displayed text elements (G06F3/0482 takes precedence) · CPC title

  • for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • Distributed file systems · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9864790B1 cover?
The disclosed computer-implemented method for facilitating analytics on data sets stored in remote monolithic files may include (1) identifying, within a secondary storage system, a secondary copy of a data set duplicated from a primary copy of the data set stored in a primary storage system, (2) generating a set of virtual objects that represent at least a portion of the secondary copy of the …
Who is the assignee on this patent?
Veritas Technologies LLC
What technology area does this patent fall under?
Primary CPC classification G06F17/30563. Mapped technology areas include Physics.
When was this patent published?
Published Jan 9, 2018 (kind code B1). Legal status and post-grant events are not shown on this page.