Automatic aggregation for infrastructure string matching
US-2015347386-A1 · Dec 3, 2015 · US
US9864790B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9864790-B1 |
| Application number | US-201414580079-A |
| Country | US |
| Kind code | B1 |
| Filing date | Dec 22, 2014 |
| Priority date | Dec 22, 2014 |
| Publication date | Jan 9, 2018 |
| Grant date | Jan 9, 2018 |
Official abstract text for this publication.
The disclosed computer-implemented method for facilitating analytics on data sets stored in remote monolithic files may include (1) identifying, within a secondary storage system, a secondary copy of a data set duplicated from a primary copy of the data set stored in a primary storage system, (2) generating a set of virtual objects that represent at least a portion of the secondary copy of the data set, (3) exposing the set of virtual objects to a remote analytics engine via a network such that the set of individual data objects appears to be stored locally on the remote analytics engine, and then (4) enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects by way of the set of virtual objects via the network. Various other methods, systems, and computer-readable media are also disclosed.
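The four steps in the abstract can be illustrated with a small sketch. This is not the patented implementation: the `(name, offset, length)` catalog, the `VirtualObject` class, and the helper names are all hypothetical, and an in-memory `BytesIO` buffer stands in for a monolithic file held in secondary storage. The sketch only shows the general idea of representing individual data objects as lightweight references into one large blob, so that a consumer can read them as if they were separate files.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Dict, List, Tuple


@dataclass(frozen=True)
class VirtualObject:
    """Hypothetical stand-in for one data object inside a monolithic file."""
    name: str
    offset: int
    length: int

    def read(self, blob: BinaryIO) -> bytes:
        # Fetch only this object's byte range; nothing is copied up front.
        blob.seek(self.offset)
        return blob.read(self.length)


def generate_virtual_objects(
    catalog: List[Tuple[str, int, int]]
) -> Dict[str, VirtualObject]:
    """Build virtual objects from an assumed (name, offset, length) catalog."""
    return {name: VirtualObject(name, off, ln) for name, off, ln in catalog}


def build_demo_blob() -> Tuple[io.BytesIO, List[Tuple[str, int, int]]]:
    """Pack three toy records into one blob and return it with its catalog."""
    records = {"a.txt": b"alpha", "b.txt": b"bravo!", "c.txt": b"charlie"}
    blob, catalog, pos = io.BytesIO(), [], 0
    for name, data in records.items():
        blob.write(data)
        catalog.append((name, pos, len(data)))
        pos += len(data)
    return blob, catalog


def object_sizes(vfs: Dict[str, VirtualObject], blob: BinaryIO) -> Dict[str, int]:
    """Toy 'analytics job': size of each object, read through the virtual view."""
    return {name: len(obj.read(blob)) for name, obj in vfs.items()}


if __name__ == "__main__":
    blob, catalog = build_demo_blob()
    vfs = generate_virtual_objects(catalog)
    print(vfs["b.txt"].read(blob))
    print(object_sizes(vfs, blob))
```

In this sketch the job touches only the byte ranges it asks for, which loosely mirrors the abstract's point that the objects merely appear to be stored locally on the analytics engine.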
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for facilitating analytics on data sets stored in remote monolithic files, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising: identifying, within a secondary storage system, a monolithic file that: includes a secondary copy of a data set duplicated from a primary copy of the data set stored in a primary storage system; stores the secondary copy of the data set as a singular data block; identifying a set of individual data objects within the secondary copy of the data set included in the monolithic file; generating a set of virtual objects that represent the set of individual data objects identified within the secondary copy of the data set included in the monolithic file; exposing the set of virtual objects to a remote analytics engine via a network such that the individual data objects appear to the remote analytics engine to be stored locally on the remote analytics engine, wherein the remote analytics engine comprises a computer cluster that: includes a plurality of nodes; implements a distributed file system that manages data across the plurality of nodes; is unable to natively open or read monolithic files that store data sets as singular data blocks; enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects identified within the secondary copy of the data set by way of the set of virtual objects exposed to the remote analytics engine via the network.

2. The method of claim 1, wherein generating the set of virtual objects that represent the set of individual data objects comprises: providing a user with a user interface that enables the user to select the set of individual data objects; detecting the user's selection of the set of individual data objects via the user interface; generating the set of virtual objects that represent the set of individual data objects in response to the user's selection.

3. The method of claim 1, wherein generating the set of virtual objects that represent the set of individual data objects comprises: extracting information about the set of individual data objects from the monolithic file identified within the secondary storage system; generating, based at least in part on the information extracted from the monolithic file, the set of virtual objects that represent the set of individual data objects.

4. The method of claim 1, further comprising: identifying another monolithic file that includes a secondary copy of another data set duplicated from a primary copy of the other data set stored in the primary storage system; identifying another set of individual data objects within the secondary copy of the other data set included in the other monolithic file; generating another set of virtual objects that represent the other set of individual data objects identified within the secondary copy of the other data set included in the other monolithic file; wherein: exposing the set of virtual objects to the remote analytics engine via the network comprises exposing the set of virtual objects and the other set of virtual objects as a virtual file system to the remote analytics engine via the network; enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects comprises enabling the remote analytics engine to perform the analytics job on the set of individual data objects and the other set of individual data objects by way of the virtual file system exposed to the remote analytics engine via the network.

5. The method of claim 1, wherein: exposing the set of virtual objects to the remote analytics engine via the network comprises exposing the set of virtual objects as a virtual file system to the remote analytics engine via the network; enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects comprises enabling the remote analytics engine to perform the analytics job on the set of individual data objects by way of the virtual file system exposed to the remote analytics engine via the network.

6. The method of claim 1, wherein: exposing the set of virtual objects to the remote analytics engine via the network comprises exposing the set of virtual objects to the remote analytics engine without moving or copying the set of individual data objects to the remote analytics engine via the network; enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects comprises enabling the remote analytics engine to perform the analytics job on the set of individual data objects without moving or copying the set of individual data objects to the remote analytics engine.

7. The method of claim 1, wherein: exposing the set of virtual objects to the remote analytics engine via the network comprises exposing the set of virtual objects to the remote analytics engine without moving or copying an equivalent set of individual data objects included in the primary copy of the data set to the remote analytics engine via the network; enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects comprises enabling the remote analytics engine to perform the analytics job on the set of individual data objects without moving or copying the equivalent set of individual data objects included in the primary copy of the data set to the remote analytics engine via the network.

8. The method of claim 1, wherein: exposing the set of virtual objects to the remote analytics engine via the network comprises exposing the set of virtual objects to the remote analytics engine via the network by way of a file system plug-in that interfaces with a file system of the remote analytics engine; enabling the remote analytics engine to perform at least one analytics job on the set of individual data objects comprises enabling the remote analytics engine to perform the analytics job on the set of individual data objects via the network by way of the file system plug-in that interfaces with the file system of the remote analytics engine.

9. The method of claim 8, further comprising: receiving, by the file system plug-in, at least one request to perform an Input/Output (I/O) operation on the set of individual data objects; generating, based at least in part on the request to perform the I/O operation, a notification of an anticipated future I/O operation likely to be performed on the set of individual data objects in connection with the analytics job; forwarding the notification of the anticipated future I/O operation from the file system plug-in to the secondary storage system to facilitate prefetching at least a portion of the set of individual data objects in connection with the analytics job.

10. The method of claim 8, further comprising: receiving, by the file system plug-in, at least one request to perform a write operation on at least a portion of the set of individual data objects; generating, by the file system plug-in, a data representation of the write operation based at least in part on the request to perform the write operation; caching, by the file system plug-in, the data representation of the write operation in memory accessible to the file system plug-in; providing, after completion of the analytics job, the data representation of the write operation to the secondary storage system to facilitate updating the portion of the set of individual data objects.

11. The method of claim 1, further comprising: identifying another secondary copy of the data set within an
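
Claims 9 and 10 describe two behaviors of the file system plug-in: forwarding a hint about anticipated I/O so the secondary storage system can prefetch, and caching write operations until the analytics job completes. The sketch below is one possible reading, not the claimed implementation. `SecondaryStorageStub`, `FileSystemPlugin`, and all method names are hypothetical, and the "next object in scan order" rule is an assumed heuristic chosen only to keep the example small; the claims do not specify how the anticipated operation is predicted.

```python
from typing import Dict, List, Tuple


class SecondaryStorageStub:
    """Hypothetical in-memory stand-in for the secondary storage system."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = dict(objects)
        self.prefetch_hints: List[str] = []

    def read(self, name: str) -> bytes:
        return self.objects[name]

    def notify_prefetch(self, name: str) -> None:
        # A real system might warm a cache here; the stub only records the hint.
        self.prefetch_hints.append(name)

    def apply_writes(self, writes: List[Tuple[str, bytes]]) -> None:
        for name, data in writes:
            self.objects[name] = data


class FileSystemPlugin:
    """Hypothetical plug-in sitting between the analytics engine and storage."""

    def __init__(self, storage: SecondaryStorageStub, scan_order: List[str]) -> None:
        self.storage = storage
        self.scan_order = scan_order
        self.pending_writes: List[Tuple[str, bytes]] = []

    def read(self, name: str) -> bytes:
        # Assumed heuristic: in a sequential scan, the next object in order
        # is the one most likely to be requested next, so hint at it.
        idx = self.scan_order.index(name)
        if idx + 1 < len(self.scan_order):
            self.storage.notify_prefetch(self.scan_order[idx + 1])
        return self.storage.read(name)

    def write(self, name: str, data: bytes) -> None:
        # Cache a representation of the write instead of applying it now.
        self.pending_writes.append((name, data))

    def finish_job(self) -> None:
        # After the job completes, hand the cached writes to storage.
        self.storage.apply_writes(self.pending_writes)
        self.pending_writes = []


if __name__ == "__main__":
    storage = SecondaryStorageStub({"a": b"1", "b": b"2", "c": b"3"})
    plugin = FileSystemPlugin(storage, ["a", "b", "c"])
    plugin.read("a")
    plugin.write("a", b"9")
    plugin.finish_job()
    print(storage.prefetch_hints, storage.objects["a"])
```

Deferring writes until `finish_job` keeps the stored copy unchanged while the job runs, which is the ordering claim 10 describes.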
Physics · mapped topic
Selection of displayed objects or displayed text elements (G06F3/0482 takes precedence) · CPC title
for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title
Distributed file systems · CPC title