Method and system for providing clustered and parallel data mining of backup data

US9971797B1 · US · B1

Patent metadata
Field               Value
Publication number  US-9971797-B1
Application number  US-201414558624-A
Country             US
Kind code           B1
Filing date         Dec 2, 2014
Priority date       Dec 2, 2014
Publication date    May 15, 2018
Grant date          May 15, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to one embodiment, analytics module of a storage system receives a request for analyzing a data stream stored in the storage system. In response to the request, the analytics module identifies a sparse disk file stored in the storage system representing the requested data stream. The sparse disk file includes payload blocks sparsely located and intertwined with metadata of the sparse disk file. A converter converts the sparse disk file into multiple native disk files based on the payload blocks of the sparse disk file, using a fast-copy method without having physically copying data content of the payload blocks. A block-based accessing interface is provided to allow multiple clients to concurrently access the native disk files, respectively. Each block of content represented by the native disk is accessed based on a block identifier and an offset indicating a location of the block within the native disk file.
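The last sentence of the abstract describes addressing content by a block identifier plus an offset within the native disk file. The following is a minimal sketch of that idea, not the patented implementation: the class name `NativeDiskView`, the `read` method, the tiny `BLOCK_SIZE`, and the toy byte layout are all assumptions made for illustration.

```python
from dataclasses import dataclass

BLOCK_SIZE = 8  # hypothetical block size, kept tiny so the example is readable


@dataclass(frozen=True)
class NativeDiskView:
    """Hypothetical read-only view over payload blocks that stay in the sparse file."""
    sparse_bytes: bytes    # the sparse disk file (metadata and payload interleaved)
    block_offsets: tuple   # entry i = byte offset of payload block i in sparse_bytes

    def read(self, block_id: int, offset: int, length: int) -> bytes:
        """Return `length` bytes starting `offset` bytes into block `block_id`."""
        if not 0 <= block_id < len(self.block_offsets):
            raise IndexError("unknown block identifier")
        if offset < 0 or length < 0 or offset + length > BLOCK_SIZE:
            raise ValueError("read crosses the block boundary")
        start = self.block_offsets[block_id] + offset
        return self.sparse_bytes[start:start + length]


# Toy sparse file: 4 bytes of metadata, one payload block, 4 more bytes of
# metadata, then a second payload block.
sparse = b"META" + b"AAAAAAAA" + b"meta" + b"BBBBBBBB"
view = NativeDiskView(sparse_bytes=sparse, block_offsets=(4, 16))

print(view.read(0, 0, 8))  # b'AAAAAAAA'
print(view.read(1, 2, 3))  # b'BBB'
```

The view stores only offsets into the sparse file, so building it does not duplicate any payload bytes; a real block interface such as SCSI or Fibre Channel would sit in front of a lookup of this kind.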

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for providing an access interface for analyzing backup data in a cluster manner, the method comprising: receiving, by a data analytics module executed by a processor of a storage system, a request for analyzing a backup data stream stored in the storage system; in response to the request, identifying by the data analytics module a sparse disk file stored in the storage system representing the requested backup data stream; converting, by a converter executed by the processor, the sparse disk file into a native disk file based on payload blocks of the sparse disk file; generating a plurality of instant copies of the native disk file; using a fast-copy method without having physically copying data content of the payload blocks, each of the instant copies referencing to the payload blocks; and concurrently providing a block-based accessing interface to each of a plurality of client devices over a network for accessing each of the plurality of instant copies of the native disk file, wherein the client devices are to access the instant copies of the native disk file, respectively, to concurrently perform an analysis on the instant copies of an identical data set associated with the payload blocks.

2. The method of claim 1, wherein analysis results performed by the plurality of client devices on the instant copies of the native disk are combined to generate a final analysis result for the native disk file.

3. The method of claim 1, wherein the sparse disk file is a virtual hard disk (VHD) compatible file.

4. The method of claim 1, wherein the block-based accessing interface is one of a small computer system interface (SCSI) and Fibre channel interface.

5. The method of claim 1, wherein converting the sparse disk file into a native disk file comprises: parsing the sparse disk file to identify a block allocation table within the sparse disk file, the block allocation table including a plurality of block entries and each block entry corresponding to one of a plurality of data blocks within the payload blocks of the sparse disk file; and for each of the block entries, retrieving an offset from the block entry, accessing a corresponding data block from the payload blocks of the sparse disk file based on the retrieved offset, and determining a pointer of the corresponding data block, wherein the pointer of the data block is used to represent the data block without having to physically copy content of the data block.

6. The method of claim 5, further comprising: prior to parsing the sparse disk file, creating the native disk file as a place holder; and for each of the block entries of the block allocation table of the sparse disk file, writing the pointer of the corresponding data block in a disk and volume content segment of the native disk file.

7. The method of claim 6, further comprising writing a master boot record and a partition table of the native disk file based on the disk and volume content segment.

8. The method of claim 6, wherein a pointer of a data block links to one of a plurality of payload blocks of the sparse disk file, without physically copying a corresponding payload block.

9. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for providing an access interface for analyzing backup data in a cluster manner, the operations comprising: receiving, by a data analytics module of a storage system, a request for analyzing a backup data stream stored in the storage system; in response to the request, identifying by the data analytics module a sparse disk file stored in the storage system representing the requested backup data stream; converting, by a converter executed by a processor, the sparse disk file into a native disk file based on a-payload blocks of the sparse disk file; generating a plurality of instant copies of the native disk file using a fast-copy method without having physically copying data content of the payload blocks, each of the instant copies referencing to the payload blocks; and concurrently providing a block-based accessing interface to each of a plurality of client devices over a network for accessing each of the plurality of instant copies of the native disk file, wherein the client devices are to access the instant copies of the native disk file, respectively, to concurrently perform an analysis on the instant copies of an identical data set associated with the payload blocks.

10. The non-transitory machine-readable medium of claim 9, wherein analysis results performed by the plurality of client devices on the instant copies of the native disk are combined to generate a final analysis result for the native disk file.

11. The non-transitory machine-readable medium of claim 9, wherein the sparse disk file is a virtual hard disk (VHD) compatible file.

12. The non-transitory machine-readable medium of claim 9, wherein the block-based accessing interface is one of a small computer system interface (SCSI) and Fibre channel interface.

13. The non-transitory machine-readable medium of claim 9, wherein converting the sparse disk file into a native disk file comprises: parsing the sparse disk file to identify a block allocation table within the sparse disk file, the block allocation table including a plurality of block entries and each block entry corresponding to one of a plurality of data blocks within the payload blocks of the sparse disk file; and for each of the block entries, retrieving an offset from the block entry, accessing a corresponding data block from the payload blocks of the sparse disk file based on the retrieved offset, and determining a pointer of the corresponding data block, wherein the pointer of the data block is used to represent the data block without having to physically copy content of the data block.

14. The non-transitory machine-readable medium of claim 13, wherein the operations further comprise: prior to parsing the sparse disk file, creating the native disk file as a place holder; and for each of the block entries of the block allocation table of the sparse disk file, writing the pointer of the corresponding data block in a disk and volume content segment of the native disk file.

15. The non-transitory machine-readable medium of claim 14, wherein the operations further comprise writing a master boot record and a partition table of the native disk file based on the disk and volume content segment.

16. The non-transitory machine-readable medium of claim 14, wherein a pointer of a data block links to one of a plurality of payload blocks of the sparse disk file, without physically copying a corresponding payload block.

17. A storage system, comprising: a processor; a memory; a data analytics module executed in the memory by the processor to receive a request for analyzing a backup data stream stored in the storage system, and in response to the request, to identify a sparse disk file stored in the storage system representing the requested backup data stream; a converter executed in the memory by the processor to convert the sparse disk file into a native disk file based on payload blocks of the sparse disk file, to generate a plurality of instant copies of the native disk file using a fast-copy method without having physically copying data content of the payload blocks, each of the instant copies referencing to the payload blocks; and a block-based accessing interface to allow each of a plurality of client devices over a network for
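The conversion and analysis steps recited in the claims (create a placeholder, parse the block allocation table, record a pointer per entry, make instant copies, combine per-client results) can be illustrated with a small sketch. Everything here is an assumption for illustration: the toy file layout is not the real VHD format, the helper names (`build_toy_sparse`, `convert_to_native`, `client_analysis`) are hypothetical, Python `memoryview` slices stand in for block pointers, threads stand in for networked client devices, and splitting blocks between clients by striping is just one possible way to divide the work.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # hypothetical payload block size


def build_toy_sparse(blocks):
    """Toy sparse file: entry count, block allocation table, then payload
    blocks, each preceded by 2 bytes standing in for per-block metadata."""
    header_len = 4 + 4 * len(blocks)
    bat, body = [], b""
    for block in blocks:
        body += b"\xff\xff"                 # interleaved metadata stand-in
        bat.append(header_len + len(body))  # absolute offset of the payload
        body += block
    return struct.pack(f">I{len(blocks)}I", len(blocks), *bat) + body


def convert_to_native(sparse):
    """Walk the table and record one zero-copy pointer per block entry."""
    native = []  # placeholder created before parsing
    (count,) = struct.unpack_from(">I", sparse, 0)
    for i in range(count):
        (offset,) = struct.unpack_from(">I", sparse, 4 + 4 * i)
        # A memoryview slice references the original bytes without copying.
        native.append(memoryview(sparse)[offset:offset + BLOCK_SIZE])
    return native


def client_analysis(copy, client_id, n_clients):
    """Hypothetical analysis: sum the byte values in this client's stripe."""
    return sum(sum(block) for block in copy[client_id::n_clients])


sparse = build_toy_sparse([b"\x01" * 4, b"\x02" * 4, b"\x03" * 4])
native = convert_to_native(sparse)

# "Instant copies": only the list of pointers is duplicated, never the payload.
n_clients = 2
copies = [list(native) for _ in range(n_clients)]

with ThreadPoolExecutor(max_workers=n_clients) as pool:
    partials = list(pool.map(
        lambda k: client_analysis(copies[k], k, n_clients), range(n_clients)))

total = sum(partials)  # combine per-client results into a final result
print(partials, total)  # [16, 8] 24
```

Each copy costs one pointer per block regardless of payload size, which is the property the claims attribute to the fast-copy method.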

Assignees

Inventors

Classifications

  • Management of the data involved in backup or backup restore · CPC title

  • Physics · mapped topic

  • using de-duplication of the data · CPC title

  • Indexing structures · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9971797B1 cover?
According to one embodiment, analytics module of a storage system receives a request for analyzing a data stream stored in the storage system. In response to the request, the analytics module identifies a sparse disk file stored in the storage system representing the requested data stream. The sparse disk file includes payload blocks sparsely located and intertwined with metadata of the sparse …
Who is the assignee on this patent?
EMC Corp, EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F11/1448. Mapped technology areas include Physics.
When was this patent published?
Publication date May 15, 2018 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).