Efficient data deployment for a parallel data processing system

US9582209B2 · US · B2

Patent metadata
Publication number: US-9582209-B2
Application number: US-201514748262-A
Country: US
Kind code: B2
Filing date: Jun 24, 2015
Priority date: Jun 24, 2015
Publication date: Feb 28, 2017
Grant date: Feb 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This document describes techniques for efficient data deployment for a parallel data processing system. In one embodiment, a virtualization platform running a parallel processing application that includes one or more virtual data nodes receives a first command to write a data block to a storage device. The platform then determines whether the first command was sent by a first virtual data node. If the first command was sent by a first virtual data node, the platform then 1) writes the data block to a first location in the storage device; 2) returns the first location to the first virtual data node; and 3) determines whether the data should be replicated. If the data should be replicated, the platform instructs the storage device to make a copy of the data block at a second location in the storage device and stores the second location in a tracking structure.
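The write path in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: `WriteCommand`, `StorageDevice`, `Platform`, the flag fields, and the in-memory dictionary standing in for a storage device are all hypothetical names. The abstract also does not say what happens to commands from other senders, so the plain write in that branch is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class WriteCommand:
    """Hypothetical write command; field names are illustrative only."""
    sender: str
    block_id: str
    data: bytes
    from_virtual_data_node: bool = False  # assumed indication carried by the command
    replicate: bool = False               # assumed indication carried by the command


@dataclass
class StorageDevice:
    """Toy in-memory stand-in for a device that can copy blocks internally."""
    blocks: dict = field(default_factory=dict)
    next_location: int = 0

    def write(self, data: bytes) -> int:
        location = self.next_location
        self.blocks[location] = data
        self.next_location += 1
        return location

    def copy_internal(self, source: int) -> int:
        # The device duplicates the block itself; the platform does not resend data.
        return self.write(self.blocks[source])


@dataclass
class Platform:
    device: StorageDevice
    tracking: dict = field(default_factory=dict)  # block_id -> replica locations

    def handle_write(self, cmd: WriteCommand) -> int:
        if not cmd.from_virtual_data_node:
            # Assumption: other senders get an ordinary write with no replication.
            return self.device.write(cmd.data)
        first = self.device.write(cmd.data)
        if cmd.replicate:
            second = self.device.copy_internal(first)
            self.tracking.setdefault(cmd.block_id, []).append(second)
        # The first location is what the virtual data node receives.
        return first
```

In this sketch the replica's location is recorded only in the tracking structure, and the caller sees only the first location.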

First claim

Opening claim text (preview).

What is claimed is:

1. A method for deploying a data block comprising: at a virtualization platform running a parallel processing application that includes one or more virtual data nodes: receiving a first command to write a data block to a storage device; determining whether the first command was sent by a first virtual data node; and if the first command was sent by a first virtual data node: writing the data block to a first location in the storage device, returning the first location to the first virtual data node, determining whether the data should be replicated, and if the data should be replicated, instructing the storage device to internally make a copy of the data block to a second location in the storage device and storing the second location in a tracking structure.

2. The method of claim 1 wherein determining whether the first command was sent by a first virtual data node comprises inspecting the storage command for an indication that the data was sent by a virtual data node, and determining whether the data should be replicated comprises inspecting the storage command for an indication that the data should be replicated.

3. The method of claim 1 further comprising: receiving a second command to write the data block to a storage device; determining that the second command was sent by a second virtual data node; and if the first command was sent by a first virtual data node: determining that the data block was already replicated on the storage device, and if the data block was already replicated, returning the location of the copy of the data block stored in the tracking structure to the second virtual data node without writing the data block to the storage device.

4. The method of claim 3 wherein determining whether the second command was sent by a second virtual data node comprises inspecting the storage command for an indication that the data was sent by a virtual data node, and determining whether the data should be replicated comprises inspecting the tracing structure for an entry that maps the second virtual data node to the location of the copy of the data block.

5. The method of claim 2 further comprising: receiving a third command to read the data block from the storage device; determining whether the number of pending I/O's requests for the data block exceeds a threshold; and if the number of pending I/O requests exceeds some threshold: determining that a copy of the data block exists where the number of pending I/O's for the data block is below the threshold, and returning the location of the copy of the data block stored in the tracking structure.

6. The method of claim 5 wherein determining the number of pending I/O's for the data block comprises measuring the size of the I/O queue depth of storage device.

7. A computer system for deploying a data block comprising: a processor; a volatile memory; a nonvolatile storage device; and a non-transitory computer readable storage medium having stored thereon program code that, when executed by the processor, causes the processor to: at a virtualization platform running a parallel processing application that includes one or more virtual data nodes: receiving a first command to write a data block to a storage device; determining whether the first command was sent by a first virtual data node; and if the first command was sent by a first virtual data node: writing the data block to a first location in the storage device, returning the first location to the first virtual data node, determining whether the data should be replicated, and if the data should be replicated, instructing the storage device to internally make a copy of the data block to a second location in the storage device and storing the second location in a tracking structure.

8. The computer system of claim 7 wherein determining whether the first command was sent by a first virtual data node comprises inspecting the storage command for an indication that the data was sent by a virtual data node, and determining whether the data should be replicated, comprises inspecting the storage command for an indication that the data should be replicated.

9. The computer system of claim 7 further comprising: receiving a second command to write the data block to a storage device; determining that the second command was sent by a second virtual data node; and if the first command was sent by a first virtual data node: determining that the data block was already replicated on the storage device, and if the data block was already replicated, returning the location of the copy of the data block stored in the tracking structure to the second virtual data node without writing the data block to the storage device.

10. The computer system of claim 9 wherein determining whether the second command was sent by a second virtual data node comprises inspecting the storage command for an indication that the data was sent by a virtual data node, and determining whether the data should be replicated comprises inspecting the tracing structure for an entry that maps the second virtual data node to the location of the copy of the data block.

11. The computer system of claim 8 further comprising: receiving a third command to read the data block from the storage device; determining whether the number of pending I/O's requests for the data block exceeds a threshold; and if the number of pending I/O requests exceeds some threshold: determining that a copy of the data block exists where the number of pending I/O's for the data block is below the threshold, and returning the location of the copy of the data block stored in the tracking structure.

12. The computer system of claim 11 wherein determining the number of pending I/O's for the data block comprises measuring the size of the I/O queue depth of storage device.

13. A non-transitory computer readable storage medium having stored thereon program code executable by computer system, the program code embodying a method for deploying a data block comprising: at a virtualization platform running a parallel processing application that includes one or more virtual data nodes: receiving a first command to write a data block to a storage device; determining whether the first command was sent by a first virtual data node; and if the first command was sent by a first virtual data node: writing the data block to a first location in the storage device, returning the first location to the first virtual data node, determining whether the data should be replicated, and if the data should be replicated, instructing the storage device to internally make a copy of the data block to a second location in the storage device and storing the second location in a tracking structure.

14. The non-transitory computer readable storage medium of claim 13 wherein determining whether the first command was sent by a first virtual data node comprises inspecting the storage command for an indication that the data was sent by a virtual data node, and determining whether the data should be replicated comprises inspecting the storage command for an indication that the data should be replicated.

15. The non-transitory computer readable storage medium of claim 13 further comprising: receiving a second command to write the data block to a storage device; determining that the second command was sent by a second virtual data node; and if the first command was sent by a first virtual data node: determining that the data block was already replicated on the storage device, and if the data block was already replicated
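The dependent claims add two behaviors on top of the basic write path: a later write of an already replicated block is answered with the location of an existing copy (claims 3 and 4), and a read is redirected to a copy when the pending I/O count for the block exceeds a threshold (claims 5 and 6). The Python sketch below illustrates one way such a tracking structure might behave. The `Tracker` class, its method names, and the per-location `pending` counter (standing in for a measured I/O queue depth) are assumptions made for illustration; the claims do not prescribe this layout.

```python
from collections import defaultdict


class Tracker:
    """Hypothetical tracking structure for replicated data blocks."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.locations = defaultdict(list)   # block_id -> every location holding the block
        self.unassigned = defaultdict(list)  # block_id -> copies not yet given to a node
        self.by_node = {}                    # (node, block_id) -> location given to that node
        self.pending = defaultdict(int)      # location -> pending I/O count (assumed metric)

    def record(self, block_id, node, first, copies):
        """Remember the first write and the copies the device made internally."""
        self.locations[block_id] = [first, *copies]
        self.unassigned[block_id] = list(copies)
        self.by_node[(node, block_id)] = first

    def write_again(self, block_id, node):
        """Later write of the same block: hand out an existing copy, write nothing."""
        if not self.unassigned[block_id]:
            return None  # no spare copy; the caller would perform a real write
        location = self.unassigned[block_id].pop(0)
        self.by_node[(node, block_id)] = location
        return location

    def read_location(self, block_id, node):
        """Choose a read location, avoiding one whose pending I/O exceeds the threshold."""
        preferred = self.by_node[(node, block_id)]
        if self.pending[preferred] <= self.threshold:
            return preferred
        for location in self.locations[block_id]:
            if self.pending[location] < self.threshold:
                return location
        return preferred  # no less-loaded copy found; keep the original location


tracker = Tracker(threshold=2)
tracker.record("blk", "vdn-1", first=10, copies=[11])
print(tracker.write_again("blk", "vdn-2"))   # second node receives the existing copy
tracker.pending[10] = 3                       # simulate a deep queue at the first location
print(tracker.read_location("blk", "vdn-1"))  # read is steered to the copy
```

Returning `None` when no spare copy exists is a design choice of this sketch; the claims only describe the case where a replica is already available.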

Assignees

VMware, Inc.

Inventors

Classifications

  • Replication mechanisms · CPC title

  • Plurality of storage devices · CPC title

  • Command handling arrangements, e.g. command buffers, queues, command scheduling · CPC title

  • G06F3/0619 · Primary

    in relation to data integrity, e.g. data losses, bit errors · CPC title

  • G06F3/0611 · Primary

    in relation to response time · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9582209B2 cover?
This document describes techniques for efficient data deployment for a parallel data processing system. In one embodiment, a virtualization platform running a parallel processing application that includes one or more virtual data nodes receives a first command to write a data block to a storage device. The platform then determines whether the first command was sent by a first virtual data node.…
Who is the assignee on this patent?
VMware, Inc.
What technology area does this patent fall under?
Primary CPC classification G06F3/0619. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 28, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).