Dump management apparatus, dump management program, and dump management method

US2016357624A1 · US · A1

Patent metadata
Publication number: US-2016357624-A1
Application number: US-201615140574-A
Country: US
Kind code: A1
Filing date: Apr 28, 2016
Priority date: Jun 3, 2015
Publication date: Dec 8, 2016
Grant date:

Abstract

Official abstract text for this publication.

A dump management apparatus having a memory; and a processor that executes a process including: selecting, in response to receiving a notification of an occurrence of a failure from a failure node of a parallel computer having a plurality of nodes, a plurality of nodes that are not scheduled to execute a job within at least a first time needed to perform dump processing of a memory of the failure node and have a memory capacity needed to perform the dump processing as dump-processing target nodes from among a plurality of nodes within a reference range near the failure node; selecting the dump-processing target nodes with a first priority according to which a plurality of adjacent nodes are preferentially selected as a candidate over a plurality of dispersing nodes from among candidates for the dump-processing target nodes; and causing the failure node to transfer a dump file inside the memory of the failure node to memories of the dump-processing target nodes.
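The eligibility test in the abstract (inside the reference range, no job scheduled within the first time, enough memory) can be sketched as a simple filter. This is only an illustration: the `Node` fields, the 2D grid layout, the Chebyshev distance used for the "reference range", and the per-node memory requirement are all assumptions and do not come from the publication.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # All fields are hypothetical; the publication does not define a data model.
    name: str
    x: int
    y: int
    free_memory_gb: float
    next_job_start_s: float  # seconds until the next scheduled job, inf if none


def eligible_nodes(nodes, failure_xy, reference_range, dump_time_s, needed_memory_gb):
    """Return nodes that could hold part of the dump of the failure node.

    Assumption: "near" means Chebyshev distance <= reference_range on a 2D grid,
    and needed_memory_gb is the share each target node must be able to store.
    """
    fx, fy = failure_xy
    result = []
    for n in nodes:
        if (n.x, n.y) == failure_xy:
            continue  # the failure node itself is never a target
        in_range = max(abs(n.x - fx), abs(n.y - fy)) <= reference_range
        idle_long_enough = n.next_job_start_s >= dump_time_s
        has_capacity = n.free_memory_gb >= needed_memory_gb
        if in_range and idle_long_enough and has_capacity:
            result.append(n)
    return result


nodes = [
    Node("a", 1, 0, 64.0, math.inf),  # near, idle, enough memory
    Node("b", 5, 5, 64.0, math.inf),  # outside the reference range
    Node("c", 0, 1, 64.0, 10.0),      # job starts before the dump would finish
    Node("d", 1, 1, 8.0, math.inf),   # not enough free memory
]
print([n.name for n in eligible_nodes(nodes, (0, 0), 2, 600.0, 32.0)])  # ['a']
```

Only node `a` passes all three checks here; the other nodes each fail exactly one of them.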

First claim

Opening claim text (preview).

What is claimed is:

1. A dump management apparatus comprising: a memory; and a processor that executes a process including: selecting, in response to receiving a notification of an occurrence of a failure from a failure node of a parallel computer having a plurality of nodes, a plurality of nodes that are not scheduled to execute a job within at least a first time needed to perform dump processing of a memory of the failure node and have a memory capacity needed to perform the dump processing as dump-processing target nodes from among a plurality of nodes within a reference range near the failure node; selecting the dump-processing target nodes with a first priority according to which a plurality of adjacent nodes are preferentially selected as a candidate over a plurality of dispersing nodes from among candidates for the dump-processing target nodes; and causing the failure node to transfer a dump file inside the memory of the failure node to memories of the dump-processing target nodes.

2. The dump management apparatus according to claim 1, wherein the process includes selecting the dump-processing target nodes with a second priority according to which a plurality of nodes positioned in a narrower region are preferentially selected as a candidate over a plurality of nodes positioned in a wider region from among the candidates for the dump-processing target nodes, the second priority representing a next-highest priority after the first priority.

3. The dump management apparatus according to claim 2, wherein the process includes selecting the dump-processing target nodes with a third priority according to which a plurality of nodes positioned in a region with a shorter distance from the failure node are preferentially selected as a candidate over a plurality of nodes positioned in a region with a longer distance from the failure node from among the candidates for the dump-processing target nodes, the third priority representing a next-highest priority after the second priority.

4. The dump management apparatus according to claim 3, wherein the process includes selecting the dump-processing target nodes with a fourth priority according to which a plurality of nodes that need a shorter time to transfer dump data from the failure node to the plurality of nodes are preferentially selected as a candidate over a plurality of nodes that need a longer time to transfer the dump data from the failure node to the plurality of nodes from among the candidates for the dump-processing target nodes, the fourth priority representing a next-highest priority after the third priority.

5. The dump management apparatus according to claim 1, wherein the process includes newly selecting new dump-processing target nodes every time the first time or longer elapses after the transfer of the dump file by the failure node, and causing dump files inside memories of the old dump-processing target nodes, which are selected before the new dump-processing target nodes, to be transferred to memories of the new dump-processing target nodes.

6. The dump management apparatus according to claim 1, further comprising a sub-storage unit, wherein causing dump files inside memories of the dump-processing target nodes to be transferred to the sub-storage unit after an elapse of a second time longer than the first time since the failure of the failure node.

7. The dump management apparatus according to claim 5, further comprising a sub-storage unit, wherein the process includes causing dump files inside memories of the dump-processing target nodes to be transferred to the sub-storage unit after an elapse of a second time longer than the first time since the failure of the failure node.

8. The dump management apparatus according to claim 1, wherein the process includes mounting a dump-processing file system using memories of the dump-processing target nodes as storage media.

9. A non-transitory computer storage medium that stores therein a computer readable program for causing a computer to execute a dump management process comprising: selecting, in response to receiving a notification of an occurrence of a failure from a failure node of a parallel computer having a plurality of nodes, a plurality of nodes that are not scheduled to execute a job within at least a first time needed to perform dump processing of a memory of the failure node and have a memory capacity needed to perform the dump processing as dump-processing target nodes from among a plurality of nodes within a reference range near the failure node; selecting the dump-processing target nodes with a first priority according to which a plurality of adjacent nodes are preferentially selected as a candidate over a plurality of dispersing nodes from among candidates for the dump-processing target nodes; and causing the failure node to transfer a dump file inside the memory of the failure node to memories of the dump-processing target nodes.

10. The non-transitory computer storage medium according to claim 9, wherein the dump management process includes newly selecting new dump-processing target nodes every time the first time or longer elapses after the transfer of the dump file by the failure node, and causing dump files inside memories of the old dump-processing target nodes, which are selected before the new dump-processing target nodes, to be transferred to memories of the new dump-processing target nodes.

11. The non-transitory computer storage medium according to claim 10, wherein the dump management process includes causing dump files inside memories of the dump-processing target nodes to be transferred to a sub-storage unit after an elapse of a second time longer than the first time since the failure of the failure node.

12. A method for a dump management, the method comprising: selecting, in response to receiving a notification of an occurrence of a failure from a failure node of a parallel computer having a plurality of nodes, a plurality of nodes that are not scheduled to execute a job within at least a first time needed to perform dump processing of a memory of the failure node and have a memory capacity needed to perform the dump processing as dump-processing target nodes from among a plurality of nodes within a reference range near the failure node; selecting the dump-processing target nodes with a first priority according to which a plurality of adjacent nodes are preferentially selected as a candidate over a plurality of dispersing nodes from among candidates for the dump-processing target nodes; and causing the failure node to transfer a dump file inside the memory of the failure node to memories of the dump-processing target nodes.
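One way to read the four priorities in claims 1 to 4 is as a lexicographic ordering over candidate groups of nodes: adjacency first, then region size, then distance from the failure node, then transfer time. The sketch below illustrates that reading only. Treating "adjacent" as 4-neighbor connectivity on a 2D grid, "region" as the bounding-box area, and "distance" as the Manhattan distance to the nearest node of the group are assumptions, and `CandidateGroup`, `rank_key`, and the other names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateGroup:
    # Hypothetical container: grid cells of the group and an estimated transfer time.
    name: str
    cells: frozenset
    transfer_time_s: float


def is_contiguous(cells):
    """True if all cells form one 4-neighbor connected block (assumed meaning of 'adjacent')."""
    cells = set(cells)
    if not cells:
        return False
    start = next(iter(cells))
    seen, stack = {start}, [start]
    while stack:
        x, y = stack.pop()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in cells and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == cells


def region_area(cells):
    """Bounding-box area, used here as a stand-in for 'narrower' versus 'wider' region."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)


def distance_to(cells, failure_xy):
    """Manhattan distance from the failure node to the nearest cell of the group."""
    fx, fy = failure_xy
    return min(abs(x - fx) + abs(y - fy) for x, y in cells)


def rank_key(group, failure_xy):
    # Tuples compare element by element, which yields the first-to-fourth priority order.
    return (
        0 if is_contiguous(group.cells) else 1,  # first priority: adjacent over dispersed
        region_area(group.cells),                # second: narrower region
        distance_to(group.cells, failure_xy),    # third: closer to the failure node
        group.transfer_time_s,                   # fourth: shorter transfer time
    )


groups = [
    CandidateGroup("G1", frozenset({(1, 0), (3, 0)}), 1.0),
    CandidateGroup("G2", frozenset({(2, 0), (3, 0)}), 2.0),
    CandidateGroup("G3", frozenset({(1, 0), (1, 1)}), 3.0),
    CandidateGroup("G4", frozenset({(1, 0), (2, 0), (1, 1)}), 0.5),
]
ranked = sorted(groups, key=lambda g: rank_key(g, (0, 0)))
print([g.name for g in ranked])  # ['G3', 'G2', 'G4', 'G1']
```

In this example the dispersed group `G1` ranks last despite having a short transfer time, because a lower priority only breaks ties left by the priorities above it.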

Assignees

Fujitsu Ltd

Inventors

Classifications

  • by exceeding a time limit, i.e. time-out, e.g. watchdogs

  • in a remote unit communicating with a single-box computer node experiencing an error/fault (remote testing G06F11/2294)

  • in a storage system, e.g. in a DASD or network based storage system (drivers for digital recording or reproducing units G06F3/06; circuits for error detection or correction within digital recording or reproducing units G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097)

  • in a multiprocessor or a multi-core unit (multiprocessors per se G06F15/80)

  • the resource being a machine, e.g. CPUs, Servers, Terminals

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016357624A1 cover?
A dump management apparatus having a memory; and a processor that executes a process including: selecting, in response to receiving a notification of an occurrence of a failure from a failure node of a parallel computer having a plurality of nodes, a plurality of nodes that are not scheduled to execute a job within at least a first time needed to perform dump processing of a memory of the failu…
Who is the assignee on this patent?
Fujitsu Ltd
What technology area does this patent fall under?
Primary CPC classification G06F11/0778. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 8, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).