Information processing apparatus and information processing method
US-2016062811-A1 · Mar 3, 2016 · US
US2016357624A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016357624-A1 |
| Application number | US-201615140574-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 28, 2016 |
| Priority date | Jun 3, 2015 |
| Publication date | Dec 8, 2016 |
| Grant date | — |
Official abstract text for this publication.
A dump management apparatus having a memory; and a processor that executes a process including: selecting, in response to receiving a notification of an occurrence of a failure from a failure node of a parallel computer having a plurality of nodes, a plurality of nodes that are not scheduled to execute a job within at least a first time needed to perform dump processing of a memory of the failure node and have a memory capacity needed to perform the dump processing as dump-processing target nodes from among a plurality of nodes within a reference range near the failure node; selecting the dump-processing target nodes with a first priority according to which a plurality of adjacent nodes are preferentially selected as a candidate over a plurality of dispersing nodes from among candidates for the dump-processing target nodes; and causing the failure node to transfer a dump file inside the memory of the failure node to memories of the dump-processing target nodes.
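The first selection step in the abstract (nodes within a reference range, idle for at least the first time, and with enough memory) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the publication: the `Node` record, the 2D mesh coordinates, the Manhattan distance used for "near", and the per-node `share_bytes` threshold are assumptions made only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int
    coord: tuple            # assumed (x, y) position on a 2D mesh
    free_memory: int        # bytes available to hold dump data
    next_job_start: float   # seconds until the next scheduled job (math.inf if none)


def eligible_nodes(nodes, failure_coord, reference_range, first_time, share_bytes):
    """Filter candidates for dump-processing target nodes.

    A node qualifies if it lies within `reference_range` of the failure node,
    has no job scheduled within `first_time` seconds, and can hold at least
    `share_bytes` of dump data (a per-node share is an assumption here).
    """
    selected = []
    for node in nodes:
        distance = (abs(node.coord[0] - failure_coord[0])
                    + abs(node.coord[1] - failure_coord[1]))
        if distance == 0 or distance > reference_range:
            continue  # the failure node itself, or outside the reference range
        if node.next_job_start < first_time:
            continue  # a job would start before dump processing could finish
        if node.free_memory < share_bytes:
            continue  # not enough memory for its share of the dump
        selected.append(node)
    return selected


nodes = [
    Node(0, (0, 0), 8, math.inf),   # the failure node
    Node(1, (1, 0), 8, math.inf),   # near, idle, enough memory
    Node(2, (2, 0), 8, 10.0),       # job starts too soon
    Node(3, (5, 5), 8, math.inf),   # outside the reference range
    Node(4, (0, 1), 2, math.inf),   # too little memory
]
candidates = eligible_nodes(nodes, (0, 0), reference_range=2,
                            first_time=60.0, share_bytes=4)
print([n.node_id for n in candidates])
```

Only the filtering conditions come from the abstract; how the apparatus actually measures distance or divides the dump among nodes is not stated in this excerpt.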
Opening claim text (preview).
What is claimed is:

1. A dump management apparatus comprising: a memory; and a processor that executes a process including: selecting, in response to receiving a notification of an occurrence of a failure from a failure node of a parallel computer having a plurality of nodes, a plurality of nodes that are not scheduled to execute a job within at least a first time needed to perform dump processing of a memory of the failure node and have a memory capacity needed to perform the dump processing as dump-processing target nodes from among a plurality of nodes within a reference range near the failure node; selecting the dump-processing target nodes with a first priority according to which a plurality of adjacent nodes are preferentially selected as a candidate over a plurality of dispersing nodes from among candidates for the dump-processing target nodes; and causing the failure node to transfer a dump file inside the memory of the failure node to memories of the dump-processing target nodes.

2. The dump management apparatus according to claim 1, wherein the process includes selecting the dump-processing target nodes with a second priority according to which a plurality of nodes positioned in a narrower region are preferentially selected as a candidate over a plurality of nodes positioned in a wider region from among the candidates for the dump-processing target nodes, the second priority representing a next-highest priority after the first priority.

3. The dump management apparatus according to claim 2, wherein the process includes selecting the dump-processing target nodes with a third priority according to which a plurality of nodes positioned in a region with a shorter distance from the failure node are preferentially selected as a candidate over a plurality of nodes positioned in a region with a longer distance from the failure node from among the candidates for the dump-processing target nodes, the third priority representing a next-highest priority after the second priority.

4. The dump management apparatus according to claim 3, wherein the process includes selecting the dump-processing target nodes with a fourth priority according to which a plurality of nodes that need a shorter time to transfer dump data from the failure node to the plurality of nodes are preferentially selected as a candidate over a plurality of nodes that need a longer time to transfer the dump data from the failure node to the plurality of nodes from among the candidates for the dump-processing target nodes, the fourth priority representing a next-highest priority after the third priority.

5. The dump management apparatus according to claim 1, wherein the process includes newly selecting new dump-processing target nodes every time the first time or longer elapses after the transfer of the dump file by the failure node, and causing dump files inside memories of the old dump-processing target nodes, which are selected before the new dump-processing target nodes, to be transferred to memories of the new dump-processing target nodes.

6. The dump management apparatus according to claim 1, further comprising a sub-storage unit, wherein causing dump files inside memories of the dump-processing target nodes to be transferred to the sub-storage unit after an elapse of a second time longer than the first time since the failure of the failure node.

7. The dump management apparatus according to claim 5, further comprising a sub-storage unit, wherein the process includes causing dump files inside memories of the dump-processing target nodes to be transferred to the sub-storage unit after an elapse of a second time longer than the first time since the failure of the failure node.

8. The dump management apparatus according to claim 1, wherein the process includes mounting a dump-processing file system using memories of the dump-processing target nodes as storage media.

9. A non-transitory computer storage medium that stores therein a computer readable program for causing a computer to execute a dump management process comprising: selecting, in response to receiving a notification of an occurrence of a failure from a failure node of a parallel computer having a plurality of nodes, a plurality of nodes that are not scheduled to execute a job within at least a first time needed to perform dump processing of a memory of the failure node and have a memory capacity needed to perform the dump processing as dump-processing target nodes from among a plurality of nodes within a reference range near the failure node; selecting the dump-processing target nodes with a first priority according to which a plurality of adjacent nodes are preferentially selected as a candidate over a plurality of dispersing nodes from among candidates for the dump-processing target nodes; and causing the failure node to transfer a dump file inside the memory of the failure node to memories of the dump-processing target nodes.

10. The non-transitory computer storage medium according to claim 9, wherein the dump management process includes newly selecting new dump-processing target nodes every time the first time or longer elapses after the transfer of the dump file by the failure node, and causing dump files inside memories of the old dump-processing target nodes, which are selected before the new dump-processing target nodes, to be transferred to memories of the new dump-processing target nodes.

11. The non-transitory computer storage medium according to claim 10, wherein the dump management process includes causing dump files inside memories of the dump-processing target nodes to be transferred to a sub-storage unit after an elapse of a second time longer than the first time since the failure of the failure node.

12. A method for a dump management, the method comprising: selecting, in response to receiving a notification of an occurrence of a failure from a failure node of a parallel computer having a plurality of nodes, a plurality of nodes that are not scheduled to execute a job within at least a first time needed to perform dump processing of a memory of the failure node and have a memory capacity needed to perform the dump processing as dump-processing target nodes from among a plurality of nodes within a reference range near the failure node; selecting the dump-processing target nodes with a first priority according to which a plurality of adjacent nodes are preferentially selected as a candidate over a plurality of dispersing nodes from among candidates for the dump-processing target nodes; and causing the failure node to transfer a dump file inside the memory of the failure node to memories of the dump-processing target nodes.
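The four priorities in claims 1 to 4 read as a lexicographic ordering over candidate groups of nodes, which can be sketched as a sort key. This is only one possible reading under stated assumptions: "adjacent" is modeled as 4-neighbour connectivity on a 2D mesh, "narrower region" as a smaller bounding-box area, "distance" as the smallest Manhattan distance from the failure node to any node in the group, and the transfer time as a caller-supplied estimate. All function names are hypothetical.

```python
from collections import deque


def is_adjacent_group(coords):
    """True if the coordinates form one connected block (4-neighbour adjacency)."""
    remaining = set(coords)
    if not remaining:
        return False
    start = next(iter(remaining))
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for neighbour in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if neighbour in remaining and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == remaining


def region_area(coords):
    """Area of the bounding box enclosing the group (assumed measure of 'narrower')."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)


def distance_to_failure(coords, failure):
    """Smallest Manhattan distance from the failure node to any node in the group."""
    return min(abs(x - failure[0]) + abs(y - failure[1]) for x, y in coords)


def priority_key(group, failure, transfer_time):
    """Lexicographic key: lower tuples are preferred."""
    return (
        0 if is_adjacent_group(group) else 1,   # first priority: adjacent over dispersed
        region_area(group),                     # second priority: narrower region
        distance_to_failure(group, failure),    # third priority: closer to failure node
        transfer_time(group),                   # fourth priority: shorter transfer time
    )


def choose_group(groups, failure, transfer_time):
    """Pick the candidate group with the best (lowest) priority key."""
    return min(groups, key=lambda g: priority_key(g, failure, transfer_time))


failure = (0, 0)
near_block = ((1, 0), (2, 0))    # adjacent, close
dispersed = ((1, 0), (3, 0))     # close but not adjacent
far_block = ((2, 0), (3, 0))     # adjacent, farther away
best = choose_group([dispersed, far_block, near_block], failure, lambda g: 1.0)
print(best)
```

Using a tuple as the key means a later priority only matters when all earlier ones tie, which matches the "next-highest priority" wording of claims 2 to 4.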
by exceeding a time limit, i.e. time-out, e.g. watchdogs · CPC title
in a remote unit communicating with a single-box computer node experiencing an error/fault (remote testing G06F11/2294) · CPC title
in a storage system, e.g. in a DASD or network based storage system (drivers for digital recording or reproducing units G06F3/06; circuits for error detection or correction within digital recording or reproducing units G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097) · CPC title
in a multiprocessor or a multi-core unit (multiprocessors per se G06F15/80) · CPC title
the resource being a machine, e.g. CPUs, Servers, Terminals · CPC title