Message flow control in a multi-node computer system

US9514023B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9514023-B2
Application number  US-14478308-A
Country             US
Kind code           B2
Filing date         Jun 24, 2008
Priority date       Jun 24, 2008
Publication date    Dec 6, 2016
Grant date          Dec 6, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the invention provide for controlling message flow across a parallel computer system having multiple compute nodes by selectively grouping compute nodes of such a system into node pools and assigning message flow control policies to nodes in the node pools. The message flow control policies specify logging and/or tracing activities to be performed by instances of applications running on nodes assigned to the node pools. As the application is executed, logging and/or tracing messages are generated on the compute nodes according to message flow control policies assigned to the nodes. Optionally, the message flow is analyzed, the message flow control policies are adjusted, and duplicate messages are eliminated.
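
The grouping described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `MessageFlowPolicy`, `NodePool`, `FlowController`, and `emit` are hypothetical, and the policy is reduced to a verbosity threshold plus a tracing flag as a simplifying assumption.

```python
import logging
from dataclasses import dataclass, field


@dataclass
class MessageFlowPolicy:
    """Hypothetical policy: a verbosity threshold and a tracing switch."""
    name: str
    log_level: int = logging.INFO  # messages below this level are suppressed
    tracing: bool = False


@dataclass
class NodePool:
    """A named group of compute nodes that share one policy."""
    name: str
    policy: MessageFlowPolicy
    nodes: set = field(default_factory=set)


class FlowController:
    """Tracks which pool each node belongs to (illustrative only)."""

    def __init__(self):
        self.pools = {}    # pool name -> NodePool
        self.pool_of = {}  # node id -> pool name

    def add_pool(self, pool):
        self.pools[pool.name] = pool

    def assign(self, node_id, pool_name):
        # Moving a node between pools changes the policy that governs it.
        old = self.pool_of.get(node_id)
        if old is not None:
            self.pools[old].nodes.discard(node_id)
        self.pools[pool_name].nodes.add(node_id)
        self.pool_of[node_id] = pool_name

    def policy_for(self, node_id):
        return self.pools[self.pool_of[node_id]].policy

    def emit(self, node_id, level, text):
        """Return the message if the node's policy allows it, else None."""
        if level >= self.policy_for(node_id).log_level:
            return (node_id, level, text)
        return None
```

In this sketch, changing a node's logging behavior means reassigning the node to another pool, or editing the pool's policy so that every member node is affected at once.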

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for controlling message flow in a parallel computing system having a plurality of compute nodes, the method comprising: assigning a first set of compute nodes to a first node pool; assigning a first message flow control policy to at least two compute nodes of the first node pool, wherein the first message flow control policy specifies at least one logging activity to be performed by an instance of an application running on each of the at least two compute nodes of the first node pool, and wherein subsequent modifications to the assigned first message flow control policy affect one or more of the at least one logging activities performed by each instance of the application running on the at least two compute nodes; initiating execution of the application on each of the compute nodes in the first node pool; while executing the application on the at least two compute nodes of the first node pool, generating a plurality of logging messages according to the first message flow control policy; and upon determining that two or more of the at least two compute nodes of the first node pool are generating duplicate error messages based on content of the plurality of logging messages: assigning a selected one of the two or more compute nodes to a second node pool; and assigning a second message flow control policy corresponding to the second node pool to the selected compute node, wherein the second message flow control policy is distinct from the first message flow control policy, and wherein logging activity performed by the instance of the application running on the selected compute node is controlled by the second message flow control policy rather than the first message flow control policy.

2. The method of claim 1, further comprising: transmitting, by the first compute node, the one or more generated messages to a service node; and storing, by the service node, each generated message.

3. The method of claim 1, wherein the message flow control policy specifies logging activity to be performed by a plurality of compute nodes assigned to the first node pool, and wherein a logging activity level is assigned to each instance of the application.

4. The method of claim 3, wherein the logging activity level specifies a level of verbosity for generating log messages while executing the instance of the application on each respective compute node.

5. The method of claim 3, wherein the message flow control policy specifies logging activity to be performed by a plurality of compute nodes assigned to the first node pool, and wherein multiple sets of compute nodes of the first node pool are each assigned a respective logging activity level.

6. The method of claim 1, wherein the message flow control policy specifies logging activity to be performed by the instance of the application running on the compute nodes assigned to the first node pool, and wherein a first logging activity level is assigned to a first instance of the application and a second logging activity level is assigned to at least a second instance of the application, wherein the first and second logging activity levels are different from one another.

7. The method of claim 1, wherein the message flow control policy specifies logging activity to be performed by a plurality of compute nodes assigned to the first node pool and specifies that the first instance of the application executing on the first compute node should generate log messages relative to a first type of event of the application and a second instance of the application should generate log messages relative to a second type of event.

8. The method of claim 1, wherein the message flow control policy specifies to generate a logging message for a first occurrence of a specified event and to not generate further logging messages for additional occurrences of the specified event.

9. The method of claim 1, further comprising: assigning a second set of compute nodes to the second node pool, wherein the second message flow control policy is assigned to each compute node of the second node pool, and wherein the second message flow control policy specifies at least one of logging and/or tracing activity to be performed by an instance of the application running on at least one compute node assigned to the second node pool.

10. The method of claim 1, wherein each compute node in the at least two compute nodes includes a respective compute node kernel (CNK) and a respective tracing/logging application, and wherein the plurality of logging messages are generated by the tracing/logging application, rather than by the CNK, according to a portion of the first message flow control policy specific to the respective compute node.

11. The method of claim 10, wherein assigning the second message flow control policy corresponding to the second node pool to the selected compute node is performed without modifying the CNK of the selected compute node.

12. The method of claim 1, wherein the first message flow control policy specifies that the at least one logging activity is disabled for errors of a first error type, for one or more compute nodes of the first node pool, wherein the at least one logging activity for the first error type remains enabled for the at least two compute nodes of the first node pool.

13. The method of claim 1, wherein the content of the plurality of logging messages comprises a respective error identifier and a description for each of the plurality of logging messages, and wherein the steps of assigning the selected compute node to the second node pool and assigning the second message flow control policy corresponding to the second node pool to the selected compute node are performed without requiring user interaction.

14. The method of claim 1, further comprising: analyzing the plurality of logging messages to determine that a first compute node of the at least two compute nodes is generating logging messages that are sufficiently similar to a set of logging messages generated by compute nodes assigned to a third node pool; assigning, without requiring user interaction, the first compute node to the third node pool, rather than the first node pool; and assigning a third message flow control policy corresponding to the third node pool to the first compute node, wherein the third message flow control policy is distinct from both the first message flow control policy and the second message flow control policy, and wherein logging activity performed by the instance of the application running on the first compute node is controlled by the third message flow control policy rather than either the first message flow control policy or the second message flow control policy.

15. A non-transitory computer-readable medium containing a program which, when executed by a processor, performs an operation for controlling message flow in a parallel computing system having a plurality of compute nodes, the operation comprising: assigning a first set of compute nodes to a first node pool; assigning a first message flow control policy to at least two compute nodes of the first node pool, wherein the first message flow control policy specifies at least one logging activity to be performed by an instance of an application running on each of the at least two compute nodes of the first node pool, and wherein subsequent modifications to the assigned first message flow control policy affect one or more of the at least one logging activities performed by each instance of the application running on the at least two compute nodes; initiating execution of the application
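
The duplicate-handling step at the end of claim 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: messages are modeled as `(node_id, error_id, description)` tuples (claim 13 mentions an error identifier and a description), "duplicate" is taken to mean identical identifier and description, and the node moved to the second pool is simply the highest-numbered reporter. The function names and the pool names `pool-1` and `pool-2` are hypothetical.

```python
from collections import defaultdict


def find_duplicate_reporters(messages):
    """Group messages by content and keep content reported by 2+ nodes.

    messages: iterable of (node_id, error_id, description) tuples.
    Returns {(error_id, description): sorted list of node ids}.
    """
    by_content = defaultdict(set)
    for node_id, error_id, description in messages:
        by_content[(error_id, description)].add(node_id)
    return {k: sorted(v) for k, v in by_content.items() if len(v) >= 2}


def reassign_duplicate_reporter(pool_of, messages,
                                first_pool="pool-1", second_pool="pool-2"):
    """Move one selected duplicate reporter per error into second_pool.

    pool_of maps node id -> pool name and is updated in place.
    Assumption: the selected node is the highest id still in first_pool.
    Returns the list of node ids that were moved.
    """
    in_first = [m for m in messages if pool_of.get(m[0]) == first_pool]
    moved = []
    for nodes in find_duplicate_reporters(in_first).values():
        # Re-check membership, since an earlier iteration may have moved a node.
        candidates = [n for n in nodes if pool_of[n] == first_pool]
        if len(candidates) >= 2:
            selected = candidates[-1]
            pool_of[selected] = second_pool  # second pool's policy now applies
            moved.append(selected)
    return moved
```

Because each node's policy is looked up through its pool, moving the selected node is enough to place its logging under the second policy; the node's own software is not modified in this sketch.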

Assignees

Inventors

Classifications

  • for systems

  • for load management (allocation of a server based on load conditions G06F9/505; load rebalancing G06F9/5083; redistributing the load in a network by a load balancer H04L67/1029)

  • Monitoring involving counting

  • Address tracing

  • in a multiprocessor or a multi-core unit (multiprocessors per se G06F15/80)

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9514023B2 cover?
Embodiments of the invention provide for controlling message flow across a parallel computer system having multiple compute nodes by selectively grouping compute nodes of such a system into node pools and assigning message flow control policies to nodes in the node pools. The message flow control policies specify logging and/or tracing activities to be performed by instances of applications run…
Who is the assignee on this patent?
Barsness Eric L, Darrington David L, Peters Amanda, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06F11/3495. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 6, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).