Open MPI User's Mailing List Archives

Subject: [OMPI users] detect hung node
From: Sam Preston (jsam_at_[hidden])
Date: 2010-04-06 13:03:32

Hi all,

I have a problem with the cluster I'm currently using where nodes
'hang' silently from time to time during an MPI call. This causes the
affected MPI processes to block indefinitely; the only way to detect
the error is to notice that no more output is being written to the log
files. We're trying to resolve the underlying cause of the hangs, but
in the meantime, is there a way to set a timeout or something similar
to detect this situation? Sorry if this has been addressed before; I
searched the FAQ and archives and didn't come up with anything.
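
For reference, the manual check described above (noticing that the log
files have stopped growing) could be automated outside of MPI as a
stopgap. Below is a minimal sketch in Python, assuming each process
writes to its own log file. The function names, the log paths, and the
timeout value are hypothetical, and this is not an Open MPI feature.

```python
import os
import time


def seconds_since_last_write(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def find_stalled_logs(paths, timeout_s, now=None):
    """Return the log files not written to for more than `timeout_s` seconds.

    A stalled log is only a hint that the process writing it may be hung;
    a long compute phase with no output looks the same from the outside.
    """
    return [p for p in paths if seconds_since_last_write(p, now) > timeout_s]
```

A watchdog script could call `find_stalled_logs(["rank0.log", "rank1.log"], 300)`
periodically and raise an alert (or kill the job) when the returned list
is non-empty. The timeout has to be longer than the longest quiet phase
of a healthy run, otherwise it will produce false alarms.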


J. Samuel Preston
Research Assistant
Scientific Computing and Imaging Institute
University of Utah