Open MPI User's Mailing List Archives

Subject: [OMPI users] mpirun exit status
From: Cristian KLEIN (cristiklein_at_[hidden])
Date: 2009-03-19 10:58:52


Hello everybody,

I've been using OpenMPI 1.3's mpirun in Makefiles and have noticed that
its exit status is not always what I expect. For example, an incorrect
machinefile makes mpirun return 0, whereas a non-zero value would be
expected (so make carries on as if the run had succeeded):

--- cut here ---
masternode:~/grid/myTests/hellompi$ env | grep OMPI
OMPI_MCA_plm_rsh_agent=ssh
OMPI_MCA_btl_tcp_if_exclude=lo,myri0
OMPI_MCA_btl=self,tcp

masternode:~/grid/myTests/hellompi$ mpirun.openmpi -machinefile hostfile ./hellompi.openmpi; echo $?
ssh: incorrecthost2.example.com: Name or service not known
ssh: incorrecthost1.example.com: Name or service not known
[snip]
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

0
--- end here ---

The problem comes from the fact that the exit status of a process is
ANDed with 0xFF (only its lowest 8 bits reach the parent), and OpenMPI
does not take this into consideration. In my example, the exit status
generated was 65280 (0xFF00), whose lowest 8 bits are all zero, so the
shell sees 0.
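To illustrate the truncation, here is a minimal POSIX shell sketch. It assumes only a POSIX `sh`; nothing in it is taken from Open MPI. It prints 65280 in hex, masks it with 0xFF, and then lets a real child process exit with 65280 so the parent's `$?` can be checked.

```shell
#!/bin/sh
# Only the lowest 8 bits of a child's exit code reach its parent.
# 65280 is 0xFF00, so 65280 & 0xFF is 0.
status=65280
printf 'status=%d (%#x), low 8 bits=%d\n' \
    "$status" "$status" $(( status & 0xFF ))

# Run a real child that exits with 65280. Whether the shell's "exit"
# builtin masks the value itself (bash does) or hands it to the C exit()
# function, the parent ends up seeing 0, i.e. "success".
sh -c "exit $status"
echo "parent sees: $?"
```

The same thing happens to mpirun: any internal error code that is a multiple of 256 looks like success to make.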

To solve this problem, I suggest the attached patch: if the exit status
would become zero after truncation, it is replaced with 111, which is
simply an arbitrarily chosen non-zero number. :D
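The idea behind the fix can be sketched in shell as follows. This is not the attached patch (which is not reproduced here), and `safe_exit_code` is a hypothetical helper name used only for illustration. A genuine 0 stays 0, while a non-zero code whose lowest 8 bits are all zero is replaced by 111.

```shell
#!/bin/sh
# Hypothetical helper sketching the proposed rule: a non-zero exit code
# that would be truncated to 0 is replaced by the arbitrary value 111.
# Every other code, including a real 0, is passed through unchanged.
safe_exit_code() {
    status=$1
    if [ "$status" -ne 0 ] && [ $(( status & 0xFF )) -eq 0 ]; then
        echo 111
    else
        echo "$status"
    fi
}

for code in 0 1 255 256 65280; do
    printf '%6d -> %d\n' "$code" "$(safe_exit_code "$code")"
done
```

Keeping 0 unchanged matters: a run that really succeeded must still report success. Codes such as 257 are passed through and later truncate to 1, which is still non-zero, so they need no special handling.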