Open MPI User's Mailing List Archives

Subject: [OMPI users] mpirun completes for one user, not for another
From: Daniel Fetchinson (fetchinson_at_[hidden])
Date: 2013-02-11 07:11:32

Hi folks,

I have a really strange problem: a super simple MPI test program (see
below) runs successfully for all users when executed on 4 processes in
1 node, but hangs for user A and runs successfully for user B when
executed on 8 processes in 2 nodes. The executable used is the same
and the appfile used is also the same for user A and user B. Both
users launch it with

mpirun --app appfile

where the content of 'appfile' is

-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test

for the single node run with 4 processes and is replaced by

-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node2 -wdir /tmp/test ./test
-np 1 -host node2 -wdir /tmp/test ./test
-np 1 -host node2 -wdir /tmp/test ./test
-np 1 -host node2 -wdir /tmp/test ./test

for the 2-node run with 8 processes. Just to recap, the single node
run works for both user A and user B, but the 2-node run only works
for user B and it hangs for user A. It does respond to Ctrl-C though.
Both users use bash, have set up passwordless ssh, are able to ssh
from node1 to node2 and back, have the same PATH and use the same
'mpirun' executable.

At this point I've run out of ideas for what to check and debug, because
the two setups look identical. The test program is simply

#include <stdio.h>
#include <mpi.h>

int main( int argc, char **argv )
{
   int node;

   MPI_Init( &argc, &argv );
   MPI_Comm_rank( MPI_COMM_WORLD, &node );

   printf( "First Hello World from Node %d\n", node );
   MPI_Barrier( MPI_COMM_WORLD );
   printf( "Second Hello World from Node %d\n", node );

   MPI_Finalize( );

   return 0;
}

I also asked both users to compile the test program separately, and
the resulting executable 'test' is the same for both, indicating again
that identical gcc, mpicc, etc. are used. gcc is 4.5.1, Open MPI is
1.5, and the interconnect is InfiniBand.

I've really run out of ideas for what else to compare between user A and B.

Thanks for any hints,

Psss, psss, put it down! -