Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpirun completes for one user, not for another
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-02-11 11:16:48


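Make sure that the PATH really is identical between users -- especially for non-interactive logins, which may read different shell startup files than an interactive login does. E.g., compare the output of: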

env

vs.

ssh othernode env

Also check the LD_LIBRARY_PATH.
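
If eyeballing two env dumps is tedious, the comparison could be scripted roughly as follows. This is only a sketch: the function name compare_env_var and the /tmp file names are made up for illustration, and "othernode" is the same placeholder host as above. Run it once as each user and compare the results.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): report whether one variable
# has the same value in two saved `env` dumps.
# Usage: compare_env_var VAR local_dump remote_dump
compare_env_var() {
    var=$1
    # Take the first "VAR=value" line from each dump (empty if unset).
    a=$(grep "^${var}=" "$2" | head -n 1)
    b=$(grep "^${var}=" "$3" | head -n 1)
    if [ "$a" = "$b" ]; then
        echo "$var: same"
    else
        echo "$var: DIFFERENT"
        echo "  local:  ${a:-<unset>}"
        echo "  remote: ${b:-<unset>}"
    fi
}

# Example, run as each user ("othernode" is a placeholder host name):
#   env > /tmp/env.local.$USER
#   ssh othernode env > /tmp/env.remote.$USER
#   compare_env_var PATH /tmp/env.local.$USER /tmp/env.remote.$USER
#   compare_env_var LD_LIBRARY_PATH /tmp/env.local.$USER /tmp/env.remote.$USER
```

The point of dumping via "ssh othernode env" rather than logging in first is that it exercises the same non-interactive path that mpirun's remote launch is likely to go through.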

On Feb 11, 2013, at 7:11 AM, Daniel Fetchinson <fetchinson_at_[hidden]> wrote:

> Hi folks,
>
> I have a really strange problem: a super simple MPI test program (see
> below) runs successfully for all users when executed on 4 processes in
> 1 node, but hangs for user A and runs successfully for user B when
> executed on 8 processes in 2 nodes. The executable used is the same
> and the appfile used is also the same for user A and user B. Both
> users launch it by
>
> mpirun --app appfile
>
> where the content of 'appfile' is
>
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
>
> for the single node run with 4 processes and is replaced by
>
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
>
> for the 2-node run with 8 processes. Just to recap, the single node
> run works for both user A and user B, but the 2-node run only works
> for user B and it hangs for user A. It does respond to Ctrl-C though.
> Both users use bash, have set up passwordless ssh, are able to ssh
> from node1 to node2 and back, have the same PATH and use the same
> 'mpirun' executable.
>
> At this point I've run out of ideas what to check and debug because
> the setups look really identical. The test program is simply
>
> #include <stdio.h>
> #include <mpi.h>
>
> int main( int argc, char **argv )
> {
>     int node;
>
>     MPI_Init( &argc, &argv );
>     MPI_Comm_rank( MPI_COMM_WORLD, &node );
>
>     printf( "First Hello World from Node %d\n", node );
>     MPI_Barrier( MPI_COMM_WORLD );
>     printf( "Second Hello World from Node %d\n", node );
>
>     MPI_Finalize( );
>
>     return 0;
> }
>
>
> I also asked both users to compile the test program separately, and
> the resulting executable 'test' is the same for both, indicating again
> that identical gcc, mpicc, etc. are used. GCC is 4.5.1, Open MPI is
> 1.5. and the interconnect is InfiniBand.
>
> I've really run out of ideas what else to compare between user A and B.
>
> Thanks for any hints,
> Daniel
>
>
>
>
>
> --
> Psss, psss, put it down! - http://www.cafepress.com/putitdown
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/