
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.
From: Rolf vandeVaart (rolf.vandevaart_at_[hidden])
Date: 2010-09-21 14:58:36


Ethan:

Can you run just "hostname" successfully? In other words, a non-MPI
program.
If that does not work, then we know the problem is in the runtime. If
it does work, then there is something wrong with the way the MPI
library is setting up its connections.
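
A minimal sketch of that check, assuming the host names from the quoted message below (merope, electra, atlas). The helper name hostname_checks is hypothetical; it only prints the commands, one per growing prefix of the host list, so the first host whose addition causes a hang stands out when the commands are run in order.

```shell
# Print one non-MPI "hostname" launch per growing prefix of the host list.
# The commands are only printed here, not executed.
hostname_checks() {
    list=""
    n=0
    for h in "$@"; do
        n=$((n + 1))
        # Append the host, inserting a comma only when the list is non-empty.
        list="${list:+$list,}$h"
        echo "mpirun -host $list -np $n hostname"
    done
}

# Host names taken from the quoted message; substitute your own.
hostname_checks merope electra atlas
```

Running the printed commands one at a time shows whether the launch itself fails once a particular host is included, independent of any MPI communication.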

Is there more than one interface on the nodes?
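
If there is more than one, one thing to try is restricting Open MPI's TCP traffic to a single interface. The sketch below assumes the interface is named eth0 (a placeholder, not taken from this thread) and that the btl_tcp_if_include and oob_tcp_if_include MCA parameters are available in the installed version, which can be checked with ompi_info. The helper name tcp_if_args is hypothetical, and the command is only printed.

```shell
# Build the MCA arguments that limit both the MPI TCP transport (btl) and the
# runtime's out-of-band channel (oob) to one named interface.
tcp_if_args() {
    echo "--mca btl_tcp_if_include $1 --mca oob_tcp_if_include $1"
}

# eth0 is an assumed interface name; the hosts come from the quoted message.
echo "mpirun $(tcp_if_args eth0) -host merope,electra,atlas -np 4 ./test.out"
```

If the run completes with the interface pinned this way, that would point toward interface selection rather than a single misconfigured node.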

Rolf

On 09/21/10 14:41, Ethan Deneault wrote:
> Prentice Bisbal wrote:
>
>>
>> I'm assuming you already tested ssh connectivity and verified everything
>> is working as it should. (You did test all that, right?)
>
> Yes. I am able to log in remotely to all nodes from the master, and to
> each node from each node without a password. Each node mounts the same
> /home directory from the master, so they have the same copy of all the
> ssh and rsh keys.
>
>> This sounds like a configuration problem on one of the nodes, or a problem
>> with ssh. I suspect it's not a problem with the number of processes, but
>> whichever node is the 4th in your machinefile has a connectivity or
>> configuration issue:
>>
>> I would try the following:
>>
>> 1. reorder the list of hosts in your machine file.
>> 3. Change your machinefile to include 4 completely different hosts.
>
> This does not seem to have any beneficial effect.
>
> The test program run from the master (pleiades) with any combination
> of 3 other nodes hangs during communication. This includes not using
> --machinefile and using -host; i.e.
>
> $ mpirun -host merope,electra,atlas -np 4 ./test.out (hangs)
> $ mpirun -host merope,electra,atlas -np 3 ./test.out (hangs)
> $ mpirun -host merope,electra -np 3 ./test.out
> node 1 : Hello world
> node 0 : Hello world
> node 2 : Hello world
>
>> 2. Run the mpirun command from a different host. I'd try running it from
>> several different hosts.
>
> The mpirun command does not seem to work when launched from one of the
> nodes. As an example:
>
> Running on node asterope:
>
> asterope$ mpirun -debug-daemons -host atlas,electra -np 4 ./test.out
>
> Daemon was launched on atlas - beginning to initialize
> Daemon was launched on electra - beginning to initialize
> Daemon [[54956,0],1] checking in as pid 2716 on host atlas
> Daemon [[54956,0],1] not using static ports
> Daemon [[54956,0],2] checking in as pid 2741 on host electra
> Daemon [[54956,0],2] not using static ports
>
> (hangs)
>
>> I think someone else recommended that you should be specifying the
>> number of processes with -np. I second that.
>>
>> If the above fails, you might want to post the machine file you're using.
>
> The machine file is a simple list of hostnames, as an example:
>
> m43
> taygeta
> asterope
>
>
>
> Cheers,
> Ethan
>