Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.
From: Ethan Deneault (edeneault_at_[hidden])
Date: 2010-09-21 15:20:56


Rolf vandeVaart wrote:
> Ethan:
>
> Can you run just "hostname" successfully? In other words, a non-MPI
> program.
> If that does not work, then we know the problem is in the runtime. If
> it does work, then
> there is something with the way the MPI library is setting up its
> connections.

Interesting. I did not try this.

From the master:
$ mpirun -debug-daemons -host merope,asterope -np 2 hostname
asterope
merope

$ mpirun -host merope,asterope,electra -np 3 hostname
asterope
merope

(hangs)

$ mpirun -host electra,asterope,merope -np 3 hostname
asterope
electra

(hangs)

I cannot get 3 nodes to work together. Each node works when paired with one other node. I can get
three -processes- to work if I include the master:

$ mpirun -host pleiades,electra,asterope -np 3 hostname
pleiades
electra
asterope

But 4 processes do not:

$ mpirun -host pleiades,electra,asterope,merope -np 4 hostname
pleiades
electra
asterope

(hangs)
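
The manual runs above could be automated along these lines. This is only a minimal sketch, assuming Python 3.7 or later on the master; the `HOSTS` list, the helper names, and the 30-second timeout are illustrative choices, not anything supplied by Open MPI. As written it only prints the commands it would run (a dry run); calling `main(dry_run=False)` launches them and reports each as ok, failed, or hang.

```python
import itertools
import subprocess

# Hostnames taken from this thread; adjust for your own cluster.
HOSTS = ["merope", "asterope", "electra"]


def build_command(hosts):
    """Return the mpirun argv that starts one `hostname` process per listed host."""
    return ["mpirun", "-host", ",".join(hosts), "-np", str(len(hosts)), "hostname"]


def host_subsets(hosts, min_size=2):
    """Yield every combination of hosts with at least min_size members."""
    for size in range(min_size, len(hosts) + 1):
        yield from itertools.combinations(hosts, size)


def run_with_timeout(argv, seconds=30):
    """Run argv; return 'ok', 'failed', or 'hang' if it exceeds the timeout."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so the loop can continue.
        return "hang"
    return "ok" if result.returncode == 0 else "failed"


def main(dry_run=True):
    """Print each command; when dry_run is False, also run it and print the verdict."""
    for subset in host_subsets(HOSTS):
        argv = build_command(subset)
        if dry_run:
            print(" ".join(argv))
        else:
            print(" ".join(argv), "->", run_with_timeout(argv))


# Dry run by default, so nothing is launched until you opt in.
main(dry_run=True)
```

Killing a hung mpirun this way may leave daemons behind on the remote nodes, so it is probably worth checking for stray processes between runs.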

> Is there more than one interface on the nodes?

Each node has only eth0, with a static DHCP address.

Could the problem be in the way I have the nodes set up? They boot via PXE from an image on the
master, so they should all have the same basic filesystem.

Cheers,
Ethan

>
> Rolf
>
> On 09/21/10 14:41, Ethan Deneault wrote:
>> Prentice Bisbal wrote:
>>
>>>
>>> I'm assuming you already tested ssh connectivity and verified everything
>>> is working as it should. (You did test all that, right?)
>>
>> Yes. I am able to log in remotely to all nodes from the master, and to
>> each node from each node without a password. Each node mounts the same
>> /home directory from the master, so they have the same copy of all the
>> ssh and rsh keys.
>>
>>> This sounds like a configuration problem on one of the nodes, or a problem
>>> with ssh. I suspect it's not a problem with the number of processes, but
>>> whichever node is the 4th in your machinefile has a connectivity or
>>> configuration issue:
>>>
>>> I would try the following:
>>>
>>> 1. reorder the list of hosts in your machine file.
>>> 3. Change your machinefile to include 4 completely different hosts.
>>
>> This does not seem to have any beneficial effect.
>>
>> The test program run from the master (pleiades) with any combination
>> of 3 other nodes hangs during communication. This includes not using
>> --machinefile and using -host; i.e.
>>
>> $ mpirun -host merope,electra,atlas -np 4 ./test.out (hangs)
>> $ mpirun -host merope,electra,atlas -np 3 ./test.out (hangs)
>> $ mpirun -host merope,electra -np 3 ./test.out
>> node 1 : Hello world
>> node 0 : Hello world
>> node 2 : Hello world
>>
>>> 2. Run the mpirun command from a different host. I'd try running it from
>>> several different hosts.
>>
>> The mpirun command does not seem to work when launched from one of the
>> nodes. As an example:
>>
>> Running on node asterope:
>>
>> asterope$ mpirun -debug-daemons -host atlas,electra -np 4 ./test.out
>>
>> Daemon was launched on atlas - beginning to initialize
>> Daemon was launched on electra - beginning to initialize
>> Daemon [[54956,0],1] checking in as pid 2716 on host atlas
>> Daemon [[54956,0],1] not using static ports
>> Daemon [[54956,0],2] checking in as pid 2741 on host electra
>> Daemon [[54956,0],2] not using static ports
>>
>> (hangs)
>>
>>> I think someone else recommended that you should be specifying the
>>> number of processes with -np. I second that.
>>>
>>> If the above fails, you might want to post the machine file you're using.
>>
>> The machine file is a simple list of hostnames, as an example:
>>
>> m43
>> taygeta
>> asterope
>>
>>
>>
>> Cheers,
>> Ethan
>>

-- 
Dr. Ethan Deneault
Assistant Professor of Physics
SC-234
University of Tampa
Tampa, FL 33615
Office: (813) 257-3555