Rolf vandeVaart wrote:
> Can you run just "hostname" successfully? In other words, a non-MPI
> If that does not work, then we know the problem is in the runtime. If
> it does work, then
> there is something with the way the MPI library is setting up its
Interesting. I did not try this.
From the master:
$ mpirun -debug-daemons -host merope,asterope -np 2 hostname
$ mpirun -host merope,asterope,electra -np 3 hostname
$ mpirun -host electra,asterope,merope -np 3 hostname
I cannot get three nodes to work together, although any two of them work as a pair. I can get three
-processes- to work if I include the master:
$ mpirun -host pleiades,electra,asterope -np 3 hostname
But 4 processes do not:
$ mpirun -host pleiades,electra,asterope,merope -np 4 hostname
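To narrow down which combinations fail, the hostname tests above could be generated systematically rather than typed by hand. What follows is only a rough sketch, assuming a POSIX shell and the four hostnames used in this message. The function names (triples, pairs_with, run_triples) and the DRY_RUN variable are made up for illustration and are not part of Open MPI. By default it only prints the commands. With DRY_RUN=0 it runs them under the coreutils "timeout" command (assumed to be installed) so that a hanging combination does not block the rest.

```shell
#!/bin/sh
# Hypothetical helper: print "a,b,c" for every b,c pair drawn from the
# arguments that follow the first one.
pairs_with() {
    a=$1; shift
    while [ $# -ge 2 ]; do
        b=$1; shift
        for c in "$@"; do
            echo "$a,$b,$c"
        done
    done
}

# Hypothetical helper: print every 3-host combination of the given hosts.
triples() {
    while [ $# -ge 3 ]; do
        a=$1; shift
        pairs_with "$a" "$@"
    done
}

# Print (or, with DRY_RUN=0, run) a non-MPI hostname test per combination.
run_triples() {
    for combo in $(triples "$@"); do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "mpirun -host $combo -np 3 hostname"
        else
            # 30 seconds is an arbitrary limit chosen for this sketch.
            timeout 30 mpirun -host "$combo" -np 3 hostname \
                || echo "FAILED or timed out: $combo"
        fi
    done
}

run_triples pleiades electra asterope merope
```

Listing the combinations that fail side by side should show whether one particular node is always involved, or whether any set of three remote nodes fails.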
> Is there more than one interface on the nodes?
Each node has only eth0, with a static DHCP address.
Could the problem be in the way I have the nodes set up? They boot via PXE from an image on the
master, so they should all have the same basic filesystem.
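One way to double-check the interface question on every node at once is sketched below. It assumes passwordless ssh from the master and that the iproute2 "ip" tool is installed on the nodes. The function name iface_cmds and the DRY_RUN variable are hypothetical. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: for each host, print (or, with DRY_RUN=0, run) an
# ssh command that lists the IPv4 addresses configured on that host.
iface_cmds() {
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh -o BatchMode=yes -o ConnectTimeout=5 $host ip -o -4 addr show"
        else
            echo "== $host =="
            # BatchMode makes ssh fail instead of prompting for a password.
            ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" ip -o -4 addr show
        fi
    done
}

iface_cmds electra asterope merope
```

If every node reports only lo and eth0, then multiple interfaces can probably be ruled out as the cause.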
> On 09/21/10 14:41, Ethan Deneault wrote:
>> Prentice Bisbal wrote:
>>> I'm assuming you already tested ssh connectivity and verified everything
>>> is working as it should. (You did test all that, right?)
>> Yes. I am able to log in remotely to all nodes from the master, and to
>> each node from each node without a password. Each node mounts the same
>> /home directory from the master, so they have the same copy of all the
>> ssh and rsh keys.
>>> This sounds like a configuration problem on one of the nodes, or a problem
>>> with ssh. I suspect it's not a problem with the number of processes, but
>>> whichever node is the 4th in your machinefile has a connectivity or
>>> configuration issue:
>>> I would try the following:
>>> 1. reorder the list of hosts in your machine file.
>>> 3. Change your machinefile to include 4 completely different hosts.
>> This does not seem to have any beneficial effect.
>> The test program run from the master (pleiades) with any combination
>> of 3 other nodes hangs during communication. This includes not using
>> --machinefile and using -host; i.e.
>> $ mpirun -host merope,electra,atlas -np 4 ./test.out (hangs)
>> $ mpirun -host merope,electra,atlas -np 3 ./test.out (hangs)
>> $ mpirun -host merope,electra -np 3 ./test.out
>> node 1 : Hello world
>> node 0 : Hello world
>> node 2 : Hello world
>>> 2. Run the mpirun command from a different host. I'd try running it from
>>> several different hosts.
>> The mpirun command does not seem to work when launched from one of the
>> nodes. As an example:
>> Running on node asterope:
>> asterope$ mpirun -debug-daemons -host atlas,electra -np 4 ./test.out
>> Daemon was launched on atlas - beginning to initialize
>> Daemon was launched on electra - beginning to initialize
>> Daemon [[54956,0],1] checking in as pid 2716 on host atlas
>> Daemon [[54956,0],1] not using static ports
>> Daemon [[54956,0],2] checking in as pid 2741 on host electra
>> Daemon [[54956,0],2] not using static ports
>>> I think someone else recommended that you should be specifying the
>>> number of processes with -np. I second that.
>>> If the above fails, you might want to post the machine file you're using.
>> The machine file is a simple list of hostnames, as an example:
Dr. Ethan Deneault
Assistant Professor of Physics
University of Tampa
Tampa, FL 33615
Office: (813) 257-3555