Open MPI User's Mailing List Archives

Subject: [OMPI users] mpirun problem when running on more than three hosts with OpenMPI 1.8
From: Allan Wu (allwu_at_[hidden])
Date: 2014-04-11 14:17:56


Hello everyone,

I am running a simple helloworld program on several nodes using OpenMPI
1.8. Runs on a single node or a small number of nodes succeed, but when
I try to run the same binary on four different nodes, problems occur.

I am using an 'mpirun' command line like the following:
# mpirun --prefix /mnt/embedded_root/openmpi -np 4 --map-by node -hostfile hostfile ./helloworld
My hostfile looks like this:
10.0.0.16
10.0.0.17
10.0.0.18
10.0.0.19

Executing this command produces the error message "sh: syntax
error: unexpected word", and the program deadlocks. The output with
"--debug-devel" added is in the attachment "err_msg_0.txt". In the
log, "fpga0" is the hostname of "10.0.0.16", "fpga1" is the hostname of
"10.0.0.17", and so on.

However, the weird part is that after I remove one line from the hostfile,
the problem goes away. It does not matter which host I remove: as long as
there are fewer than four hosts, the program executes without any problem.
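
For reference, the three-host variants described above can be generated with a short loop. This is only a minimal sketch: the temporary directory and the "hostfile.withoutN" file names are illustrative assumptions, not part of the original setup.

```shell
#!/bin/sh
# Sketch: build every three-host variant of the four-host hostfile.
# The working directory and file names are hypothetical.
workdir=$(mktemp -d)
hostfile="$workdir/hostfile"

# The four addresses listed in the hostfile above.
printf '%s\n' 10.0.0.16 10.0.0.17 10.0.0.18 10.0.0.19 > "$hostfile"

i=1
while [ "$i" -le 4 ]; do
    # Delete line $i and keep the other three hosts.
    sed "${i}d" "$hostfile" > "$workdir/hostfile.without$i"
    i=$((i + 1))
done

# List the generated files; any of the "without" files can be
# substituted for "hostfile" in the mpirun command shown earlier.
ls "$workdir"
```

Each generated file omits exactly one of the four hosts, which makes it easy to confirm that the failure depends on the number of hosts rather than on one particular node.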

I also tried using hostnames in the hostfile:
fpga0
fpga1
fpga2
fpga3
The same problem occurs, but the error message becomes "Host key
verification failed.". I have set up public/private key pairs on all nodes,
and each node can ssh to any other node without problems. I have also
attached the --debug-devel output for this case as "err_msg_1.txt".
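
The node-to-node ssh check mentioned above could be scripted roughly as follows. This is a hedged sketch, not the exact procedure used: it assumes an OpenSSH client on every node, and the "check_pairs" function and the "SSH" override variable are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# Sketch: test non-interactive ssh between every ordered pair of hosts.
# SSH may be overridden (for example SSH=true) to dry-run the loop.
SSH=${SSH:-ssh}
hosts="fpga0 fpga1 fpga2 fpga3"

check_pairs() {
    for src in $hosts; do
        for dst in $hosts; do
            [ "$src" = "$dst" ] && continue
            # BatchMode=yes makes ssh fail instead of prompting for a
            # password or for confirmation of an unknown host key.
            if $SSH -n -o BatchMode=yes "$src" \
                   ssh -o BatchMode=yes "$dst" true; then
                echo "ok   $src -> $dst"
            else
                echo "FAIL $src -> $dst"
            fi
        done
    done
}
```

Calling "check_pairs" from the node where mpirun is started prints one line per ordered pair of hosts, so any pair that cannot connect without a prompt shows up as a FAIL line.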

I'm running MPI programs on embedded ARM processors. I previously
posted questions about cross-compilation on the devel mailing list, which
describe the setup I used. If you need that information, please refer to
http://www.open-mpi.org/community/lists/devel/2014/04/14440.php. The
output of 'ompi_info --all' is also attached to this email.

Please let me know if I need to provide more information. Thanks in advance!

Regards,

--
Di Wu (Allan)
PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
Department of Computer Science, UC Los Angeles
Email: allwu_at_[hidden]