Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem running on multiple nodes with Java bindings
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-11-10 23:58:21


Add --enable-debug to your configure line, rebuild, and then run with the following additional options:

--leave-session-attached -mca plm_base_verbose 5
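As a rough sketch, the full debug invocation might look like the following. The host list, process count, and the `hostname` payload are copied from the failing command in the original report; the `RUN_MPI` switch and the `launch-debug.log` file name are only illustrative, and the sketch assumes the build was already reconfigured with --enable-debug.

```shell
#!/bin/sh
# Launch-debug flags suggested above.
DEBUG_OPTS="--leave-session-attached -mca plm_base_verbose 5"

# Host list and payload taken from the failing command in the report.
HOSTS="node0,node1,node2"
CMD="mpirun $DEBUG_OPTS -np 4 -host $HOSTS hostname"

# Always show the command that would be run.
echo "$CMD"

# RUN_MPI is an illustrative switch: set RUN_MPI=1 to actually launch.
if [ "${RUN_MPI:-0}" = "1" ]; then
    # Keep a copy of the verbose launch output for posting to the list.
    $CMD 2>&1 | tee launch-debug.log
fi
```

Capturing both stdout and stderr matters here, since the launcher's verbose output and the ssh error may arrive on different streams.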

Let's see where it fails during the launch phase. Offhand, the only thing that message means to me is that the ssh keys are botched on at least one node. Keep in mind that we use a tree-based launch, and so when you have more than two nodes, one or more of the intermediate nodes are executing an ssh.
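Because an intermediate node may be the one executing ssh, it can help to check non-interactive ssh between every pair of nodes, not just from the node where mpirun is started. The following is a minimal sketch of such a check, not part of Open MPI: the node list is a hypothetical subset of the eight nodes mentioned in the report, and `RUN_SSH` is an illustrative switch. With `BatchMode=yes`, ssh fails instead of prompting when a host key is unknown.

```shell
#!/bin/sh
# Hypothetical node list; the report mentions node0 through node7.
NODES="node0 node1 node2 node3"

# Print every ordered (source, destination) pair of distinct nodes.
list_pairs() {
    for src in $NODES; do
        for dst in $NODES; do
            [ "$src" = "$dst" ] || echo "$src $dst"
        done
    done
}

# From the launch node, hop to $1 and try a non-interactive ssh to $2.
# -n keeps ssh from swallowing the loop's stdin.
check_pair() {
    ssh -n -o BatchMode=yes "$1" ssh -n -o BatchMode=yes "$2" true
}

# Set RUN_SSH=1 to actually contact the nodes; otherwise only list the pairs.
list_pairs | while read -r src dst; do
    if [ "${RUN_SSH:-0}" = "1" ]; then
        if check_pair "$src" "$dst"; then
            echo "ok   $src -> $dst"
        else
            echo "FAIL $src -> $dst"
        fi
    else
        echo "would check $src -> $dst"
    fi
done
```

Any pair reported as FAIL is a candidate for a missing or stale entry in that source node's known_hosts file.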

One way to see if that's the problem is to launch without the tree spawn: add

-mca plm_rsh_no_tree_spawn 1

to your command line and see if it works.
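For example, applied to the failing three-node command from the report, the test might look like this sketch. Only the MCA parameter comes from the advice above; the `RUN_MPI` switch is illustrative.

```shell
#!/bin/sh
# Disable the tree-based launch so that only the node running mpirun starts ssh sessions.
NO_TREE="-mca plm_rsh_no_tree_spawn 1"

# Failing command from the report, with the extra parameter added.
CMD="mpirun $NO_TREE -np 4 -host node0,node1,node2 hostname"
echo "$CMD"

# RUN_MPI is an illustrative switch: set RUN_MPI=1 to actually launch.
if [ "${RUN_MPI:-0}" = "1" ]; then
    $CMD
fi
```

If this run succeeds while the default launch fails, that points at ssh between the compute nodes rather than from the launch node.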

On Nov 10, 2013, at 9:24 AM, Christoffer Hamberg <christoffer.hamberg_at_[hidden]> wrote:

> Hi,
>
> I'm having some strange problems running Open MPI (1.9a1r29559) with Java bindings on a Calxeda Highbank ARM server running Ubuntu 12.10 (GNU/Linux 3.5.0-43-highbank armv7l).
>
> The problem arises when I try to run a job on more than 3 nodes (I have a total of 8).
> Note: It's the same error for any of the node[0-7].
>
> ubuntu_at_node0:~$ mpirun -np 4 -host node0,node1,node2 hostname
> Host key verification failed.
>
> ubuntu_at_node0:~$ mpirun -np 4 -host node0,node1,node2,node3 hostname
> node0
> node0
> node1
> node2
>
> Leaving the current node out of the job also gives "Host key verification failed" with only 3 nodes:
>
> ubuntu_at_node0:~$ mpirun -np 4 -host node1,node3,node5 hostname
> Host key verification failed.
>
> But not on 2 nodes:
> ubuntu_at_node0:~$ mpirun -np 4 -host node1,node3 hostname
> node1
> node1
> node3
> node3
>
> I've configured it with the following:
> ./configure --prefix=/opt/openmpi-1.9-java --without-openib --enable-static --with-threads=posix --enable-mpi-thread-multiple --enable-mpi-java --with-jdk-bindir=/usr/lib/jvm/java-7-openjdk-armhf/bin --with-jdk-headers=/usr/lib/jvm/java-7-openjdk-armhf/include
>
> I have Open MPI 1.6.5 (without Java bindings) installed and it runs without any problems on all nodes, so SSH itself, which the error points to, should not be the problem.
>
> Any ideas?
>
> Regards,
> Christoffer