Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem running on multiple nodes with Java bindings
From: Christoffer Hamberg (christoffer.hamberg_at_[hidden])
Date: 2013-11-11 11:22:55


I re-configured the ssh keys now and for some reason it seems to work. But
what baffles me is that the same ssh configuration worked for the other
installation (1.6.5) but not for this one.
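
For anyone else hitting this: since the launch is tree-based, intermediate nodes ssh to other nodes themselves, so every node has to be able to reach every other node without a prompt and with the host key already known. Below is a minimal sketch for checking that, assuming bash and OpenSSH on each node. The function name check_pairs and the SSH_CMD override are made up for illustration; they are not part of Open MPI or OpenSSH.

```shell
#!/usr/bin/env bash
# Sketch: verify that every node can ssh to every other node without prompts.
# check_pairs and SSH_CMD are hypothetical names, not part of Open MPI or OpenSSH.

check_pairs() {
    local ssh_cmd="${SSH_CMD:-ssh}"   # override only for dry runs
    # BatchMode=yes disables password/passphrase prompts;
    # StrictHostKeyChecking=yes refuses hosts missing from known_hosts.
    local -a opts=(-o BatchMode=yes -o StrictHostKeyChecking=yes -o ConnectTimeout=5)
    local src dst failures=0

    for src in "$@"; do
        for dst in "$@"; do
            [ "$src" = "$dst" ] && continue
            # Log in to $src, then try to reach $dst from there.
            if "$ssh_cmd" "${opts[@]}" "$src" "ssh ${opts[*]} $dst true" \
                   </dev/null >/dev/null 2>&1; then
                echo "ok   $src -> $dst"
            else
                echo "FAIL $src -> $dst"
                failures=$((failures + 1))
            fi
        done
    done

    # Succeed only if no pair failed.
    [ "$failures" -eq 0 ]
}
```

Calling check_pairs node0 node1 node2 node3 on the head node prints one line per ordered pair; any FAIL line points at a pair whose keys or known_hosts entry likely need attention.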

Thanks for the help!

2013/11/11 Reuti <reuti_at_[hidden]>

> On 11.11.2013 at 10:04, Christoffer Hamberg wrote:
>
> > (Correction: I mixed up the output of the first two examples in my first mail, so it fails on the first one)
> >
> > ubuntu_at_node0:~$ mpirun --leave-session-attached -mca plm_base_verbose 5 -np 4 -host node0,node1,node2,node3 hostname
> > [node0:01486] mca:base:select:( plm) Querying component [slurm]
> > [node0:01486] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
> > [node0:01486] mca:base:select:( plm) Querying component [rsh]
> > [node0:01486] mca:base:select:( plm) Query of component [rsh] set priority to 10
> > [node0:01486] mca:base:select:( plm) Selected component [rsh]
> > [node2:26962] mca:base:select:( plm) Querying component [rsh]
> > [node2:26962] mca:base:select:( plm) Query of component [rsh] set priority to 10
> > [node2:26962] mca:base:select:( plm) Selected component [rsh]
> > [node1:11477] mca:base:select:( plm) Querying component [rsh]
> > [node1:11477] mca:base:select:( plm) Query of component [rsh] set priority to 10
> > [node1:11477] mca:base:select:( plm) Selected component [rsh]
> > Host key verification failed.
> >
> >
> > ubuntu_at_node0:~$ mpirun -mca plm_rsh_no_tree_spawn 1 -np 4 -host node0,node1,node2,node3 hostname
> > node0
> > node1
> > node2
> > node3
> >
> > So it definitely looks like a problem with the tree spawn. Any clue how I could proceed?
>
> Is passphraseless ssh also possible between the nodes? With hostbased
> authentication, it can also be enabled for all users without having to
> prepare the ssh keys.
>
> -- Reuti
>
>
> > /Christoffer
> >
> >
> > 2013/11/11 Ralph Castain <rhc_at_[hidden]>
> > Add --enable-debug to your configure and run it with the following additional options:
> >
> > --leave-session-attached -mca plm_base_verbose 5
> >
> > Let's see where it fails during the launch phase. Offhand, the only thing that message means to me is that the ssh keys are botched on at least one node. Keep in mind that we use a tree-based launch, and so when you have more than two nodes, one or more of the intermediate nodes are executing an ssh.
> >
> > One way to see if that's the problem is to launch without the tree spawn: add
> >
> > -mca plm_rsh_no_tree_spawn 1
> >
> > to your cmd line and see if it works.
> >
> >
> >
> > On Nov 10, 2013, at 9:24 AM, Christoffer Hamberg <christoffer.hamberg_at_[hidden]> wrote:
> >
> >> Hi,
> >>
> >> I'm having some strange problems running Open MPI (1.9a1r29559) with Java bindings on a Calxeda Highbank ARM server running Ubuntu 12.10 (GNU/Linux 3.5.0-43-highbank armv7l).
> >>
> >> The problem arises when I try to run a job on more than 3 nodes (I have a total of 8).
> >> Note: It's the same error for any of the node[0-7].
> >>
> >> ubuntu_at_node0:~$ mpirun -np 4 -host node0,node1,node2 hostname
> >> Host key verification failed.
> >>
> >> ubuntu_at_node0:~$ mpirun -np 4 -host node0,node1,node2,node3 hostname
> >> node0
> >> node0
> >> node1
> >> node2
> >>
> >> Not running the job on the current node also gives "Host key verification failed" with only 3 nodes:
> >>
> >> ubuntu_at_node0:~$ mpirun -np 4 -host node1,node3,node5 hostname
> >> Host key verification failed.
> >>
> >> But not on 2 nodes:
> >> ubuntu_at_node0:~$ mpirun -np 4 -host node1,node3 hostname
> >> node1
> >> node1
> >> node3
> >> node3
> >>
> >> I've configured it with the following:
> >> ./configure --prefix=/opt/openmpi-1.9-java --without-openib --enable-static --with-threads=posix --enable-mpi-thread-multiple --enable-mpi-java --with-jdk-bindir=/usr/lib/jvm/java-7-openjdk-armhf/bin --with-jdk-headers=/usr/lib/jvm/java-7-openjdk-armhf/include
> >>
> >> I have Open MPI 1.6.5 (without Java bindings) installed and it runs without any problems on all nodes, so SSH itself, which the error points to, should not be the problem.
> >>
> >> Any ideas?
> >>
> >> Regards,
> >> Christoffer
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >