Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Troubleshooting mpirun with tree spawn hang
From: Anthony Alba (ascanio.alba7_at_[hidden])
Date: 2014-04-11 14:38:10


Oops, I meant plm_rsh_no_tree_spawn=false.

Thanks for the tip. It turns out the fault lay with one specific node,
which required oob_tcp_if_include to be set.
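
For anyone who hits the same hang, here is a rough sketch of how the two settings in this thread might be combined on one command line. The host names (A,B,C,D) and the interface name (eth0) are placeholders, not the values from my cluster, so substitute whatever applies on your nodes. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: HOSTS and IFACE are hypothetical placeholders, not values from this thread.
HOSTS="${HOSTS:-A,B,C,D}"   # assumed host list
IFACE="${IFACE:-eth0}"      # assumed interface; pick the one that routes between your nodes

# Build the mpirun command line:
#   -mca plm_base_verbose 5 --debug-daemons : launcher debugging, as suggested by Ralph
#   -mca oob_tcp_if_include <iface>         : restrict the out-of-band TCP channel to one interface
build_cmd() {
    printf 'mpirun -mca plm_base_verbose 5 --debug-daemons -mca oob_tcp_if_include %s -host %s hostname\n' \
        "$IFACE" "$HOSTS"
}

# Print the command by default; set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
    eval "$(build_cmd)"
else
    build_cmd
fi
```

The same MCA parameter can presumably also be set through the environment (OMPI_MCA_oob_tcp_if_include) instead of on the command line, if that is more convenient for the affected node.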

On Friday, 11 April 2014, Ralph Castain <rhc_at_[hidden]> wrote:

> I'm a little confused - the "no_tree_spawn=true" option means that we are
> *not* using tree spawn, and so mpirun is directly launching each daemon
> onto its node. Thus, this requires that the host mpirun is on be able to
> ssh to every other host in the allocation.
>
> You can debug the rsh launcher by setting "-mca plm_base_verbose 5
> --debug-daemons" on the cmd line.
>
>
> On Apr 10, 2014, at 9:50 PM, Anthony Alba <ascanio.alba7_at_[hidden]>
> wrote:
>
> >
> > Is there a way to troubleshoot
> > plm_rsh_no_tree_spawn=true hang?
> >
> > I have a set of passwordless-ssh nodes; each node can ssh into any
> > other, i.e.,
> >
> > for h1 in A B C D; do for h2 in A B C D; do ssh $h1 ssh $h2 hostname; done; done
> >
> > works perfectly.
> >
> > Generally tree spawn works; however, there is one host where
> > launching mpirun with tree spawn hangs as soon as there are 6 or more
> > hosts (with the launch node also in the host list). If the launcher is
> > not in the host list, the hang happens with five hosts.
> >
> >
> > - Anthony
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users