This page is part of a frozen web archive of this mailing list.
You can still navigate around this archive, but know that no new mails
have been added to it since July of 2016.
Add --debug-devel to your mpirun command line and you'll get a bunch of
diagnostic info. Did you configure Open MPI with --enable-debug? If so,
additional debug output can be obtained; I can let you know how to get it,
if necessary.
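As a rough sketch of the --debug-devel suggestion above, the hanging launch could be rerun like this so the diagnostic output is also kept in a file. The function name, the log file name, and the MPIRUN/NODE/LOG variables are illustrative assumptions, not part of Open MPI; only the mpirun arguments come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: rerun the hanging launch with developer-level debug
# output and keep a copy of everything mpirun prints.
MPIRUN="${MPIRUN:-mpirun}"       # launcher to invoke (override for a dry run)
NODE="${NODE:-n06}"              # target node from the original report
LOG="${LOG:-mpirun-debug.log}"   # assumed log file name

run_debug_launch() {
    # --debug-devel is the flag suggested above; the remaining arguments are
    # the original command. stderr is merged into stdout so tee captures both.
    "$MPIRUN" --debug-devel -np 1 --host "$NODE" hostname 2>&1 | tee "$LOG"
}
```

Calling run_debug_launch on the head node then shows the output on screen and leaves it in mpirun-debug.log. If the launch hangs as before, interrupt it with Ctrl-C and post the log to the list.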
On Thu, Jun 18, 2009 at 3:49 PM, Honest Guvnor wrote:
> Open MPI 1.2.7, Ethernet, CentOS 5.3 i386, fresh install on host and nodes.
> Despite ssh and pdsh working, mpirun hangs when launching a program
> from the host to a node:
> [cluster_at_hankel ~]$ ssh n06 hostname
> [cluster_at_hankel ~]$ pdsh -w n06 hostname
> n06: n06
> [cluster_at_hankel ~]$ mpirun -np 1 --host n06 hostname
> However, mpirun works fine in reverse:
> [cluster_at_n06 ~]$ mpirun -np 1 --host hankel date
> Thu Jun 18 22:53:27 CEST 2009
> and from node to node. Paths to bin and lib seem OK:
> [cluster_at_hankel ~]$ printenv PATH
> [cluster_at_hankel ~]$ printenv LD_LIBRARY_PATH
> [cluster_at_hankel ~]$ ssh n06 printenv PATH
> [cluster_at_hankel ~]$ ssh n06 printenv LD_LIBRARY_PATH
> We are new to Open MPI. We checked a few MCA parameters and turned on a
> diagnostic flag or two, but without coming up with much. The nodes do
> not have access to the host's external network, and we half convinced
> ourselves this was the problem because of mentions in the output with
> the -d flag, but:
> [cluster_at_hankel ~]$ mpirun --mca btl tcp,self --mca btl_tcp_if_exclude
> lo,eth0 --mca oob_tcp_if_exclude lo,eth0 -np 1 --host n06 hostname
> [STILL HANGS]
> where eth0 is the external network.
> Suggestions gratefully received on how we can get Open MPI to report
> what has failed, or on where to poke and prod further.