Are all three machines running the same OS and version, perchance? If
the machines are heterogeneous in OS, glibc version, etc., odd
problems such as these hangs can occur.
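One quick way to compare the nodes is to have each one report its own OS, kernel, and glibc version. The script below is only a sketch. It assumes Python 3 is available on every node and that the script (here called node_info.py, a hypothetical name) is saved somewhere all nodes can see, such as the NFS share on LT. Since mpirun from LT hangs at SGT, running it over plain ssh on each node (e.g. `ssh SGT python3 node_info.py`) keeps Open MPI out of the comparison.

```python
#!/usr/bin/env python3
"""Print this node's OS, kernel, and C library version (sketch).

Assumption: Python 3 is installed on every node and this file lives on a
share all nodes can read, e.g.  ssh SGT python3 node_info.py
"""
import platform


def node_info():
    # libc_ver() returns a (library, version) tuple such as ("glibc", "2.4");
    # it may return empty strings if the C library cannot be identified.
    libc_name, libc_version = platform.libc_ver()
    return {
        "host": platform.node(),
        "system": platform.system(),
        "kernel": platform.release(),
        "machine": platform.machine(),
        "libc": f"{libc_name} {libc_version}".strip(),
    }


def format_info(info):
    # One line per node, so the outputs from LT, SGT and PFC are easy to diff.
    return " | ".join(f"{key}={value}" for key, value in info.items())


if __name__ == "__main__":
    print(format_info(node_info()))
```

If the kernel or libc fields on LT differ from those on SGT and PFC, that's a good place to start looking.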
Additionally, are you running a firewall on any of these machines?
Ensure that iptables isn't running. It doesn't sound like this is
your problem, but it's worth checking so you can cross it off the
list of possible causes.
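On the firewall question: an iptables DROP rule typically makes a TCP connection attempt hang until it times out. That looks a lot like the symptom you're seeing. The sketch below classifies a single connection attempt as open, refused, timed out, or failed. It is an ad hoc reachability test, not an Open MPI check. Open MPI picks its TCP ports dynamically, so you'd first start a throwaway listener on the peer, for example `python3 -m http.server 5000` on SGT (the port number is arbitrary). You would then probe it from LT with `python3 probe.py SGT 5000`, where probe.py is a hypothetical name.

```python
#!/usr/bin/env python3
"""Probe TCP reachability of HOST:PORT from this node (sketch)."""
import socket
import sys


def probe(host, port, timeout=5.0):
    """Classify one TCP connection attempt to host:port.

    "open"    - the connection was accepted
    "refused" - the peer (or a REJECT rule) actively refused it
    "timeout" - no reply at all, the typical effect of a firewall DROP rule
    "error"   - any other failure, e.g. name resolution or no route
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 probe.py HOST PORT
    if len(sys.argv) == 3:
        print(probe(sys.argv[1], int(sys.argv[2])))
    else:
        print("usage: probe.py HOST PORT", file=sys.stderr)
```

With a listener running on the peer, any result other than "open" suggests something between the nodes is interfering. A "timeout" from LT to SGT, combined with "open" in the other direction, would point strongly at a firewall rule on one of those two machines.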
On Jan 15, 2008, at 7:54 PM, Mark Kosmowski wrote:
> Dear Open-MPI Community:
> I have a 3 node cluster, each a dual opteron workstation running
> OpenSUSE 10.1 64-bit. The node names are LT, SGT and PFC. When I
> start an mpirun job from either SGT or PFC, things work as they are
> supposed to. However, if I start the same job from LT, the job hangs
> at SGT. This was confirmed by mpirun --np 6 --hostfile <correct
> hostfile for the three nodes> hostname, which gives only LT; LT; PFC;
> PFC (and then hangs) when started from LT (this same command started
> from either of the other nodes gives two of each of the three hostnames
> and terminates normally). The NFS share is physically located
> on LT.
> I have been using ssh to get to either SGT or PFC from a terminal
> opened originally on LT to run jobs. I can ssh from any node to any
> other node.
> I have attached a gzipped tar archive of the three ifconfig results
> (for each node) and the results of ompi_info --all command as
> requested in the "Getting Help" section. I was unable to locate a
> config.log file in the shared ompi directory.
> Any assistance on this matter would be appreciated,
> Mark E. Kosmowski