On Tue, Jan 15, 2008 at 07:54:33PM -0500, Mark Kosmowski wrote:
> Dear Open-MPI Community:
> I have a 3-node cluster, each node a dual-Opteron workstation running
> OpenSUSE 10.1 64-bit. The node names are LT, SGT and PFC. When I
> start an mpirun job from either SGT or PFC, things work as they are
> supposed to. However, if I start the same job from LT, the job hangs
> at SGT - this was confirmed by mpirun --np 6 --hostfile <correct
> hostfile for the three nodes> hostname, which gives only LT; LT; PFC;
> PFC (and then hangs) when started from LT (this same command started
> from either of the other nodes gives two of each of the three hostnames
> and terminates normally). The NFS share drive is physically located
> on LT.
> I have been using ssh to get to either SGT or PFC from a terminal
> opened originally on LT to run jobs. I can ssh from any node to any
> other node.
> I have attached a gzipped tar archive of the three ifconfig results
> (for each node) and the output of the ompi_info --all command, as
> requested in the "Getting Help" section. I was unable to locate a
> config.log file in the shared ompi directory.
> Any assistance on this matter would be appreciated,
> Mark E. Kosmowski
I'd posted a message earlier about intermittent hangs -- perhaps it's
the same issue. If you run a hundred instances or so of "mpirun --np 6
--hostfile hostfile uptime", from SGT or PFC, do you notice any hangs?
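
If it helps, here is a minimal sketch of one way to automate that repeated run and count the hangs. It assumes Python 3 is available on the node you launch from; the function name count_hangs, the 30-second timeout, and the MPIRUN_CMD list are illustrative choices of mine, not anything Open MPI provides, and "hostfile" stands in for your real hostfile path.

```python
import subprocess

# Hypothetical command list mirroring the mpirun invocation suggested above;
# replace "hostfile" with the path to your actual hostfile.
MPIRUN_CMD = ["mpirun", "--np", "6", "--hostfile", "hostfile", "uptime"]


def count_hangs(cmd, runs=100, timeout_s=30):
    """Run `cmd` `runs` times and return the indices of runs that timed out.

    A run is treated as "hung" if it has not exited within `timeout_s`
    seconds. That threshold is an assumption; pick one comfortably longer
    than a normal run takes on your cluster.
    """
    hung = []
    for i in range(runs):
        try:
            # Output is discarded; only whether the command returns matters.
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the local child process on timeout.
            hung.append(i)
    return hung
```

Calling count_hangs(MPIRUN_CMD) from SGT or PFC returns the list of run numbers that timed out, so an empty list means no hangs were seen. Note that only the local mpirun process is killed on timeout; any processes left behind on the remote nodes may need to be cleaned up by hand.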