Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi 1.6.3 fails to identify local host if its IP is 127.0.1.1
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-06-19 14:42:19


Hmmm... that certainly sounds like a bug. It should pick up that the node is local: it checks the hostname (as returned by gethostname), and it also checks whether the host resolves to a local address. I'm assuming that the offending host has some other address besides just 127.0.1.1, as otherwise it couldn't connect to anything.
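
To make the two checks concrete, here is a rough sketch in shell. It is an illustration only, not Open MPI's actual code: the function name `is_local_host` and its argument layout are hypothetical, and treating the whole 127.0.0.0/8 range as local is an assumption about what "a local address" should cover.

```shell
# Hypothetical helper for illustration; NOT Open MPI's implementation.
# Usage: is_local_host HOST OWN_NAME "RESOLVED_ADDRS" "INTERFACE_ADDRS"
#   HOST            name taken from the hostfile
#   OWN_NAME        this machine's name, e.g. the output of `hostname`
#   RESOLVED_ADDRS  space-separated addresses that HOST resolves to
#   INTERFACE_ADDRS space-separated addresses configured on this machine
is_local_host() {
    ilh_host=$1
    ilh_own=$2
    ilh_resolved=$3
    ilh_ifaddrs=$4

    # Check 1: the name matches what gethostname() would return.
    if [ "$ilh_host" = "$ilh_own" ]; then
        return 0
    fi

    # Check 2: the name resolves to an address that belongs to this machine.
    for ilh_addr in $ilh_resolved; do
        case $ilh_addr in
            # Assumption: any loopback address (including 127.0.1.1) is local.
            127.*) return 0 ;;
        esac
        for ilh_local in $ilh_ifaddrs; do
            if [ "$ilh_addr" = "$ilh_local" ]; then
                return 0
            fi
        done
    done
    return 1
}

# Possible invocation on a glibc-based Linux system (assumes `getent` and
# `hostname -I` are available):
#   is_local_host "$h" "$(hostname)" \
#       "$(getent hosts "$h" | awk '{print $1}')" "$(hostname -I)"
```

Passing the resolved and interface addresses in as arguments keeps the logic independent of any particular resolver, so it can be tried with made-up values.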

I'm heading out the door for a couple of weeks, but can try to look at it when I return.

On Jun 19, 2013, at 10:43 AM, Riccardo Murri <riccardo.murri_at_[hidden]> wrote:

> On 19 June 2013 16:01, Ralph Castain <rhc_at_[hidden]> wrote:
>> How is OMPI picking up this hostfile? It isn't being specified on the cmd line - are you running under some resource manager?
>
> Via the environment variable `OMPI_MCA_orte_default_hostfile`.
>
> We're running under SGE, but with the OMPI/SGE integration disabled
> (our rather old version of SGE does not coordinate well with Open MPI);
> here's the relevant snippet from our startup script:
>
> # the OMPI/SGE integration does not seem to work with
> # our SGE version; so use the `mpi` PE and direct OMPI
> # to look for a "plain old" machine file
> unset PE_HOSTFILE
> if [ -r "${TMPDIR}/machines" ]; then
>     OMPI_MCA_orte_default_hostfile="${TMPDIR}/machines"
>     export OMPI_MCA_orte_default_hostfile
> fi
> GMSCOMMAND="$openmpi_root/bin/mpiexec -n $NCPUS --nooversubscribe
> $gamess $INPUT -scr $(pwd)"
>
> The `$TMPDIR/machines` hostfile is created from SGE's $PE_HOSTFILE by
> extracting the host names, and repeating each one for the given number
> of slots (unmodified code that comes with SGE):
>
> PeHostfile2MachineFile()
> {
>     cat $1 | while read line; do
>         # echo $line
>         host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
>         nslots=`echo $line|cut -f2 -d" "`
>         i=1
>         while [ $i -le $nslots ]; do
>             echo $host
>             i=`expr $i + 1`
>         done
>     done
> }
>
> Thanks,
> Riccardo
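
For anyone following along, the effect of the quoted PeHostfile2MachineFile conversion can be sketched as below. This is a hypothetical restatement, not the SGE code itself: the function name `pe_hostfile_to_machines`, the host names, and the sample input lines (host name, slot count, queue, processor range) are assumptions made for illustration. It reads standard input and splits on any whitespace, whereas the quoted function takes a file argument and splits on single spaces.

```shell
# Hypothetical restatement of the quoted conversion; reads the PE hostfile
# on stdin and prints one short host name per slot.
pe_hostfile_to_machines() {
    while read -r pe_host pe_nslots pe_rest; do
        # Keep only the part before the first dot (same as cut -f1 -d".").
        pe_short=${pe_host%%.*}
        pe_i=1
        while [ "$pe_i" -le "$pe_nslots" ]; do
            echo "$pe_short"
            pe_i=$((pe_i + 1))
        done
    done
}

# Sample input with made-up hosts: two slots on node01, one on node02.
printf '%s\n' \
    'node01.example.org 2 all.q@node01.example.org UNDEFINED' \
    'node02.example.org 1 all.q@node02.example.org UNDEFINED' |
    pe_hostfile_to_machines
# prints:
#   node01
#   node01
#   node02
```

Note that the domain part is stripped, so the machine file contains only short host names, and those short names are what mpiexec then has to match against the local node.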