Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi 1.6.3 fails to identify local host if its IP is 127.0.1.1
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-06-19 09:09:36


I don't see a hostfile on your command line - so I assume you are using a default hostfile? What is in it?
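
If it is the default hostfile, something like this should show what is actually being read (assuming a stock 1.6.x build, where the location is controlled by the orte_default_hostfile MCA parameter and normally defaults to <prefix>/etc/openmpi-default-hostfile):

  ompi_info --all | grep -i default_hostfile
  cat /share/apps/openmpi/1.6.3/etc/openmpi-default-hostfile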

On Jun 19, 2013, at 1:49 AM, Sergio Maffioletti <sergio.maffioletti_at_[hidden]> wrote:

> Hello,
>
> we have observed some strange behavior with OpenMPI 1.6.3:
>
> strace -f /share/apps/openmpi/1.6.3/bin/mpiexec -n 2
> --nooversubscribe --display-allocation --display-map --tag-output
> /share/apps/gamess/2011R1/gamess.2011R1.x
> /state/partition1/rmurri/29515/exam01.F05 -scr
> /state/partition1/rmurri/29515
>
> ====================== ALLOCATED NODES ======================
>
> Data for node: nh64-1-17.local Num slots: 0 Max slots: 0
> Data for node: nh64-1-17 Num slots: 2 Max slots: 0
>
> =================================================================
>
> ======================== JOB MAP ========================
>
> Data for node: nh64-1-17 Num procs: 2
> Process OMPI jobid: [37108,1] Process rank: 0
> Process OMPI jobid: [37108,1] Process rank: 1
>
> =============================================================
>
> As you can see, the host file lists the *unqualified* local host name;
> OpenMPI fails to recognize that as the same host it is running on,
> and instead uses `ssh` to spawn a remote `orted`, as the `strace -f` output shows:
>
> Process 16552 attached
> [pid 16552] execve("//usr/bin/ssh", ["/usr/bin/ssh", "-x",
> "nh64-1-17", "OPAL_PREFIX=/share/apps/openmpi/1.6.3 ; export
> OPAL_PREFIX; PATH=/share/apps/openmpi/1.6.3/bin:$PATH ; export PATH ;
> LD_LIBRARY_PATH=/share/apps/openmpi/1.6.3/lib:$LD_LIBRARY_PATH ;
> export LD_LIBRARY_PATH ;
> DYLD_LIBRARY_PATH=/share/apps/openmpi/1.6.3/lib:$", "--daemonize",
> "-mca", "ess", "env", "-mca", "orte_ess_jobid", "2431909888", "-mca",
> "orte_ess_vpid", "1", "-mca", "orte_ess_num_procs", "2", "--hnp-uri",
> "\"2431909888.0;tcp://10.1.255.237:33154\"", "-mca", "plm", "rsh"],
> ["OLI235=/state/partition1/rmurri/29515/exam01.F235", ...
>
> If the machine file lists the FQDNs instead, `mpiexec` spawns the jobs
> directly via fork()/exec().
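>
> For illustration (just the shape of the file, not a verbatim copy): the
> failing machine file is along the lines of
>
>   nh64-1-17 slots=2
>
> while the working one spells the name the way getent reports it, e.g.
>
>   nh64-1-17.local slots=2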
>
> This seems related to the fact that each compute node advertises
> 127.0.1.1 as the IP address associated to its hostname:
>
> $ ssh nh64-1-17 getent hosts nh64-1-17
> 127.0.1.1 nh64-1-17.local nh64-1-17
>
> Indeed, if I change /etc/hosts so that a compute node associates a
> "real" IP with its hostname, `mpiexec` works as expected.
>
> Is this a known feature/bug/easter egg?
>
> For the record: using OpenMPI 1.6.3 on Rocks 5.2.
>
> Thanks,
> on behalf of the GC3 Team
> Sergio :)
>
> GC3: Grid Computing Competence Center
> http://www.gc3.uzh.ch/
> University of Zurich
> Winterthurerstrasse 190
> CH-8057 Zurich Switzerland
> Tel: +41 44 635 4222
> Fax: +41 44 635 6888