Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi 1.6.3 fails to identify local host if its IP is
From: Riccardo Murri (riccardo.murri_at_[hidden])
Date: 2013-07-03 16:00:49

Hi Jeff, Ralph,

first of all: thanks for your work on this!

On 3 July 2013 21:09, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> 1. The root cause of the issue is that you are assigning a
> non-existent IP address to a name. I.e., <foo> maps to,
> but that IP address does not exist anywhere. Hence, OMPI will never
> conclude that that <foo> is "local". If you had assigned <foo> to
> the address, things should have worked fine.

Ok, I see. Would that have worked also if I had added the
address to the "lo" interface (in addition to

> Just curious: why are you doing this?

It's commonplace in Ubuntu/Debian installations; see, e.g.,

In our case, it was rolled out as a fix for some cron job running on
Apache servers (apparently Debian's Apache looks up and uses
that as the ServerName, unless a server name is explicitly
configured), which was later extended to all hosts because "what harm
can it do?".

(Needless to say, we have rolled back the change.)

> 2. That being said, OMPI is not currently looking at all the
> responses from gethostbyname() -- we're only looking at the first
> one. In the spirit of how clients are supposed to behave when
> multiple IP addresses are returned from a single name lookup, OMPI
> should examine all of those addresses and see if it finds one that
> it "likes", and then use that. So we should extend OMPI to examine
> all the IP addresses from gethostbyname().
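
A minimal sketch of the behaviour described in point 2, written in
Python rather than OMPI's C and not taken from the OMPI code base. It
resolves every address for a name and picks the first one that is also
assigned to a local interface. The function names resolve_all and
pick_local_address are hypothetical, the caller is assumed to supply
the list of interface addresses, and the 192.0.2.x addresses in the
usage note are documentation placeholders.

```python
import socket


def resolve_all(hostname):
    """Return every distinct IP address the resolver reports for hostname."""
    seen = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(hostname, None):
        addr = sockaddr[0]  # first element is the address for both IPv4 and IPv6
        if addr not in seen:
            seen.append(addr)
    return seen


def pick_local_address(candidates, interface_addresses):
    """Return the first candidate assigned to a local interface, else None.

    interface_addresses is assumed to be gathered elsewhere (for example
    from the operating system's interface list).
    """
    local = set(interface_addresses)
    for addr in candidates:
        if addr in local:
            return addr
    return None
```

With candidates ["192.0.2.99", "192.0.2.10"] and interface addresses
["127.0.0.1", "192.0.2.10"], looking only at the first candidate finds
no match, while scanning the whole list selects 192.0.2.10.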

Just out of curiosity: would it have worked, had I compiled OMPI with
IPv6 support? (As far as I understand IPv6, an application is
required to examine all the addresses returned for a host name, and
not just pick the first one.)

> Ralph is going to work on this, but it'll likely take him a little
> time to get it done. We'll get it into the trunk and probably ask
> you to verify that it works for you. And if so, we'll back-port to
> the v1.6 and v1.7 series.

I'm glad to help and verify, but I guess we do not need the backport
or an urgent fix. The easy workaround for us was to remove the line
from the compute nodes (we keep it only on Apache servers where it
originated).