On Jul 3, 2013, at 1:00 PM, Riccardo Murri <riccardo.murri_at_[hidden]> wrote:
> Hi Jeff, Ralph,
> first of all: thanks for your work on this!
> On 3 July 2013 21:09, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
>> 1. The root cause of the issue is that you are assigning a
>> non-existent IP address to a name. I.e., <foo> maps to 127.0.1.1,
>> but that IP address does not exist anywhere. Hence, OMPI will never
>> conclude that that <foo> is "local". If you had assigned <foo> to
>> the 127.0.0.1 address, things should have worked fine.
> Ok, I see. Would that have worked also if I had added the 127.0.1.1
> address to the "lo" interface (in addition to 127.0.0.1)?
Probably, but I can't say for sure.
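To illustrate the root cause in point 1, here is a minimal Python sketch of the idea: a name can only be treated as local if the address it maps to is actually configured on some interface. This is not OMPI's code. The helper name `names_with_unassigned_addresses`, the sample hosts text, and the set of interface addresses are all hypothetical, and "foo" stands in for the <foo> host name above.

```python
def names_with_unassigned_addresses(hosts_text, interface_addresses):
    """Return {name: address} for hosts-file entries whose address is
    not in interface_addresses (hypothetical helper, not OMPI code)."""
    unassigned = {}
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the address, the rest are names and aliases.
        address, *names = line.split()
        if address not in interface_addresses:
            for name in names:
                unassigned[name] = address
    return unassigned


# Assumed example data, mirroring the situation described above.
SAMPLE_HOSTS = """\
127.0.0.1   localhost
127.0.1.1   foo
"""
# Addresses assumed to be configured on this machine's interfaces.
SAMPLE_INTERFACE_ADDRESSES = {"127.0.0.1", "192.168.1.10"}

print(names_with_unassigned_addresses(SAMPLE_HOSTS, SAMPLE_INTERFACE_ADDRESSES))
```

With this sample data, "foo" is reported because 127.0.1.1 is not in the assumed interface set. Adding 127.0.1.1 to that set makes the report empty, which is the check-level equivalent of adding the address to "lo".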
>> Just curious: why are you doing this?
> It's commonplace in Ubuntu/Debian installations; see, e.g.,
> In our case, it was rolled out as a fix for some cron job running on
> Apache servers (apparently Debian's Apache looks up 127.0.1.1 and uses
> that as the ServerName, unless a server name is explicitly
> configured), which was later extended to all hosts because "what harm
> can it do?".
> (Needless to say, we have rolled back the change.)
Weird - never heard of that before!
>> 2. That being said, OMPI is not currently looking at all the
>> responses from gethostbyname() -- we're only looking at the first
>> one. In the spirit of how clients are supposed to behave when
>> multiple IP addresses are returned from a single name lookup, OMPI
>> should examine all of those addresses and see if it finds one that
>> it "likes", and then use that. So we should extend OMPI to examine
>> all the IP addresses from gethostbyname().
> Just for curiosity: would it have worked, had I compiled OMPI with
> IPv6 support? (As far as I understand IPv6, an application is
> required to examine all the addresses returned for a host name, and
> not just pick the first one.)
Actually, yes - for some reason, the code path when IPv6 support is enabled had already been extended to look at all addresses. Not sure why, but that change was never carried over to the IPv6-disabled code path. I've done so now, so this won't be a problem in the future.
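As a rough illustration of the difference between the two code paths, here is a small Python sketch (not the actual OMPI source, which is C). It contrasts looking at only the first address from a lookup with scanning all of them. The function names and the example addresses are hypothetical; `socket.gethostbyname_ex` is the standard-library call that returns every IPv4 address for a name.

```python
import socket


def resolve_all_ipv4(hostname):
    """Return every IPv4 address the resolver reports for hostname."""
    # gethostbyname_ex returns (canonical_name, alias_list, address_list).
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def first_address_is_local(addresses, local_addresses):
    """Old behavior: only the first returned address is considered."""
    return bool(addresses) and addresses[0] in local_addresses


def any_address_is_local(addresses, local_addresses):
    """Extended behavior: accept the name if any returned address is local."""
    return any(address in local_addresses for address in addresses)


# Assumed example: the lookup returns 127.0.1.1 first, then a real address.
EXAMPLE_LOOKUP = ["127.0.1.1", "192.168.1.10"]
# Addresses assumed to be configured on this machine's interfaces.
EXAMPLE_LOCAL = {"127.0.0.1", "192.168.1.10"}
```

With these assumed values, `first_address_is_local(EXAMPLE_LOOKUP, EXAMPLE_LOCAL)` is False while `any_address_is_local(EXAMPLE_LOOKUP, EXAMPLE_LOCAL)` is True. In real use the address list would come from `resolve_all_ipv4(name)` rather than a hard-coded list.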
>> Ralph is going to work on this, but it'll likely take him a little
>> time to get it done. We'll get it into the trunk and probably ask
>> you to verify that it works for you. And if so, we'll back-port to
>> the v1.6 and v1.7 series.
> I'm glad to help and verify, but I guess we do not need the backport
> or an urgent fix. The easy workaround for us was to remove the
> 127.0.1.1 line from the compute nodes (we keep it only on Apache
> servers where it originated).
Glad you found an easy solution!