
Open MPI User's Mailing List Archives



Subject: Re: [OMPI users] After OS Update MPI_Init fails on one host
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-07-23 08:54:19


On Jul 23, 2013, at 3:56 AM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:

> On Jul 21, 2013, at 8:50 AM, Kevin H. Hobbs <hobbsk_at_[hidden]> wrote:
>
>>> Ah! That would indicate an issue with the external hwloc
>>> package they provided, which is the big reason we don't
>>> recommend installing from packages.
>>
>> I'll happily report the bug to the hwloc developers.
>
> I don't think that this is necessarily an hwloc bug.
>
>> I'll also add what we've found here to the bug on the Fedora
>> bugzilla.
>>
>> Is there anything more I can do on this list to figure out the
>> nature of the bug?
>>
>>> We have internal copies of hwloc and libevent that ensure (a)
>>> they are at the proper level, and (b) they are configured
>>> properly for OMPI's use.
>>
>> It does look like Fedora's hwloc is ahead of OMPI's.
>>
>> Fedora 18 has openmpi-1.6.3 and hwloc-1.4.2.
>>
>> The source of openmpi-1.6.5 has hwloc-1.3.2.
>
> Hypothetically, hwloc 1.4.x is backwards source-compatible with hwloc 1.3.x, but we have not tested this. I don't know if hwloc has, either (I'm sure they haven't tested with Open MPI 1.6.x).
>
>> How can I tell what the configuration differences are?
>>
>> The entire configure section of the .spec file in
>> hwloc-1.4.2-2.fc18.src.rpm is :
>>
>> %configure
>> %{__make} %{?_smp_mflags} V=1
>
> OMPI builds hwloc in "embedded" mode, which means that OMPI's configure line is used to build hwloc (vs. having a separate configure invocation for hwloc). They're hypothetically the moral equivalent of each other, but perhaps something is different somehow...
>
>> I don't see anything that looks like any hwloc configure options
>> are being set.
>>
>> How do I tell how OMPI configures its bundled hwloc?
>
> With this embedded mechanism, we're calling hwloc's configury with the moral equivalent of:
>
> ./configure --disable-cairo --disable-libxml2 --enable-xml --with-hwloc-symbol-prefix=opal_hwloc152_ --enable-embedded-mode
>
>> Better yet, I'd like to figure out the actual nature of the bug
>> and report it in the proper place.
>
>
> Yes, it's curious that they can't reproduce your issue,

Guess I missed this - where does it say that they can't reproduce the issue?? I'm suspicious because build-from-source produced a working result.

> which suggests that the hwloc issue is a red herring (because, as stated above, hwloc *should* be backwards compatible).
>
> Ralph: is there an easy way to find out more detail on why orte_util_nidmap_init() failed without attaching a debugger?

A debugger would be the best way.
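
As a rough sketch of what that could look like with gdb: the snippet below writes a small gdb command file that stops at orte_util_nidmap_init(), prints a backtrace, and then lets the function return so gdb can report its return value. The file name nidmap.gdb and the program name ./your_mpi_program are placeholders, not anything from this thread, and the sketch assumes the failure still reproduces when the program is launched under gdb on the affected host. Without debug symbols for the Open MPI libraries the backtrace will be less informative.

```shell
# Write a gdb command file (nidmap.gdb is a hypothetical name).
cat > nidmap.gdb <<'EOF'
# The symbol lives in a shared library that may not be loaded yet,
# so let gdb keep the breakpoint pending until it resolves.
set breakpoint pending on
break orte_util_nidmap_init
run
# Once stopped at the breakpoint: show how we got here,
# then run until the function returns so gdb can print its return value.
bt
finish
EOF

# Print the invocation; ./your_mpi_program is a placeholder for the failing binary.
echo "gdb -x nidmap.gdb --args ./your_mpi_program"
```

If the breakpoint is never hit, the failure is happening before that function is reached, which would itself be useful to know.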

>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>