Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] After OS Update MPI_Init fails on one host
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-07-23 09:54:56

Kevin --

I don't know whether the Fedora RPMs are built with -g, or whether Fedora provides a debuginfo RPM that you could install so that you can attach a debugger and dig into OMPI's internals yourself.

If that doesn't work, you might need to build from source yourself, link against the external hwloc (you said you could replicate the error this way), and compile with -g (e.g., "./configure CFLAGS=-g LDFLAGS=-g ..."). That would let you attach gdb to the running process and see what's going on.

Alternatively, you could add some opal_output(0, "printf like args here"); statements in the orte_util_nidmap_init() function to see where it's failing (look in orte/util/nidmap.c).
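To illustrate the kind of instrumentation meant above, here is a minimal standalone sketch. It is not the real orte_util_nidmap_init(): the names trace(), unpack_int(), and demo_init() are hypothetical, and trace() is only a local stand-in that mimics the printf-like calling style of opal_output(0, ...). The idea is simply to print a message after each step of a multi-step routine, so that the last message seen tells you which step failed.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for opal_output(0, fmt, ...):
 * printf-style message written to stderr, one per line. */
static void trace(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
}

/* Hypothetical helper: copy one int out of buf at *off and advance *off.
 * Returns 0 on success, -1 if fewer than sizeof(int) bytes remain. */
static int unpack_int(const unsigned char *buf, size_t len, size_t *off, int *out)
{
    if (len - *off < sizeof(int)) {
        return -1;
    }
    memcpy(out, buf + *off, sizeof(int));
    *off += sizeof(int);
    return 0;
}

/* Hypothetical multi-step init routine (NOT Open MPI code).
 * Each step is followed by a trace() call, so the output shows
 * how far the routine got before it failed.
 * Returns 0 on success, -1 or -2 depending on which step failed. */
int demo_init(const unsigned char *buf, size_t len)
{
    size_t off = 0;
    int num_nodes = 0;
    int num_procs = 0;

    trace("demo_init: entered, buffer length %zu", len);

    if (unpack_int(buf, len, &off, &num_nodes) != 0) {
        trace("demo_init: FAILED unpacking num_nodes at offset %zu", off);
        return -1;
    }
    trace("demo_init: num_nodes = %d", num_nodes);

    if (unpack_int(buf, len, &off, &num_procs) != 0) {
        trace("demo_init: FAILED unpacking num_procs at offset %zu", off);
        return -2;
    }
    trace("demo_init: num_procs = %d", num_procs);

    return 0;
}
```

In an actual Open MPI source tree you would skip the stand-in and put the opal_output(0, ...) calls directly between the steps of the function in orte/util/nidmap.c, then rebuild and rerun.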

On Jul 23, 2013, at 9:36 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> I see - I didn't look at the redhat bug list. Sadly, I have no idea how to debug it. The Fedora package is built optimized, so no OMPI debugging output is available and a debugger won't tell us a lot.
> Best guess is that there is something in the build that doesn't match the user's system. The nidmap_init routine unpacks a buffer that contains a bunch of process mapping info that mpirun packed into it - don't usually see an error in there.
> On Jul 23, 2013, at 5:57 AM, "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]> wrote:
>> On Jul 23, 2013, at 8:54 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>> Yes, it's curious that they can't reproduce your issue,
>>> Guess I missed this - where does it say that they can't reproduce the issue?? I'm suspicious because build-from-source produced a working result.
>> Orion mentioned it in
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]

Jeff Squyres