Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] 1.4 OpenMPI build not working well with TotalView on Darwin
From: Peter Thompson (peter.thompson_at_[hidden])
Date: 2010-01-20 21:18:14


Hi Jeff,

Sorry, speaking in shorthand again.

Jeff Squyres wrote:
> On Jan 8, 2010, at 5:03 PM, Peter Thompson wrote:
>
>> I've tried a few builds of 1.4 on Snow Leopard, and trying to start up TotalView
>> gets some of the more 'standard' problems.
>
> I don't quite know what you mean by "standard" problems...?

Those are more or less the 'standard' problems that I hear described when
someone tries to build an MPI (not just OpenMPI) and things don't work on the
first try. I don't know if you've worked on the interface directly, but you are
probably aware that TotalView has an API where we set up a structure,
MPIR_PROCTABLE, based on a typedef MPIR_PROCDESC, which gets filled in with
which processes are started up on which nodes. That allows the debugger to
attach to things automatically. If the build is done so that the files that
hold these structures are optimized, sometimes the typedef is optimized away.
Or, in the case of other builds, the file may have the correct optimization
(none) but the symbol info is stripped in the link phase. So it's a typical, or
'standard', issue I face, but hopefully not for you.
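
For anyone following along, here is a rough sketch of the kind of table
described above. The struct layout and the names MPIR_proctable and
MPIR_proctable_size follow the MPIR process acquisition interface as I
understand it; the helper fill_proctable() and its arguments are purely
hypothetical and are not taken from OpenMPI's source. In a real MPI the
launcher fills this table, and the file holding it has to be built unoptimized
with debug info so the debugger can still see the typedef.

```c
#include <stdlib.h>

/* One entry per started process (layout assumed from the MPIR interface). */
typedef struct {
    char *host_name;        /* node the process runs on */
    char *executable_name;  /* image the debugger should load */
    int   pid;              /* process id on that node */
} MPIR_PROCDESC;

/* Symbols the debugger looks up by name in the starter process. */
MPIR_PROCDESC *MPIR_proctable = NULL;
int MPIR_proctable_size = 0;

/*
 * Hypothetical helper: build a table for nprocs processes that all run
 * the same executable on the same host, with consecutive pids.
 * Returns the number of entries, or -1 on bad input or allocation failure.
 */
int fill_proctable(int nprocs, char *host, char *exe, int first_pid)
{
    if (nprocs <= 0)
        return -1;

    MPIR_PROCDESC *table = calloc((size_t)nprocs, sizeof *table);
    if (table == NULL)
        return -1;

    for (int i = 0; i < nprocs; i++) {
        table[i].host_name = host;
        table[i].executable_name = exe;
        table[i].pid = first_pid + i;
    }

    free(MPIR_proctable);   /* drop any earlier table */
    MPIR_proctable = table;
    MPIR_proctable_size = nprocs;
    return nprocs;
}
```

If the compiler drops the MPIR_PROCDESC type information, or the linker strips
it, the debugger can still find the MPIR_proctable address but has no way to
interpret the entries, which matches the failure described above.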
>
>> Either the typedef for MPIR_PROCDESC
>> can't be found, or MPIR_PROCTABLE is missing. You can get things to work if you
>> start up TotalView first and then pick your program and go to the Parallel tab
>> and pick OpenMPI. But it would be nice to get the classic launch working as well.
>
> I'm unclear on how you could find these symbols if you start TV first, etc., but it won't work automatically.

One of the solutions we came up with to work around this problem was to start
up TotalView a different way, so that we need not rely on the symbol
information at all. If you start TotalView the 'classic' way,
mpirun/mpiexec -tv -np 4 ./foo, it will look for MPIR_PROCTABLE and the others.
If you use the newer 'indirect' launch, we actually start up the debug servers
with MPI, and then use some cached info to figure out the correct process to
start up with the debug servers and how many processes to start. With this
method, the symbol information is not needed. This method works with OpenMPI on
just about all platforms. However,
some users prefer the classic launch with -tv, and this seems to be failing with
the latest builds I've done on Darwin. The debug info appears to be preserved
in the .o files, but does not always seem complete. It probably needs another
look on my part, to make sure I'm doing it right. The fact that Snow Leopard
(and maybe some earlier releases) now includes OpenMPI also confuses the issue,
as the version that comes with Darwin does NOT contain the symbol info, and it's
easy enough to get the native OpenMPI, and not pick up the build you intended.

Does that make any more sense?

I'll try playing around with 1.4.1 and see if it's me, or the compilers, or
maybe OpenMPI.

PeterT

>
> Do you have deeper knowledge (given your email address) on exactly what is going wrong?
>