Open MPI User's Mailing List Archives

Subject: [OMPI users] Displaying MAIN in Totalview
From: David Turner (dpturner_at_[hidden])
Date: 2011-03-21 13:08:41


About a month ago, this topic was discussed with no real resolution:

We noticed the same problem (TV does not display the user's MAIN
routine upon initial startup), and contacted the TV developers.
They suggested a simple OMPI code modification, which we implemented
and tested; it seems to work fine. Hopefully, this capability
can be restored in future releases.

Here is the body of our communication with the TV developers:


Interestingly enough, someone else asked this very same question
recently and I finally dug into it last week and figured out what was
going on. TotalView publishes a public interface which allows any MPI
implementor to set things up so that it should work fairly seamlessly with
TotalView. I found that one of the defines in the interface is


and when we find this symbol defined in mpirun (or orterun in Open MPI's
case) then we spend a bit more effort to focus the source pane on the
main routine. As you may guess, this is NOT being defined in Open MPI
1.4.2. It was being defined in the 1.2.x builds though, in a file
called totalview.c. Open MPI has been reworked significantly since then,
and totalview.c has been replaced by debuggers.c in orte/tools/orterun.
Around lines 130 to 140 (depending on any changes since my look at the
1.4.1 sources) you should find a number of MPIR_ symbols being defined.

struct MPIR_PROCDESC *MPIR_proctable = NULL;
int MPIR_proctable_size = 0;
int MPIR_being_debugged = 0;
volatile int MPIR_debug_state = 0;
volatile int MPIR_i_am_starter = 0;
volatile int MPIR_partial_attach_ok = 1;

I believe you should be able to insert the line:

int MPIR_force_to_main = 0;

into this section, and then the behavior you are looking for should work
after you rebuild Open MPI. I haven't yet had the time to do that myself,
but that was all that existed in the 1.2.x sources, and I know those
achieved the desired effect. It's quite possible that someone realized
the symbol was initialized but wasn't being used anywhere, so they just
removed it, without realizing we were looking for it in the debugger.
When I pointed this out to the other user, he said he would try it out
and pass it on to the Open MPI group. I just checked on that thread, and
didn't see any update, so I passed on the info myself.
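
For reference, the change described above might look roughly like the
following once applied to the MPIR_ block in debuggers.c. This is only a
sketch: the forward declaration of struct MPIR_PROCDESC is a stand-in
added so the fragment compiles by itself (the real definition lives in
the Open MPI sources), and the exact placement within the file is an
assumption.

```c
/* Sketch of the MPIR_ symbol block in orte/tools/orterun/debuggers.c
 * with the suggested line added. Not a copy of the actual file. */
#include <stddef.h>

/* Stand-in forward declaration (assumption) so this fragment compiles
 * alone; Open MPI defines the real struct elsewhere. */
struct MPIR_PROCDESC;

struct MPIR_PROCDESC *MPIR_proctable = NULL;
int MPIR_proctable_size = 0;
int MPIR_being_debugged = 0;
volatile int MPIR_debug_state = 0;
volatile int MPIR_i_am_starter = 0;
volatile int MPIR_partial_attach_ok = 1;

/* Suggested addition. Per the description above, the debugger only
 * checks that this symbol is defined in orterun; nothing in Open MPI
 * needs to read or write it. */
int MPIR_force_to_main = 0;
```

After rebuilding, a symbol listing of the new orterun binary (for
example with nm) should show MPIR_force_to_main if the change took
effect, assuming the binary has not been stripped.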


Best regards,
David Turner
User Services Group        email: dpturner_at_[hidden]
NERSC Division             phone: (510) 486-4027
Lawrence Berkeley Lab        fax: (510) 486-4316