Sorry I have to ask this, but did you build your latest OMPI version, not
just the application, with the -g flag too?
IIRC, when I ran into this issue I was actually able to do stepi's and
eventually pop up the stack; however, that is really no way to debug a
program.
Unless OMPI is somehow trashing the stack, I don't see what OMPI could be
doing to cause this type of issue. Again, when I ran into this issue,
known working programs still worked; I just was unable to get a full
stack. So it was definitely an interfacing issue between TotalView and
the executable (or the result of how the executable and libraries were
compiled). Another thing I noticed was that when using Solaris Studio dbx
I was able to see the full stack where I could not when using
TotalView. I am not sure whether gdb could also see the full stack, but
it might be worth a try to attach gdb to a running program and see if
you get a full stack.
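For what it's worth, the gdb check could be scripted roughly like this.
It is only a sketch: the helper names are made up for illustration, and
it assumes gdb is on your PATH and that you are allowed to attach to the
process (same user, ptrace not restricted). The gdb options used are
-p (attach to a PID), -batch (exit once the commands have run) and -ex
(run one gdb command).

```python
import shutil
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to `pid` and prints every thread's stack.

    Hypothetical helper, not part of Open MPI or TotalView.
    """
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_stack(pid):
    """Run the command above and return gdb's text output.

    Returns None when gdb is not installed, so the caller can tell
    "no gdb" apart from "gdb ran but printed nothing".
    """
    if shutil.which("gdb") is None:
        return None
    result = subprocess.run(
        gdb_backtrace_command(pid), capture_output=True, text=True
    )
    return result.stdout

# Example: print(dump_stack(12345)), where 12345 is the PID of one MPI rank.
```

If main and the frames below MPI_Init show up in that output, gdb is
unwinding the stack fine and the problem is more likely on the TotalView
side.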
On 02/09/2011 05:35 PM, Dennis McRitchie wrote:
> Thanks Terry.
> Unfortunately, -fno-omit-frame-pointer is the default for the Intel
> compiler when -g is used, which I am using since it is necessary for
> source level debugging. So the compiler kindly tells me that it is
> ignoring your suggested option when I specify it. :-)
> Also, since I can reproduce this problem by simply changing the
> OpenMPI version, without changing the compiler version, it strikes me
> as being more likely to be an OpenMPI-related issue: 1.2.8 works, but
> anything later does not (as described below).
> I have tried different versions of TotalView from 8.1 to 8.9, but all
> behave the same.
> I was wondering if a change to the openmpi-totalview.tcl script might
> be needed?
> *From:*users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]]
> *On Behalf Of *Terry Dontje
> *Sent:* Wednesday, February 09, 2011 5:02 PM
> *To:* users_at_[hidden]
> *Subject:* Re: [OMPI users] Totalview not showing main program on
> startup with OpenMPI 1.3.x and 1.4.x
> This sounds like something I ran into some time ago that involved the
> compiler omitting frame pointers. You may want to try to compile your
> code with -fno-omit-frame-pointer. I am unsure if you may need to do
> the same while building MPI though.
> On 02/09/2011 02:49 PM, Dennis McRitchie wrote:
> I'm encountering a strange problem and can't find any discussion of it on this mailing list.
> When building and running my parallel program using any recent Intel compiler and OpenMPI 1.2.8, TotalView behaves entirely correctly, displaying the "Process mpirun is a parallel job. Do you want to stop the job now?" dialog box, and stopping at the start of the program. The code displayed is the source code of my program's function main, and the stack trace window shows that we are stopped in the poll function many levels "up" from my main function's call to MPI_Init. I can then set breakpoints, single step, etc., and the code runs appropriately.
> But when building and running using Intel compilers with OpenMPI 1.3.x or 1.4.x, TotalView displays the usual dialog box, and stops at the start of the program; but my main program's source code is *not* displayed. The stack trace window again shows that we are stopped in the poll function several levels "up" from my main function's call to MPI_Init; but this time, the code displayed is the assembler code for the poll function itself.
> If I click on 'main' in the stack trace window, the source code for my program's function main is then displayed, and I can now set breakpoints, single step, etc. as usual.
> So why is the program's source code not displayed when using 1.3.x and 1.4.x, but is displayed when using 1.2.8? This change in behavior is fairly confusing to our users, and it would be nice to have it work as it used to, if possible.
> Dennis McRitchie
> Computational Science and Engineering Support (CSES)
> Academic Services Department
> Office of Information Technology
> Princeton University
> users mailing list
> users_at_[hidden] <mailto:users_at_[hidden]>
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle *- Performance Technologies*
> 95 Network Drive, Burlington, MA 01803
> Email terry.dontje_at_[hidden] <mailto:terry.dontje_at_[hidden]>
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden] <mailto:terry.dontje_at_[hidden]>