Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Totalview broken with 1.5/trunk
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-12-15 15:33:28


Right -- the symbol isn't defined in orterun itself. It's in libopen-rte.so.

My changes ensure that the .o file in which MPIR_Breakpoint is defined gets pulled in by the linker, so that it ends up in the mpirun process.
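
For readers following along, here is a minimal single-file sketch of the dummy-function idea. It is not the code from the changesets quoted below: debugger_sketch_pull_in_symbols, launcher_sketch_init, launcher_sketch_notify_debugger, and the breakpoint_hits counter are hypothetical names made up for illustration, and MPIR_Breakpoint here is only a stand-in with an assumed signature. In a real build the two halves would sit in separate source files, with the first half inside the library.

```c
#include <assert.h>
#include <stdio.h>

/* ---- First half: would live in debugger_base_fns.c (in the library) ---- */

/* Illustration-only counter so the sketch can show when the breakpoint
 * function actually ran. Not part of any real interface. */
static int breakpoint_hits = 0;

/* Stand-in for the real symbol. A debugger plants a breakpoint on this
 * function, so it should only be called when the launcher is ready. */
void MPIR_Breakpoint(void)
{
    breakpoint_hits++;
}

/* Hypothetical dummy function. It does nothing useful; referencing it
 * from the launcher makes the linker resolve a symbol from this object
 * file, which brings the file's other symbols along with it. */
int debugger_sketch_pull_in_symbols(void)
{
    return 1;
}

/* ---- Second half: would live in the launcher (e.g. orterun) ---- */

/* Called early at startup. It references the dummy function rather than
 * MPIR_Breakpoint, so nothing touches the breakpoint symbol yet. */
int launcher_sketch_init(void)
{
    int rc = debugger_sketch_pull_in_symbols();
    assert(1 == rc);
    return rc;
}

/* Called later, once the job is actually ready for the debugger. */
void launcher_sketch_notify_debugger(void)
{
    MPIR_Breakpoint();
    printf("MPIR_Breakpoint hits so far: %d\n", breakpoint_hits);
}
```

Calling launcher_sketch_init() leaves breakpoint_hits at 0. Only launcher_sketch_notify_debugger() enters MPIR_Breakpoint, which is the separation the dummy function is meant to preserve.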

On Dec 15, 2011, at 3:30 PM, Nathan Hjelm wrote:

> Your changes don't break anything, but they also don't cause MPIR_Breakpoint to appear in orterun:
> ct-login1:/scratch2/hjelmn hjelmn$ nm `type -p orterun` | grep MPIR
> 000000000060b0e0 B MPIR_attach_fifo
> 000000000060b2e0 B MPIR_being_debugged
> 000000000060b7b0 B MPIR_debug_state
> 000000000060ada0 B MPIR_executable_path
> 000000000060b840 B MPIR_force_to_main
> 000000000060b7b4 B MPIR_forward_comm
> 000000000060b8c8 B MPIR_forward_output
> 000000000060b9fc B MPIR_i_am_starter
> 000000000060afe0 B MPIR_partial_attach_ok
> 000000000060b8c0 B MPIR_proctable
> 000000000060b9f8 B MPIR_proctable_size
> 000000000060b300 B MPIR_server_arguments
>
> -Nathan
>
> On Thu, 15 Dec 2011, Jeff Squyres wrote:
>
>> Ok, here's what I did:
>>
>> https://svn.open-mpi.org/trac/ompi/changeset/25660
>> --> pulls in symbols like MPIR_Breakpoint via a different dummy function
>>
>> https://svn.open-mpi.org/trac/ompi/changeset/25661
>> --> Fixes the ORTE_DECLSPEC typos that George found
>>
>> LANL: Can you verify that this (still) works for you with totalview and stat?
>>
>>
>>
>> On Dec 15, 2011, at 1:35 PM, Jeff Squyres wrote:
>>
>>> On Dec 15, 2011, at 10:28 AM, Ralph Castain wrote:
>>>
>>>>> I have had the chance now to test it with totalview and stat 1.1.0. Looks good. I pushed the fix to the trunk and it will need to be CMRed to 1.5.
>>>
>>> Ralph and I just talked about this on the phone some more -- I don't think https://svn.open-mpi.org/trac/ompi/changeset/25654 was the right fix.
>>>
>>> We still need to ensure that all the symbols in orte/debugger/base/debugger_base_fns.o are pulled in from shared libraries at run-time. Totalview may well have gotten confused if we used the actual MPIR_Breakpoint symbol to pull it in (i.e., it actually broke right there, rather than when we actually invoked MPIR_Breakpoint, later).
>>>
>>> A better way might be to use another dummy function to pull in all the symbols from that .o file. I.e., instead of doing foo=MPIR_Breakpoint, call some other dummy function that lives in debugger_base_fns.c. Its only purpose in life would be to ensure that all the symbols -- including MPIR_Breakpoint -- are pulled in at run time.
>>>
>>> I'll go do that on the trunk right now...
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/