Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)
From: Nathan Hjelm (hjelmn_at_[hidden])
Date: 2011-12-15 18:42:20


I have an idea. How about we declare the MPIR variables as weak? I just tested it with STAT.
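Declaring a variable weak is a GCC/Clang extension (`__attribute__((weak))`). Here is a minimal sketch of what that looks like, assuming GCC or Clang on an ELF platform. The struct layout is assumed to follow the MPIR interface, and `publish_proctable` is a hypothetical helper, not code from the attached orterun.c. At static link time, a strong definition elsewhere wins over a weak one instead of causing a multiple-definition error. Across shared objects, however, glibc's dynamic linker by default binds to the first definition it finds regardless of weakness. Whether this changes which copy gdb reads is therefore what testing the replacement orterun.c needs to show.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed MPIR process descriptor layout (illustrative). */
struct MPIR_PROCDESC {
    char *host_name;       /* host the process runs on */
    char *executable_name; /* path of the executable */
    int pid;               /* process id on that host */
};

/* Weak definitions: if a strong definition of the same name is linked into
 * the same executable or library, the static linker uses that one and
 * discards these, instead of failing with a multiple-definition error. */
__attribute__((weak)) struct MPIR_PROCDESC *MPIR_proctable = NULL;
__attribute__((weak)) int MPIR_proctable_size = 0;

/* Hypothetical helper: publish a process table through the MPIR symbols. */
void publish_proctable(struct MPIR_PROCDESC *table, int size)
{
    assert(size >= 0);
    MPIR_proctable = table;
    MPIR_proctable_size = size;
}
```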

Can you replace orte/tools/orterun/orterun.c with the attached version and see if it fixes the issue?

-Nathan

On Thu, 15 Dec 2011, Ashley Pittman wrote:

>
> padb just calls gdb; you can reproduce the error with gdb alone, using the trace I sent when I started this thread.
>
> Perhaps the difference is in the version of gdb. I could give you a login to my test machine if you need one.
>
> Ashley.
>
> On 15 Dec 2011, at 22:49, Nathan Hjelm wrote:
>
>> What's odd is that TotalView, STAT, and gdb see the correct values despite the symbols being in the B (BSS) section. What does padb do differently?
>>
>> This is a dynamic, optimized build of 1.5.5rc1.
>>
>> -Nathan Hjelm
>> HPC-3, LANL
>>
>> On Thu, 15 Dec 2011, Ashley Pittman wrote:
>>
>>>
>>> If I add a new symbol to orte/mca/debugger/base/debugger_base_open.c and declare it in orte/mca/debugger/base/base.h, in the same way as MPIR_proctable_size, then it appears in the .so but not in the binary. If I then reference this variable in orte/tools/orterun/orterun.c, the symbol appears in orterun. It's definitely coming from that declaration; what isn't so clear is how it's getting into the binary. I can only assume that orte/mca/debugger/base/debugger_base_fns.c is linked into the binary directly and that the symbol is optimised away when it's defined but not used.
>>>
>>> Ashley.
>>>
>>> On 15 Dec 2011, at 22:09, Nathan Hjelm wrote:
>>>
>>>> orte/tools/orterun/debuggers.c does not exist anymore (it's not in the 1.5.5rc1 tarball). I don't know why the symbols are showing up in section B of orterun. Investigating now.
>>>>
>>>> -Nathan Hjelm
>>>> HPC-3, LANL
>>>>
>>>> On Thu, 15 Dec 2011, George Bosilca wrote:
>>>>
>>>>>
>>>>> On Dec 15, 2011, at 16:55 , Ashley Pittman wrote:
>>>>>
>>>>>> There is a problem with 1.5.5rc1 that prevents padb from loading the process table from the orterun process. What appears to be happening is that MPIR_proctable and MPIR_proctable_size are present both in orterun itself and in libopen-rte.so. The code correctly sets them in libopen-rte.so; however, gdb picks the variables from orterun in preference, and hence padb reads NULL values.
>>>>>
>>>>> Indeed, there are two definitions, but a single declaration. This is true for both the trunk and the 1.5.
>>>>>
>>>>> ./orte/mca/debugger/base/base.h:61:ORTE_DECLSPEC extern struct MPIR_PROCDESC *MPIR_proctable;
>>>>> ./orte/mca/debugger/base/base.h:62:ORTE_DECLSPEC extern int MPIR_proctable_size;
>>>>>
>>>>> ./orte/mca/debugger/base/debugger_base_open.c:42:struct MPIR_PROCDESC *MPIR_proctable = NULL;
>>>>> ./orte/mca/debugger/base/debugger_base_open.c:43:int MPIR_proctable_size = 0;
>>>>>
>>>>> ./orte/tools/orterun/debuggers.c:142:struct MPIR_PROCDESC *MPIR_proctable = NULL;
>>>>> ./orte/tools/orterun/debuggers.c:143:int MPIR_proctable_size = 0;
>>>>>
>>>>> george.
>>>>>
>>>>>
>>>>>> Attached is a log showing the problem. The only change I made to the source is to add a call to orte_debugger_base_dump() before the return from orte_debugger_base_init_after_spawn(). It looks like this could also have been achieved via a debug setting, but I couldn't see how.
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>
>
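Why a second copy of these variables breaks padb follows from how the MPIR process-acquisition interface works. The starter process (orterun here) fills in `MPIR_proctable` and `MPIR_proctable_size`, sets `MPIR_debug_state` to `MPIR_DEBUG_SPAWNED`, and calls `MPIR_Breakpoint()`, where the debugger has planted a breakpoint. The debugger then reads the table through the symbol names. If gdb resolves those names to the copy in orterun while the code wrote the copy in libopen-rte.so, it reads NULL and 0. The following is a minimal sketch of the starter side, based on the MPIR interface rather than on ORTE's actual code. It assumes GCC or Clang (for the attributes and the empty `asm`). `launch_and_publish` is a hypothetical helper, and the struct layout is assumed to follow the usual MPIR definition.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed MPIR process descriptor layout. */
struct MPIR_PROCDESC {
    char *host_name;       /* host the rank runs on */
    char *executable_name; /* path of the executable */
    int pid;               /* process id on that host */
};

#define MPIR_NULL           0
#define MPIR_DEBUG_SPAWNED  1
#define MPIR_DEBUG_ABORTING 2

/* Exactly one definition of each symbol the debugger looks up by name. */
struct MPIR_PROCDESC *MPIR_proctable = NULL;
int MPIR_proctable_size = 0;
volatile int MPIR_debug_state = MPIR_NULL;
volatile int MPIR_being_debugged = 0; /* set by the debugger when it attaches */

/* The debugger breaks here. noinline plus the empty asm stop the compiler
 * from inlining or deleting the call, so the breakpoint is actually hit. */
__attribute__((noinline)) void *MPIR_Breakpoint(void)
{
    __asm__ __volatile__("" ::: "memory");
    return NULL;
}

/* Hypothetical helper: publish the table once the processes are spawned. */
void launch_and_publish(struct MPIR_PROCDESC *procs, int nprocs)
{
    assert(nprocs >= 0);
    MPIR_proctable = procs;
    MPIR_proctable_size = nprocs;
    MPIR_debug_state = MPIR_DEBUG_SPAWNED;
    MPIR_Breakpoint(); /* a stopped debugger now reads MPIR_proctable */
}
```

Keeping exactly one definition of each MPIR symbol, declared `extern` in a header as base.h does, lets every debugger's symbol lookup land on the variables the code actually writes.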