Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-12-16 10:39:37


Why do the symbols need to be weak? Remember that not all platforms support weak symbols.

The symbols don't need to be in the executable itself, right? It should be fine for them to be in a library (e.g., libopen-rte.so/a).
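
For readers unfamiliar with the idea, here is a minimal sketch of what marking the two MPIR variables as weak could look like, with a fallback for platforms that lack weak symbols. It is not the contents of the attached orterun.c. The HAVE_WEAK_SYMBOLS and MPIR_WEAK macros and the example_publish_proctable() helper are hypothetical names, and the struct layout is assumed from the usual MPIR convention rather than copied from the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout, following the usual MPIR convention (host, executable, pid). */
struct MPIR_PROCDESC {
    char *host_name;
    char *executable_name;
    int pid;
};

/* HAVE_WEAK_SYMBOLS is a hypothetical configure-style macro, not one taken
 * from the Open MPI build system. Default it on for GCC-compatible compilers. */
#if defined(__GNUC__) && !defined(HAVE_WEAK_SYMBOLS)
#define HAVE_WEAK_SYMBOLS 1
#endif

#if defined(HAVE_WEAK_SYMBOLS) && HAVE_WEAK_SYMBOLS
#define MPIR_WEAK __attribute__((weak))
#else
#define MPIR_WEAK /* no weak support: falls back to an ordinary definition */
#endif

/* Weak definitions: a non-weak definition in another object file would be
 * preferred by the static linker instead of causing a duplicate-symbol error. */
MPIR_WEAK struct MPIR_PROCDESC *MPIR_proctable = NULL;
MPIR_WEAK int MPIR_proctable_size = 0;

/* Hypothetical helper: publish a process table where a debugger reading
 * MPIR_proctable and MPIR_proctable_size would look for it. */
void example_publish_proctable(struct MPIR_PROCDESC *table, int count)
{
    assert(count >= 0);
    MPIR_proctable = table;
    MPIR_proctable_size = count;
}
```

Whether a weak definition changes which copy a debugger picks up depends on the toolchain and on how the objects are linked, so this only illustrates the declaration style, not a guaranteed fix on every platform.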

On Dec 16, 2011, at 4:51 AM, Ashley Pittman wrote:

>
> Yes, this fixes the issue.
>
> Ashley.
>
> On 15 Dec 2011, at 23:42, Nathan Hjelm wrote:
>
>> I have an idea. How about we mark the MPIR variables as weak? I just tested it with STAT.
>>
>> Can you replace orte/tools/orterun/orterun.c with the attached version and see if it fixes the issue?
>>
>> -Nathan
>>
>> On Thu, 15 Dec 2011, Ashley Pittman wrote:
>>
>>>
>>> padb just calls gdb; you can see the error with gdb alone, using the trace I sent when I started this thread.
>>>
>>> Perhaps the difference is in the gdb versions. I could give you a login to my test machine if you need one.
>>>
>>> Ashley.
>>>
>>> On 15 Dec 2011, at 22:49, Nathan Hjelm wrote:
>>>
>>>> What's odd is that TotalView, STAT, and GDB see the correct values despite them being in the B section. What does padb do differently?
>>>>
>>>> This is a dynamic, optimized build of 1.5.5rc1.
>>>>
>>>> -Nathan Hjelm
>>>> HPC-3, LANL
>>>>
>>>> On Thu, 15 Dec 2011, Ashley Pittman wrote:
>>>>
>>>>>
>>>>> If I add a new symbol to orte/mca/debugger/base/debugger_base_open.c and declare it in orte/mca/debugger/base/base.h, the same way MPIR_proctable_size is defined, then it appears in the .so but not in the binary. If I then reference this variable in orte/tools/orterun/orterun.c, the symbol appears in orterun. It's definitely coming from that declaration; what isn't so clear is how it's getting into the binary. I can only assume that orte/mca/debugger/base/debugger_base_fns.c is linked into the binary directly and the symbol is optimised away in the case where it's defined but not used.
>>>>>
>>>>> Ashley.
>>>>>
>>>>> On 15 Dec 2011, at 22:09, Nathan Hjelm wrote:
>>>>>
>>>>>> orte/tools/orterun/debuggers.c does not exist anymore (it's not in the 1.5.5rc1 tarball). I don't know why the symbols are showing up in section B of orterun. Investigating now.
>>>>>>
>>>>>> -Nathan Hjelm
>>>>>> HPC-3, LANL
>>>>>>
>>>>>> On Thu, 15 Dec 2011, George Bosilca wrote:
>>>>>>
>>>>>>>
>>>>>>> On Dec 15, 2011, at 16:55 , Ashley Pittman wrote:
>>>>>>>
>>>>>>>> There is a problem with 1.5.5rc1 that prevents padb from loading the process table from the orterun process. What appears to be happening is that MPIR_proctable and MPIR_proctable_size are present both in orterun itself and in libopen-rte.so. The code is correctly setting them in libopen-rte.so; however, gdb is picking the variables from orterun in preference, and hence padb is reading NULL values.
>>>>>>>
>>>>>>> Indeed, there are two definitions, but a single declaration. This is true for both the trunk and the 1.5.
>>>>>>>
>>>>>>> ./orte/mca/debugger/base/base.h:61:ORTE_DECLSPEC extern struct MPIR_PROCDESC *MPIR_proctable;
>>>>>>> ./orte/mca/debugger/base/base.h:62:ORTE_DECLSPEC extern int MPIR_proctable_size;
>>>>>>>
>>>>>>> ./orte/mca/debugger/base/debugger_base_open.c:42:struct MPIR_PROCDESC *MPIR_proctable = NULL;
>>>>>>> ./orte/mca/debugger/base/debugger_base_open.c:43:int MPIR_proctable_size = 0;
>>>>>>>
>>>>>>> ./orte/tools/orterun/debuggers.c:142:struct MPIR_PROCDESC *MPIR_proctable = NULL;
>>>>>>> ./orte/tools/orterun/debuggers.c:143:int MPIR_proctable_size = 0;
>>>>>>>
>>>>>>> george.
>>>>>>>
>>>>>>>
>>>>>>>> Attached is a log showing the problem. The only change I made to the source is to add a call to orte_debugger_base_dump() before the return from orte_debugger_base_init_after_spawn(). It looks like this could also have been achieved via a debug setting, but I couldn't see how.
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> devel_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>> <orterun.c.gz>
>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/