Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] bug in mca framework?
From: Mike Dubman (miked_at_[hidden])
Date: 2013-12-09 08:17:36


Nathan,
Could you please comment on Igor's observations?

Thanks

On Wed, Dec 4, 2013 at 4:44 PM, Igor Ivanov <igor.ivanov_at_[hidden]> wrote:

> On 04.12.2013 17:56, Jeff Squyres (jsquyres) wrote:
>
>> On Dec 4, 2013, at 2:52 AM, Igor Ivanov <Igor.Ivanov_at_[hidden]> wrote:
>>
>>> It is the first string-type MCA variable from btl/openib:
>>> 'device_param_files'. Actually, you can disable it and get the failure
>>> on the second one.
>>>
>>> Description of the case we see:
>>> 1. openib MCA variables are registered during startup, in the component
>>> selection phase;
>>> 2. but the winner is the cm component, so the openib MCA variables are
>>> deregistered as part of their MCA group;
>>> 3. the MCA variables are not removed from the global MCA array; instead
>>> they are marked as invalid and the memory for their strings is freed;
>>> 4. shmem needs openib for yoda and performs BML initialization;
>>> 5. the openib MCA variables are registered again using "light mode":
>>> each one finds itself in the global array and its fields are refreshed;
>>>
>> Can you explain what you mean by step 5? I.e., what does "using light
>> mode" mean? Is the openib component register function invoked again?
>>
> Correct, it is called twice. "Light mode" means that
> mca_base_var_register() does not allocate the MCA variable object again;
> it looks the variable up in the global array and, having found it, updates
> the fields of its mca_base_var_t structure (at least mbv_storage).
>
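A minimal sketch of the "light mode" lookup-or-refresh behaviour described above, written as a toy C model. The types and functions (toy_var_t, toy_registry_t, toy_var_find, toy_var_register) are hypothetical stand-ins, not the real mca_base_var API; only the idea of reusing an existing entry and refreshing its storage pointer (mbv_storage in the real mca_base_var_t) comes from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified registry; not Open MPI code. */
#define TOY_MAX_VARS 16

typedef struct {
    const char *name;   /* full variable name */
    char **storage;     /* points at the component's own storage */
    bool valid;         /* false after deregistration */
} toy_var_t;

typedef struct {
    toy_var_t vars[TOY_MAX_VARS];
    int count;
} toy_registry_t;

/* Return the index of an existing entry with this name, or -1. */
static int toy_var_find(const toy_registry_t *reg, const char *name)
{
    for (int i = 0; i < reg->count; ++i) {
        if (strcmp(reg->vars[i].name, name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Register a variable. If an entry with this name already exists (even
 * one marked invalid), reuse the slot and refresh its fields instead of
 * allocating a new one: the "light mode" path. */
static int toy_var_register(toy_registry_t *reg, const char *name,
                            char **storage)
{
    assert(name != NULL && storage != NULL);
    int idx = toy_var_find(reg, name);
    if (idx < 0) {
        if (reg->count == TOY_MAX_VARS) {
            return -1;                 /* registry full */
        }
        idx = reg->count++;
        reg->vars[idx].name = name;
    }
    reg->vars[idx].storage = storage;  /* refreshed, like mbv_storage */
    reg->vars[idx].valid = true;
    return idx;
}
```

In this model, registering the same name a second time returns the same slot with its storage pointer pointing at the new location, so nothing new is allocated on the second call.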
>
>>> 6. for an unknown reason, BML finalization does not clean up these
>>> variables the way it was done in step 2;
>>> 7. mca_btl_openib.so is unloaded;
>>> 8. opal_finalize() destroys the MCA variables in the global array,
>>> encounters openib's variable, and tries to destroy it through an
>>> address that is no longer accessible.
>>>
>>> So the code under discussion fixes step 6.
>>>
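Steps 1-8 can be replayed in a toy model to see why step 6 matters. This is a sketch under simplifying assumptions: the registry, the component id and all toy_* functions are hypothetical, and "unloading" the component is modelled with a flag so that the finalize step counts the entries it would have touched through a stale address instead of actually crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model of steps 1-8; not Open MPI code. */
#define TOY_MAX_VARS 8

typedef struct {
    const char *name;
    int owner;          /* id of the component that owns the storage */
    bool valid;
} toy_var_t;

typedef struct {
    toy_var_t vars[TOY_MAX_VARS];
    int count;
    bool loaded[4];     /* loaded[c] is false once component c is unloaded */
} toy_registry_t;

/* Steps 1 and 5: register, reusing an existing slot ("light mode"). */
static void toy_register(toy_registry_t *reg, const char *name, int owner)
{
    int idx = -1;
    for (int i = 0; i < reg->count; ++i) {
        if (strcmp(reg->vars[i].name, name) == 0) {
            idx = i;
        }
    }
    if (idx < 0) {
        assert(reg->count < TOY_MAX_VARS);
        idx = reg->count++;
        reg->vars[idx].name = name;
    }
    reg->vars[idx].owner = owner;
    reg->vars[idx].valid = true;
}

/* Steps 2-3: entries stay in the array but are marked invalid. */
static void toy_deregister_group(toy_registry_t *reg, int owner)
{
    for (int i = 0; i < reg->count; ++i) {
        if (reg->vars[i].owner == owner) {
            reg->vars[i].valid = false;
        }
    }
}

/* Step 8: count still-valid entries owned by an unloaded component;
 * the real code would dereference those addresses. */
static int toy_finalize_stale(const toy_registry_t *reg)
{
    int stale = 0;
    for (int i = 0; i < reg->count; ++i) {
        if (reg->vars[i].valid && !reg->loaded[reg->vars[i].owner]) {
            ++stale;
        }
    }
    return stale;
}

/* Run the whole sequence; clean_on_bml_finalize toggles step 6. */
static int toy_scenario(bool clean_on_bml_finalize)
{
    enum { OPENIB = 1 };
    toy_registry_t reg = {0};
    reg.loaded[OPENIB] = true;
    toy_register(&reg, "btl_openib_device_param_files", OPENIB); /* 1 */
    toy_deregister_group(&reg, OPENIB);                          /* 2-3 */
    toy_register(&reg, "btl_openib_device_param_files", OPENIB); /* 4-5 */
    if (clean_on_bml_finalize) {
        toy_deregister_group(&reg, OPENIB);                      /* 6 */
    }
    reg.loaded[OPENIB] = false;                                  /* 7 */
    return toy_finalize_stale(&reg);                             /* 8 */
}
```

In this model, toy_scenario(false) reports one stale entry (the re-registered openib variable), while toy_scenario(true), which performs the step 6 cleanup, reports none.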
>> Nathan: it sounds like an MCA var (and entire group) is registered,
>> unregistered, and then registered again. Does the MCA var system get
>> confused here when it tries to unregister the group a 2nd time?
>>
> The issue is probably related to incorrectly determining whether a
> variable is valid or invalid during the second call to
> mca_base_var_deregister().
>
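One possible shape of the valid/invalid confusion, sketched as a hypothetical C model. The names toy_var_t, toy_var_deregister and toy_var_set_value are invented for illustration, and the real mca_base_var_deregister() may use a different check. The assumption is that deregistration skips entries already marked invalid, and that light-mode re-registration refreshes the fields without marking the entry valid again; then the second deregistration silently does nothing, which would match step 6.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration of the suspected bug; not Open MPI code. */
typedef struct {
    char *value;    /* heap string, freed on deregistration */
    bool valid;
} toy_var_t;

/* Skip entries already marked invalid (so a string is never freed
 * twice); otherwise free the string and mark the entry invalid.
 * Returns true if anything was cleaned up. */
static bool toy_var_deregister(toy_var_t *var)
{
    if (!var->valid) {
        return false;           /* treated as "already cleaned up" */
    }
    free(var->value);
    var->value = NULL;
    var->valid = false;
    return true;
}

/* (Re-)registration of an existing entry. With mark_valid == false this
 * models a light-mode path that refreshes the value but forgets to flip
 * the entry back to valid. */
static void toy_var_set_value(toy_var_t *var, const char *value,
                              bool mark_valid)
{
    size_t len = strlen(value) + 1;
    var->value = malloc(len);
    assert(var->value != NULL);
    memcpy(var->value, value, len);
    if (mark_valid) {
        var->valid = true;
    }
}
```

With mark_valid set to false on the second registration, the second toy_var_deregister() call returns false and leaves the string allocated; with it set to true, both deregistrations clean up.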
>