On 04.12.2013 17:56, Jeff Squyres (jsquyres) wrote:
That is correct, it is called twice. "Light mode" means that mca_base_var_register() does not allocate the mca variable object again; it looks the variable up in the global array and, on finding it, updates the fields of the mca_base_var_t structure (at least mbv_storage).
On Dec 4, 2013, at 2:52 AM, Igor Ivanov <Igor.Ivanov@itseez.com> wrote:
It is the first mca variable of string type from btl/openib, 'device_param_files'. Actually, you can disable it and get the failure on the second one.
Can you explain what you mean by step 5? I.e., what does "using light mode" mean? Is the openib component register function invoked again?
Description of the case we see:
1. openib mca variables are registered during startup, at the component selection phase;
2. but the winner is the cm component, and the openib mca variables are deregistered as part of the mca group;
3. the mca variables are not removed from the global mca array, but they are marked as invalid and the memory for the string is freed;
4. shmem needs openib for yoda and does bml initialization;
5. openib mca variables are registered again using light mode, i.e. each one is looked up in the global array and its fields are refreshed;
The issue is probably related to incorrect recognition of whether a variable is valid or invalid during the second call of mca_base_var_deregister().
6. for an unknown reason, bml finalization does not clean up these vars as is done in step 2;
Nathan: it sounds like an MCA var (and entire group) is registered, unregistered, and then registered again. Does the MCA var system get confused here when it tries to unregister the group a 2nd time?
7. mca_btl_openib.so is unloaded;
8. opal_finalize() destroys the mca variables from the global array, finds openib's variable, and tries to destroy it through an address that is no longer accessible;
So the code under discussion fixes step 6.
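
The sequence above can be modelled with a small, self-contained C sketch. This is a toy registry, not the Open MPI code: every name in it (toy_var_t, toy_register(), toy_deregister(), toy_lifecycle()) and the value "params.ini" are hypothetical and only mirror the behaviour described in steps 1-8. It assumes that deregistration keeps the slot in the global array, marks it invalid, and frees the string, and that a second registration reuses the slot ("light mode").

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_VARS 8

/* Hypothetical stand-in for an MCA variable slot in the global array. */
typedef struct {
    const char *name;   /* variable name */
    char **storage;     /* points into the component's own memory */
    bool valid;         /* cleared on deregistration; the slot itself stays */
} toy_var_t;

static toy_var_t toy_vars[TOY_MAX_VARS];
static int toy_var_count = 0;

static char *toy_strdup(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL) {
        strcpy(copy, s);
    }
    return copy;
}

static int toy_find(const char *name)
{
    for (int i = 0; i < toy_var_count; i++) {
        if (strcmp(toy_vars[i].name, name) == 0) {
            return i;
        }
    }
    return -1;
}

/* First call allocates a slot; later calls reuse it ("light mode")
 * and only refresh the storage pointer and the valid flag. */
int toy_register(const char *name, char **storage, const char *value)
{
    int idx = toy_find(name);
    if (idx < 0) {
        if (toy_var_count == TOY_MAX_VARS) {
            return -1;
        }
        idx = toy_var_count++;
        toy_vars[idx].name = name;
    }
    toy_vars[idx].storage = storage;
    toy_vars[idx].valid = true;
    *storage = toy_strdup(value);
    return idx;
}

/* Marks the slot invalid and frees the string; the slot is not removed. */
int toy_deregister(int idx)
{
    assert(idx >= 0 && idx < toy_var_count);
    if (!toy_vars[idx].valid) {
        return -1;
    }
    free(*toy_vars[idx].storage);
    *toy_vars[idx].storage = NULL;
    toy_vars[idx].valid = false;
    return 0;
}

int toy_count_valid(void)
{
    int n = 0;
    for (int i = 0; i < toy_var_count; i++) {
        if (toy_vars[i].valid) {
            n++;
        }
    }
    return n;
}

/* Walks steps 1-7 and returns how many variables are still marked valid
 * at the point where the component would be unloaded. */
int toy_lifecycle(bool deregister_before_unload)
{
    static char *device_param_files = NULL;   /* "component" memory */

    int idx = toy_register("device_param_files", &device_param_files,
                           "params.ini");      /* step 1 */
    toy_deregister(idx);                       /* steps 2-3 */
    assert(device_param_files == NULL);
    idx = toy_register("device_param_files", &device_param_files,
                       "params.ini");          /* step 5, light mode */
    if (deregister_before_unload) {
        toy_deregister(idx);                   /* step 6 done properly */
    }
    int still_valid = toy_count_valid();

    /* Step 7: the real unload would unmap this memory; the toy only
     * releases the string so repeated calls do not leak. */
    free(device_param_files);
    device_param_files = NULL;
    return still_valid;
}
```

In this model, toy_lifecycle(false) leaves one entry marked valid whose storage belongs to the unloaded component, which corresponds to what the finalize pass in step 8 would trip over; toy_lifecycle(true) leaves none.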