Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [EXTERNAL] Re: bug in mca framework?
From: Igor Ivanov (igor.ivanov_at_[hidden])
Date: 2014-01-17 00:12:35


I had assumed that BML add_procs() is called by the PML, and I do see
such a call in ompi_mpi_init(): "... ret = MCA_PML_CALL(add_procs(procs,
nprocs)); ...". Moreover, BML add_procs() is also called by the SPML
(OSHMEM's PML) in oshmem_shmem_init().
So it looks like everything should be correct. Or am I still missing
something?

Igor

On 16.01.2014 22:21, Barrett, Brian W wrote:
> If a process is using the Portals 4 MTL and calls shmem_init, the BTLs
> will be initialized properly, but as of right now, no one will call
> add_procs() on the BML (which calls add_procs() on the BTLs). So the
> first shmem communication will fail, because the proc lookup will fail
> inside the BTL. If the MPI layer doesn't call add_procs(), someone else
> has to. In this case, that someone else is the OpenSHMEM layer.
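
The concern in the paragraph above can be modeled as a tiny state machine. This is only a sketch under stated assumptions: every `toy_*` name is hypothetical and none of this is Open MPI code. It assumes that a BTL-based PML (ob1 in the toy) populates the BML during init, that the cm PML drives an MTL and never touches the BML, and that the BTLs are opened during init regardless of which PML wins.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PML choices for the toy model. */
typedef enum { TOY_PML_OB1, TOY_PML_CM } toy_pml_t;

/* Hypothetical BML state: are BTLs open, and has anyone added procs? */
typedef struct {
    bool btls_initialized;
    bool procs_added;
} toy_bml_t;

/* Stand-in for calling add_procs() on the BML. */
static void toy_bml_add_procs(toy_bml_t *bml)
{
    bml->procs_added = true;
}

/* Toy MPI init: BTLs are always opened (assumption), but only a
 * BTL-based PML goes on to call add_procs on the BML. */
static void toy_mpi_init(toy_bml_t *bml, toy_pml_t pml)
{
    bml->btls_initialized = true;
    if (pml == TOY_PML_OB1) {
        toy_bml_add_procs(bml);
    }
    /* TOY_PML_CM: the MTL path never touches the BML. */
}

/* Toy shmem init: runs MPI init, then optionally lets the SPML layer
 * call add_procs itself. */
static void toy_shmem_init(toy_bml_t *bml, toy_pml_t pml, bool spml_adds_procs)
{
    toy_mpi_init(bml, pml);
    if (spml_adds_procs) {
        toy_bml_add_procs(bml);
    }
}

/* The first put needs open BTLs AND a populated proc table. */
static bool toy_first_put_succeeds(const toy_bml_t *bml)
{
    return bml->btls_initialized && bml->procs_added;
}
```

In this model `toy_shmem_init(&bml, TOY_PML_CM, false)` leaves the proc table empty, so `toy_first_put_succeeds()` is false; passing `true` for `spml_adds_procs` makes it succeed. Whether the real SPML already makes that call on every path is exactly the question being discussed in this thread.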
>
> Brian
>
> On 1/15/14 7:45 AM, "Igor Ivanov" <igor.ivanov_at_[hidden]> wrote:
>
>> Brian,
>>
>> Sorry for the slow reaction.
>> I am not sure I understand your concern. Could you please clarify it
>> and review the modified patch? (I found that the issue in my previous
>> patch was the absence of the complete BTL initialization that OSHMEM
>> needs when a PML component other than bfo or ob1 is used.)
>>
>> Igor
>>
>> On 03.01.2014 00:04, Barrett, Brian W wrote:
>>> Igor -
>>>
>>> Sorry for the slow reply; I was on vacation for the last week and a
>>> half.
>>>
>>> The patch doesn't look quite right to me. If the cm PML is used, the
>>> spml (or something else in the OSHMEM layer) is going to have to call
>>> add_procs on the BML to initialize the procs arrays for the BTLs.
>>>
>>> Brian
>>>
>>> On 12/23/13 3:49 AM, "Igor Ivanov" <igor.ivanov_at_[hidden]> wrote:
>>>
>>>> Brian,
>>>>
>>>> Could you look at the patch based on your suggestion? It resolves the
>>>> issue with the MCA variable.
>>>>
>>>> Igor
>>>>
>>>> On 18.12.2013 01:48, Barrett, Brian W wrote:
>>>>> The proposed solution at the bottom is wrong. There aren't two
>>>>> different BMLs, there's one, and it lives in OMPI.
>>>>>
>>>>> The solution is to open the bml and btls in ompi_mpi_init and not in
>>>>> the pmls. I checked, and the bml will deal with add_procs being
>>>>> called multiple times on the same proc, so just moving the framework
>>>>> open / init is sufficient. This will also solve the MTL problem.
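
The property relied on above, that add_procs can safely be called more than once for the same proc, can be illustrated with a minimal sketch. The `toy_*` names and the fixed-size rank table are hypothetical and are not how the BML stores its procs; the point is only the idempotent shape of the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PROCS 16

/* Hypothetical proc table keyed by rank. */
typedef struct {
    int ranks[TOY_MAX_PROCS];
    size_t count;
} toy_proc_table_t;

/* Linear lookup: is this rank already known? */
static bool toy_has_proc(const toy_proc_table_t *table, int rank)
{
    for (size_t i = 0; i < table->count; i++) {
        if (table->ranks[i] == rank) {
            return true;
        }
    }
    return false;
}

/* Add every rank that is not yet present. Ranks that are already in the
 * table are skipped, so repeated calls are harmless. Returns the number
 * of newly added ranks, or -1 if the table is full. */
static int toy_add_procs(toy_proc_table_t *table, const int *ranks,
                         size_t nranks)
{
    int added = 0;
    for (size_t i = 0; i < nranks; i++) {
        if (toy_has_proc(table, ranks[i])) {
            continue;
        }
        if (table->count == TOY_MAX_PROCS) {
            return -1;
        }
        table->ranks[table->count++] = ranks[i];
        added++;
    }
    return added;
}
```

With this shape, a PML and an SPML can both call `toy_add_procs()` with overlapping rank lists; the second caller only adds what the first one missed.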
>>>>>
>>>>> Brian
>>>>>
>>>>> On 12/17/13 8:33 AM, "Joshua Ladd" <joshual_at_[hidden]> wrote:
>>>>>
>>>>>> I believe Devendar Bureddy nailed the root cause. I am providing his
>>>>>> excellent analysis below:
>>>>>>
>>>>>> From Devendar:
>>>>>> Out of curiosity I looked at this issue. Here are my 2 cents.
>>>>>> I think the issue is that the BTL components are opened and closed
>>>>>> twice (ompi_init, yoda), which leads to incorrect usage of var
>>>>>> groups.
>>>>>> The following sequence of events creates invalid memory:
>>>>>>
>>>>>> 1) All openib component parameters are registered in ompi_mpi_init:
>>>>>> main -> start_pes -> shmem_init -> oshmem_shmem_init ->
>>>>>> ompi_mpi_init -> mca_base_framework_open -> mca_pml_base_open .....
>>>>>> mca_bml_base_open... -> btl_openib_component_register()
>>>>>>
>>>>>> * For every string variable a memory block is allocated
>>>>>> (var->mbv_storage = PTR).
>>>>>>
>>>>>> At this time a new var group, id 114 (parent group id 112), is
>>>>>> created for all openib component variables.
>>>>>>
>>>>>> 2) This var group is deregistered in ompi_mpi_init. That marks all
>>>>>> of its variables as invalid, but the group and vars still exist:
>>>>>> main -> start_pes -> shmem_init -> oshmem_shmem_init ->
>>>>>> mca_pml_base_select -> mca_base_components_close -> ... ->
>>>>>> mca_bml_base_close -> mca_base_framework_close ->
>>>>>> mca_base_var_group_deregister(groupid: 114)
>>>>>>
>>>>>> * The memory of all string variables is deallocated
>>>>>> (var->mbv_storage = NULL).
>>>>>>
>>>>>> 3) Because of step 2), the btl_openib.so shared library is dlclosed.
>>>>>>
>>>>>> 4) Now openib is reopened in yoda and the openib variables are
>>>>>> registered again:
>>>>>> main -> start_pes -> shmem_init -> oshmem_shmem_init -> _shmem_init
>>>>>> -> mca_base_framework_open -> mca_spml_base_open ->
>>>>>> mca_spml_yoda_component_open -> ..... mca_bml_base_open... ->
>>>>>> btl_openib_component_register -> register_variables()
>>>>>>
>>>>>> * In register_variables(), var_find() finds each variable (in the
>>>>>> same old group, 114) and resets it.
>>>>>> * For string variables, the buffers are allocated again
>>>>>> (var->mbv_storage = PTR).
>>>>>> * Note that group 114 does not belong to the yoda component.
>>>>>>
>>>>>> 5) On yoda component close, the group above (114) is never found
>>>>>> because it does not belong to this component. So
>>>>>> mca_base_var_group_deregister() is not called again on the var
>>>>>> group, and the string var memory is not deallocated:
>>>>>> main -> start_pes -> shmem_init -> oshmem_shmem_init -> _shmem_init
>>>>>> -> mca_spml_base_select -> ... -> mca_spml_yoda_component_close ->
>>>>>> mca_bml_base_close -> mca_base_var_group_find().
>>>>>>
>>>>>> 6) Because of step 5), btl_openib.so is dlclosed. This step
>>>>>> invalidates all of the openib string var memory (var->mbv_storage =
>>>>>> PTR) allocated in step 4).
>>>>>>
>>>>>> 7) ompi_mpi_finalize() loops through all vars, finalizing them and
>>>>>> deallocating the string var memory (var->mbv_storage = PTR):
>>>>>> ompi_mpi_finalize -> ... -> mca_base_var_finalize
>>>>>>
>>>>>> * var->mbv_storage = PTR is invalid at this stage, which causes the
>>>>>> SEGFAULT.
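
The seven steps above can be replayed in a small, self-contained model. This is a sketch under stated assumptions, not the real MCA variable system: all `toy_*` names are hypothetical, the component is a plain struct with a `loaded` flag instead of a dlopen'ed library, group id 200 is an arbitrary stand-in for "the group the second closer looks for", and the finalize step only counts dangling entries rather than dereferencing them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_VARS 8

/* Hypothetical stand-in for a component such as btl_openib.so. */
typedef struct { bool loaded; } toy_component_t;

/* Hypothetical, heavily simplified registry entry (not mca_base_var_t). */
typedef struct {
    const char *name;
    int group_id;                          /* set at FIRST registration */
    bool valid;
    const toy_component_t *storage_owner;  /* whose memory storage uses */
} toy_var_t;

typedef struct { toy_var_t vars[TOY_MAX_VARS]; size_t count; } toy_registry_t;

/* Register a variable. An existing entry with the same name is refreshed
 * in place and keeps its old group id. */
static toy_var_t *toy_register(toy_registry_t *reg, const char *name,
                               int group_id, const toy_component_t *owner)
{
    for (size_t i = 0; i < reg->count; i++) {
        if (strcmp(reg->vars[i].name, name) == 0) {
            reg->vars[i].valid = true;
            reg->vars[i].storage_owner = owner;
            return &reg->vars[i];
        }
    }
    if (reg->count == TOY_MAX_VARS) return NULL;
    toy_var_t *var = &reg->vars[reg->count++];
    var->name = name;
    var->group_id = group_id;
    var->valid = true;
    var->storage_owner = owner;
    return var;
}

/* Deregister a group: entries stay in the array but are invalidated and
 * drop their storage reference. Returns the number of entries touched. */
static int toy_group_deregister(toy_registry_t *reg, int group_id)
{
    int touched = 0;
    for (size_t i = 0; i < reg->count; i++) {
        toy_var_t *var = &reg->vars[i];
        if (var->valid && var->group_id == group_id) {
            var->valid = false;
            var->storage_owner = NULL;
            touched++;
        }
    }
    return touched;
}

/* Finalize: count entries whose storage belongs to an unloaded component.
 * Real code would touch that memory; the toy only counts. */
static int toy_count_dangling(const toy_registry_t *reg)
{
    int dangling = 0;
    for (size_t i = 0; i < reg->count; i++) {
        const toy_component_t *owner = reg->vars[i].storage_owner;
        if (owner != NULL && !owner->loaded) dangling++;
    }
    return dangling;
}

/* Replay steps 1-7; the flag models a second close that does find and
 * deregister group 114. */
static int toy_replay(bool second_close_deregisters_114)
{
    toy_registry_t reg = { .count = 0 };
    toy_component_t openib = { .loaded = true };
    toy_var_t *var;

    var = toy_register(&reg, "device_param_files", 114, &openib); /* 1 */
    assert(var != NULL);
    toy_group_deregister(&reg, 114);                              /* 2 */
    openib.loaded = false;                                        /* 3 */
    openib.loaded = true;                                         /* 4 */
    var = toy_register(&reg, "device_param_files", 200, &openib);
    assert(var != NULL && var->group_id == 114);  /* old group is kept */
    toy_group_deregister(&reg, second_close_deregisters_114 ? 114 : 200); /* 5 */
    openib.loaded = false;                                        /* 6 */
    return toy_count_dangling(&reg);                              /* 7 */
}
```

In this model `toy_replay(false)` leaves one entry pointing into the unloaded component, which is the state described in step 7, while `toy_replay(true)` leaves none.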
>>>>>>
>>>>>>
>>>>>> This also explains why Dinar's patch, kostul_fix.patch
>>>>>>
>>>>>> (http://bgate.mellanox.com/redmine/attachments/1643/kostul_fix.patch),
>>>>>> resolves the issue. His patch prevents you from finding the invalid
>>>>>> already opened params.
>>>>>> So, I see that in a lot of these registration functions the
>>>>>> signature has an entry for the project name, but for now NULL is
>>>>>> always passed. I see a note by Nathan in
>>>>>>
>>>>>> ../opal/mca/base/mca_base_var.c +1311
>>>>>> {
>>>>>> /* XXX -- component_update -- We will stash the project name in the
>>>>>> component */
>>>>>> return mca_base_var_register (NULL, component->mca_type_name,
>>>>>>
>>>>>>
>>>>>> Seems knowing the project name, oshmem, would allow us to distinguish
>>>>>> between the different BMLs.
>>>>>>
>>>>>> Nathan, please advise.
>>>>>>
>>>>>> Josh
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Nathan
>>>>>> Hjelm
>>>>>> Sent: Monday, December 16, 2013 12:44 PM
>>>>>> To: Open MPI Developers
>>>>>> Subject: Re: [OMPI devel] bug in mca framework?
>>>>>>
>>>>>> On Mon, Dec 16, 2013 at 05:21:05PM +0000, Joshua Ladd wrote:
>>>>>>> After speaking with Igor Ivanov about this this morning, he
>>>>>>> summarized his findings as follows:
>>>>>>>
>>>>>>> 1. Valgrind comes up clean.
>>>>>> That's good to hear but unfortunate, since this really seems like a
>>>>>> stomping-on-memory problem.
>>>>>>
>>>>>>> 2. The issue is not reproduced with a static build.
>>>>>> This is a red herring. The variable itself contains garbage. The
>>>>>> mbv_storage pointer looked like it was on the stack, the name was not
>>>>>> valid, etc. Not sure how we got an mca_base_var_t into that state,
>>>>>> since the only time we touch anything in them is in
>>>>>> mca_base_var_finalize. That function cleans up all of the state, so
>>>>>> two calls to it should be harmless.
>>>>>>
>>>>>>> 3. A bisection study reveals that problems first appear after
>>>>>>> commit:
>>>>>>>
>>>>>>> https://svn.open-mpi.org/trac/ompi/changeset/28800/trunk/opal/mca/base/mca_base_var.c
>>>>>> Possibly also a coincidence. That commit only 1) moves the group
>>>>>> stuff into its own file, and 2) adds the mca_base_pvar interface.
>>>>>> It's possible I messed something up in the rest of the code, but
>>>>>> unlikely. I will take another look though.
>>>>>>
>>>>>> -Nathan
>>>>>>
>>>>>>> Josh
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Jeff
>>>>>>> Squyres (jsquyres)
>>>>>>> Sent: Monday, December 16, 2013 12:15 PM
>>>>>>> To: Open MPI Developers
>>>>>>> Subject: Re: [OMPI devel] bug in mca framework?
>>>>>>>
>>>>>>> It might be worthwhile to run this through valgrind and see if
>>>>>>> something is being freed incorrectly...?
>>>>>>>
>>>>>>>
>>>>>>> On Dec 16, 2013, at 12:11 PM, Nathan Hjelm <hjelmn_at_[hidden]> wrote:
>>>>>>>
>>>>>>>> I took a look at the stacktraces last week and could not identify
>>>>>>>> where the bug is. I will dig deeper this week and see if I can come
>>>>>>>> up with the correct fix.
>>>>>>>> -Nathan
>>>>>>>>
>>>>>>>> On Mon, Dec 09, 2013 at 03:17:36PM +0200, Mike Dubman wrote:
>>>>>>>>> Nathan,
>>>>>>>>> Could you please comment on Igor's observations?
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Wed, Dec 4, 2013 at 4:44 PM, Igor Ivanov
>>>>>>>>> <igor.ivanov_at_[hidden]> wrote:
>>>>>>>>>
>>>>>>>>> On 04.12.2013 17:56, Jeff Squyres (jsquyres) wrote:
>>>>>>>>>
>>>>>>>>> On Dec 4, 2013, at 2:52 AM, Igor Ivanov
>>>>>>>>> <Igor.Ivanov_at_[hidden]> wrote:
>>>>>>>>>
>>>>>>>>> It is the first MCA variable of string type from btl/openib,
>>>>>>>>> 'device_param_files'. Actually, you can disable it and get
>>>>>>>>> the failure on the second one.
>>>>>>>>>
>>>>>>>>> Description of the case we see:
>>>>>>>>> 1. openib mca variables are registered during startup, at the
>>>>>>>>> component selection phase;
>>>>>>>>> 2. but the winner is the cm component, and the openib mca
>>>>>>>>> variables are deregistered as part of their mca group;
>>>>>>>>> 3. the mca variables are not removed from the global mca
>>>>>>>>> array, but they are marked as invalid and the memory for the
>>>>>>>>> strings is freed;
>>>>>>>>> 4. shmem needs openib for yoda and does bml initialization;
>>>>>>>>> 5. openib mca variables are registered again using light
>>>>>>>>> mode, i.e. by finding themselves in the global array and
>>>>>>>>> refreshing their fields;
>>>>>>>>>
>>>>>>>>> Can you explain what you mean by step 5? I.e., what does
>>>>>>>>> "using light mode" mean? Is the openib component register
>>>>>>>>> function invoked again?
>>>>>>>>>
>>>>>>>>> That is correct, it is called twice. "Light mode" means that
>>>>>>>>> mca_base_var_register() does not allocate the mca variable
>>>>>>>>> object again; it looks the variable up in the global array
>>>>>>>>> and, on finding it, updates fields in the mca_base_var_t
>>>>>>>>> structure (at least mbv_storage).
>>>>>>>>>
>>>>>>>>> 6. for an unknown reason, bml finalization does not clean up
>>>>>>>>> these vars as is done in step 2;
>>>>>>>>> 7. mca_btl_openib.so is unloaded;
>>>>>>>>> 8. opal_finalize() destroys the mca variables from the global
>>>>>>>>> array, sees openib's variable, and tries to destroy it using
>>>>>>>>> an inaccessible address;
>>>>>>>>>
>>>>>>>>> So the code under discussion fixes step 6.
>>>>>>>>>
>>>>>>>>> Nathan: it sounds like an MCA var (and entire group) is
>>>>>>>>> registered, unregistered, and then registered again. Does the
>>>>>>>>> MCA var system get confused here when it tries to unregister
>>>>>>>>> the group a 2nd time?
>>>>>>>>>
>>>>>>>>> The issue probably relates to incorrect recognition of
>>>>>>>>> whether the variable is valid or invalid during the second
>>>>>>>>> call of mca_base_var_deregister().
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> devel mailing list
>>>>>>>>> devel_at_[hidden]
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>> --
>>>>>>> Jeff Squyres
>>>>>>> jsquyres_at_[hidden]
>>>>>>> For corporate legal information go to:
>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>
>>>>>>
>>>>> --
>>>>> Brian W. Barrett
>>>>> Scalable System Software Group
>>>>> Sandia National Laboratories
>>>>>
>>> --
>>> Brian W. Barrett
>>> Scalable System Software Group
>>> Sandia National Laboratories
>>>
>>
>
> --
> Brian W. Barrett
> Scalable System Software Group
> Sandia National Laboratories
>