
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] [EXTERNAL] Re: bug in mca framework?
From: Barrett, Brian W (bwbarre_at_[hidden])
Date: 2014-01-16 13:21:06


If a process is using the Portals 4 MTL and calls shmem_init, the BTLs
will be initialized properly, but as of right now, no one will call
add_procs() on the BML (which calls add_procs() on the BTLs). So the
first shmem communication will fail, because the proc lookup will fail
inside the BTL. If the MPI layer doesn't call add_procs(), someone else
has to. In this case, that someone else is the OpenSHMEM layer.

Brian

On 1/15/14 7:45 AM, "Igor Ivanov" <igor.ivanov_at_[hidden]> wrote:

>Brian,
>
>Sorry for the slow reaction.
>I am not sure I understand your concern. Could you please clarify it
>and review the modified patch? (I traced the issue in my previous
>patch to incomplete BTL initialization, which OSHMEM needs, when a PML
>component other than bfo or ob1 is used.)
>
>Igor
>
>On 03.01.2014 00:04, Barrett, Brian W wrote:
>> Igor -
>>
>> Sorry for the slow reply; I was on vacation for the last week and a
>> half.
>>
>> The patch doesn't look quite right to me. If the cm PML is used, the
>> spml (or something else in the OSHMEM layer) is going to have to call
>> add_procs on the BML to initialize the procs arrays for the BTLs.
>>
>> Brian
>>
>> On 12/23/13 3:49 AM, "Igor Ivanov" <igor.ivanov_at_[hidden]> wrote:
>>
>>> Brian,
>>>
>>> Could you look at the patch based on your suggestion? It resolves the
>>> issue with the MCA variable.
>>>
>>> Igor
>>>
>>> On 18.12.2013 01:48, Barrett, Brian W wrote:
>>>> The proposed solution at the bottom is wrong. There aren't two
>>>> different BMLs, there's one, and it lives in OMPI.
>>>>
>>>> The solution is to open the bml and btls in ompi_mpi_init and not in
>>>> the pmls. I checked, and the bml will deal with add_procs being called
>>>> multiple times on the same proc, so just moving the framework open /
>>>> init is sufficient. This will also solve the MTL problem.
>>>>
>>>> Brian
>>>>
>>>> On 12/17/13 8:33 AM, "Joshua Ladd" <joshual_at_[hidden]> wrote:
>>>>
>>>>> I believe Devendar Bureddy nailed the root cause. I am providing his
>>>>> excellent analysis below:
>>>>>
>>>>> From Devendar:
>>>>> Out of curiosity I looked at this issue. Here's my 2 cents.
>>>>> I think the issue is that the BTL components are opened and closed
>>>>> twice (ompi_init, yoda), which leads to incorrect usage of var
>>>>> groups. The following sequence of events creates invalid memory:
>>>>>
>>>>> 1) All openib component parameters are registered in ompi_mpi_init:
>>>>> main -> start_pes -> shmem_init -> oshmem_shmem_init ->
>>>>> ompi_mpi_init -> mca_base_framework_open -> mca_pml_base_open ...
>>>>> mca_bml_base_open ... -> btl_openib_component_register()
>>>>>
>>>>> * For all string variables it allocates a memory block
>>>>> (var->mbv_storage = PTR).
>>>>>
>>>>> At this time a new var group, id 114 (parent group id 112), is
>>>>> created for all openib component variables.
>>>>>
>>>>> 2) This var group is deregistered in ompi_mpi_init. That marks all
>>>>> variables as invalid, but the group and vars still exist:
>>>>> main -> start_pes -> shmem_init -> oshmem_shmem_init ->
>>>>> mca_pml_base_select -> mca_base_components_close -> ... ->
>>>>> mca_bml_base_close -> mca_base_framework_close ->
>>>>> mca_base_var_group_deregister(groupid: 114)
>>>>>
>>>>> * All string variable memory is deallocated (var->mbv_storage is
>>>>> set to NULL).
>>>>>
>>>>> 3) Because of step 2), the btl_openib.so shared library is dlclosed.
>>>>>
>>>>> 4) Now we reopen openib in yoda and register the openib variables
>>>>> again:
>>>>> main -> start_pes -> shmem_init -> oshmem_shmem_init -> _shmem_init ->
>>>>> mca_base_framework_open -> mca_spml_base_open ->
>>>>> mca_spml_yoda_component_open -> ... mca_bml_base_open ... ->
>>>>> btl_openib_component_register -> register_variables()
>>>>>
>>>>> * In register_variables(), var_find() finds each variable (from the
>>>>> same old group, 114) and resets it.
>>>>> * For string variables, it allocates the buffers again
>>>>> (var->mbv_storage = PTR).
>>>>> * Note that group 114 does not belong to the yoda component.
>>>>>
>>>>> 5) On yoda component close, it never finds the above group (114)
>>>>> because that group does not belong to this component. So
>>>>> mca_base_var_group_deregister() is not called again on the var
>>>>> group, and the string var memory is not deallocated:
>>>>> main -> start_pes -> shmem_init -> oshmem_shmem_init -> _shmem_init ->
>>>>> mca_spml_base_select -> ... -> mca_spml_yoda_component_close ->
>>>>> mca_bml_base_close -> mca_base_var_group_find()
>>>>>
>>>>> 6) Because of step 5), btl_openib.so is dlclosed. This step
>>>>> invalidates all openib string var memory (var->mbv_storage = PTR)
>>>>> allocated in step 4).
>>>>>
>>>>> 7) ompi_mpi_finalize() loops through all vars, finalizes them, and
>>>>> deallocates the string var memory (var->mbv_storage = PTR):
>>>>> ompi_mpi_finalize -> ... -> mca_base_var_finalize
>>>>>
>>>>> * var->mbv_storage = PTR is invalid at this stage, which causes the
>>>>> SEGFAULT.
>>>>>
>>>>>
>>>>> This also explains why Dinar's patch, kostul_fix.patch
>>>>> (http://bgate.mellanox.com/redmine/attachments/1643/kostul_fix.patch),
>>>>> resolves the issue. His patch prevents you from finding the invalid,
>>>>> already opened params.
>>>>> So, I see that in a lot of these registration functions the signature
>>>>> has an entry for the project name, but for now NULL is always passed.
>>>>> I see a note by Nathan in
>>>>>
>>>>> ../opal/mca/base/mca_base_var.c +1311
>>>>> {
>>>>> /* XXX -- component_update -- We will stash the project name in the
>>>>> component */
>>>>> return mca_base_var_register (NULL, component->mca_type_name,
>>>>>
>>>>>
>>>>> Seems knowing the project name, oshmem, would allow us to distinguish
>>>>> between the different BMLs.
>>>>>
>>>>> Nathan, please advise.
>>>>>
>>>>> Josh
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Nathan
>>>>> Hjelm
>>>>> Sent: Monday, December 16, 2013 12:44 PM
>>>>> To: Open MPI Developers
>>>>> Subject: Re: [OMPI devel] bug in mca framework?
>>>>>
>>>>> On Mon, Dec 16, 2013 at 05:21:05PM +0000, Joshua Ladd wrote:
>>>>>> After speaking with Igor Ivanov about this this morning, he
>>>>>> summarized his findings as follows:
>>>>>>
>>>>>> 1. Valgrind comes up clean.
>>>>> That's good to hear but unfortunate, since this really seems like a
>>>>> stomping-on-memory problem.
>>>>>
>>>>>> 2. The issue is not reproduced with a static build.
>>>>> This is a red herring. The variable itself contains garbage. The
>>>>> mbv_storage pointer looked like it was on the stack, the name was not
>>>>> valid, etc. Not sure how we got an mca_base_var_t into that state,
>>>>> since the only time we touch anything in them is in
>>>>> mca_base_var_finalize. That function cleans up all of the state, so
>>>>> two calls to it should be harmless.
>>>>>
>>>>>> 3. A bisection study reveals that problems first appear after
>>>>>> commit:
>>>>>> https://svn.open-mpi.org/trac/ompi/changeset/28800/trunk/opal/mca/base/mca_base_var.c
>>>>> Possibly also a coincidence. That commit only 1) moves the group
>>>>> stuff into its own file, and 2) adds the mca_base_pvar interface.
>>>>> It's possible I messed something up in the rest of the code, but
>>>>> unlikely. I will take another look though.
>>>>>
>>>>> -Nathan
>>>>>
>>>>>> Josh
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Jeff
>>>>>> Squyres (jsquyres)
>>>>>> Sent: Monday, December 16, 2013 12:15 PM
>>>>>> To: Open MPI Developers
>>>>>> Subject: Re: [OMPI devel] bug in mca framework?
>>>>>>
>>>>>> It might be worthwhile to run this through valgrind and see if
>>>>>> something is being freed incorrectly...?
>>>>>>
>>>>>>
>>>>>> On Dec 16, 2013, at 12:11 PM, Nathan Hjelm <hjelmn_at_[hidden]> wrote:
>>>>>>
>>>>>>> I took a look at the stacktraces last week and could not identify
>>>>>>> where the bug is. I will dig deeper this week and see if I can come
>>>>>>> up with the correct fix.
>>>>>>>
>>>>>>> -Nathan
>>>>>>>
>>>>>>> On Mon, Dec 09, 2013 at 03:17:36PM +0200, Mike Dubman wrote:
>>>>>>>> Nathan,
>>>>>>>> Could you please comment on Igor's observations?
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Wed, Dec 4, 2013 at 4:44 PM, Igor Ivanov
>>>>>>>> <igor.ivanov_at_[hidden]> wrote:
>>>>>>>>
>>>>>>>> On 04.12.2013 17:56, Jeff Squyres (jsquyres) wrote:
>>>>>>>>
>>>>>>>> On Dec 4, 2013, at 2:52 AM, Igor Ivanov
>>>>>>>> <Igor.Ivanov_at_[hidden]> wrote:
>>>>>>>>
>>>>>>>> It is the first MCA variable of type string from btl/openib,
>>>>>>>> 'device_param_files'. Actually, you can disable it and get the
>>>>>>>> failure on the second one.
>>>>>>>>
>>>>>>>> Description of the case we see:
>>>>>>>> 1. openib MCA variables are registered during startup, in the
>>>>>>>> component selection phase;
>>>>>>>> 2. but the winner is the cm component, and the openib MCA
>>>>>>>> variables are deregistered as part of the MCA group;
>>>>>>>> 3. the MCA variables are not removed from the global MCA array,
>>>>>>>> but they are marked as invalid and the memory for strings is
>>>>>>>> freed;
>>>>>>>> 4. shmem needs openib for yoda and does the bml initialization;
>>>>>>>> 5. the openib MCA variables are registered again using "light
>>>>>>>> mode": registration finds them in the global array and refreshes
>>>>>>>> their fields;
>>>>>>>>
>>>>>>>> Can you explain what you mean by step 5? I.e., what does "using
>>>>>>>> light mode" mean? Is the openib component register function
>>>>>>>> invoked again?
>>>>>>>>
>>>>>>>> That is correct, it is called twice. "Light mode" means that
>>>>>>>> mca_base_var_register() does not allocate the MCA variable object
>>>>>>>> again; it looks the variable up in the global array and, on
>>>>>>>> finding it, updates fields in the mca_base_var_t structure (at
>>>>>>>> least mbv_storage).
>>>>>>>>
>>>>>>>> 6. for an unknown reason, bml finalization does not clean up
>>>>>>>> these vars as was done in step 2;
>>>>>>>> 7. mca_btl_openib.so is unloaded;
>>>>>>>> 8. opal_finalize() destroys the MCA variables from the global
>>>>>>>> array, sees the openib variable, and tries to destroy it using an
>>>>>>>> inaccessible address;
>>>>>>>>
>>>>>>>> So the code under discussion fixes step 6.
>>>>>>>>
>>>>>>>> Nathan: it sounds like an MCA var (and entire group) is
>>>>>>>> registered, unregistered, and then registered again. Does the MCA
>>>>>>>> var system get confused here when it tries to unregister the
>>>>>>>> group a 2nd time?
>>>>>>>>
>>>>>>>> Probably the issue relates to incorrect recognition of whether a
>>>>>>>> variable is valid or invalid during the second call of
>>>>>>>> mca_base_var_deregister().
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> devel mailing list
>>>>>>>> devel_at_[hidden]
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> --
>>>>>> Jeff Squyres
>>>>>> jsquyres_at_[hidden]
>>>>>> For corporate legal information go to:
>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>
>>>>>
>>>> --
>>>> Brian W. Barrett
>>>> Scalable System Software Group
>>>> Sandia National Laboratories
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>> --
>> Brian W. Barrett
>> Scalable System Software Group
>> Sandia National Laboratories
>>
>>
>>
>>
>>
>
>

--
  Brian W. Barrett
  Scalable System Software Group
  Sandia National Laboratories