Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [EXTERNAL] Re: bug in mca framework?
From: Barrett, Brian W (bwbarre_at_[hidden])
Date: 2014-01-17 00:18:57


I missed the call in the SPML. If that's already there (it would have
caused major problems previously, which is why I assumed it wasn't), then
you're good.

Brian

On 1/16/14 10:12 PM, "Igor Ivanov" <igor.ivanov_at_[hidden]> wrote:

>I assumed that BML add_procs() is called by the PML, and I see such a
>call in ompi_mpi_init() as ".. ret = MCA_PML_CALL(add_procs(procs,
>nprocs));...". Moreover, BML add_procs() is called by the SPML (OSHMEM's
>PML) in oshmem_shmem_init().
>So it looks like everything should be correct. Or am I still missing
>something?
>
>Igor
>
>On 16.01.2014 22:21, Barrett, Brian W wrote:
>> If a process is using the Portals 4 MTL and calls shmem_init, the BTLs
>> will be initialized properly, but as of right now, no one will call
>> add_procs() on the BML (which calls add_procs() on the BTLs). So the
>> first shmem communication will fail, because the proc lookup will fail
>> inside the BTL. If the MPI layer doesn't call add_procs(), someone else
>> has to. In this case, that someone else is the OpenSHMEM layer.
>>
>> Brian
>>
>> On 1/15/14 7:45 AM, "Igor Ivanov" <igor.ivanov_at_[hidden]> wrote:
>>
>>> Brian,
>>>
>>> Sorry for the slow reaction.
>>> I am not sure I understand your concern. Could you please clarify it
>>> and review the modified patch? (I found that the issue in my previous
>>> patch was that the BTL initialization OSHMEM needs was incomplete when
>>> a PML component other than bfo or ob1 is used.)
>>>
>>> Igor
>>>
>>> On 03.01.2014 00:04, Barrett, Brian W wrote:
>>>> Igor -
>>>>
>>>> Sorry for the slow reply; I was on vacation for the last week and a
>>>> half.
>>>>
>>>> The patch doesn't look quite right to me. If the cm PML is used, the
>>>> spml (or something else in the OSHMEM layer) is going to have to call
>>>> add_procs on the BML to initialize the procs arrays for the BTLs.
>>>>
>>>> Brian
>>>>
>>>> On 12/23/13 3:49 AM, "Igor Ivanov" <igor.ivanov_at_[hidden]> wrote:
>>>>
>>>>> Brian,
>>>>>
>>>>> Could you look at the patch based on your suggestion? It resolves the
>>>>> issue with the mca variable.
>>>>>
>>>>> Igor
>>>>>
>>>>> On 18.12.2013 01:48, Barrett, Brian W wrote:
>>>>>> The proposed solution at the bottom is wrong. There aren't two
>>>>>> different BMLs, there's one, and it lives in OMPI.
>>>>>>
>>>>>> The solution is to open the bml and btls in ompi_mpi_init and not in
>>>>>> the pmls. I checked, and the bml will deal with add_procs being called
>>>>>> multiple times on the same proc, so just moving the framework open /
>>>>>> init is sufficient. This will also solve the MTL problem.
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> On 12/17/13 8:33 AM, "Joshua Ladd" <joshual_at_[hidden]> wrote:
>>>>>>
>>>>>>> I believe Devendar Bureddy nailed the root cause. I am providing his
>>>>>>> excellent analysis below:
>>>>>>>
>>>>>>> From Devendar:
>>>>>>> Out of curiosity I looked at this issue. Here are my 2 cents.
>>>>>>> I think the issue is that the BTL components are opened and closed
>>>>>>> twice (ompi_init, yoda), which leads to incorrect usage of var
>>>>>>> groups.
>>>>>>> The following sequence of events creates the invalid memory:
>>>>>>>
>>>>>>> 1) All openib component parameters are registered in ompi_mpi_init:
>>>>>>> main > start_pes > shmem_init -> oshmem_shmem_init -> ompi_mpi_init
>>>>>>> -> mca_base_framework_open -> mca_pml_base_open .....
>>>>>>> mca_bml_base_open... -> btl_openib_component_register()
>>>>>>>
>>>>>>> * For every string variable a memory block is allocated
>>>>>>> (var->mbv_storage = PTR).
>>>>>>>
>>>>>>> At this time a new var group id: 114 (with parent group id: 112) is
>>>>>>> created for all openib component variables.
>>>>>>>
>>>>>>> 2) This var group is deregistered in ompi_mpi_init. That marks all
>>>>>>> of its variables as invalid, but the group and vars still exist:
>>>>>>> main > start_pes > shmem_init -> oshmem_shmem_init ->
>>>>>>> mca_pml_base_select -> mca_base_components_close -> ... ->
>>>>>>> mca_bml_base_close -> mca_base_framework_close ->
>>>>>>> mca_base_var_group_deregister(groupid: 114)
>>>>>>>
>>>>>>> * The memory of all string variables is deallocated
>>>>>>> (var->mbv_storage is set to NULL).
>>>>>>>
>>>>>>> 3) Because of step 2), the btl_openib.so shared lib is dlclosed.
>>>>>>>
>>>>>>> 4) Now we reopen openib in yoda and register the openib variables
>>>>>>> again:
>>>>>>> main > start_pes > shmem_init > oshmem_shmem_init -> _shmem_init ->
>>>>>>> mca_base_framework_open -> mca_spml_base_open >
>>>>>>> mca_spml_yoda_component_open -> ..... mca_bml_base_open... ->
>>>>>>> btl_openib_component_register -> register_variables()
>>>>>>>
>>>>>>> * In register_variables(), var_find() finds each variable (from the
>>>>>>> same old group: 114) and resets it.
>>>>>>> * For string variables, the buffers are allocated again
>>>>>>> (var->mbv_storage = PTR).
>>>>>>> * Note that group 114 does not belong to the yoda component.
>>>>>>>
>>>>>>> 5) On yoda component close, the group above (114) is never found,
>>>>>>> because it does not belong to this component. So
>>>>>>> mca_base_var_group_deregister() is not called again on the var group,
>>>>>>> and the string var memory is not deallocated:
>>>>>>> main > start_pes > shmem_init > oshmem_shmem_init -> _shmem_init ->
>>>>>>> mca_spml_base_select ->..> mca_spml_yoda_component_close ->
>>>>>>> mca_bml_base_close -> mca_base_var_group_find().
>>>>>>>
>>>>>>> 6) Because of step 5), btl_openib.so is dlclosed. This invalidates
>>>>>>> all openib string var memory (var->mbv_storage = PTR) allocated in
>>>>>>> step 4).
>>>>>>>
>>>>>>> 7) ompi_mpi_finalize() loops through all vars, finalizes them, and
>>>>>>> deallocates the string var memory (var->mbv_storage = PTR):
>>>>>>> ompi_mpi_finalize >...> mca_base_var_finalize
>>>>>>>
>>>>>>> * var->mbv_storage = PTR is invalid at this stage, which causes the
>>>>>>> SEGFAULT.
>>>>>>>
>>>>>>>
>>>>>>> This also explains why Dinar's patch, kostul_fix.patch
>>>>>>> (http://bgate.mellanox.com/redmine/attachments/1643/kostul_fix.patch),
>>>>>>> resolves the issue. His patch prevents you from finding the invalid
>>>>>>> already opened params.
>>>>>>> So, I see that in a lot of these registration functions the signature
>>>>>>> has an entry for the project name, but for now NULL is always passed.
>>>>>>> I see a note by Nathan in
>>>>>>>
>>>>>>> ../opal/mca/base/mca_base_var.c +1311
>>>>>>> {
>>>>>>> /* XXX -- component_update -- We will stash the project name in the
>>>>>>> component */
>>>>>>> return mca_base_var_register (NULL, component->mca_type_name,
>>>>>>>
>>>>>>>
>>>>>>> It seems that knowing the project name, oshmem, would allow us to
>>>>>>> distinguish between the different BMLs.
>>>>>>>
>>>>>>> Nathan, please advise.
>>>>>>>
>>>>>>> Josh
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Nathan Hjelm
>>>>>>> Sent: Monday, December 16, 2013 12:44 PM
>>>>>>> To: Open MPI Developers
>>>>>>> Subject: Re: [OMPI devel] bug in mca framework?
>>>>>>>
>>>>>>> On Mon, Dec 16, 2013 at 05:21:05PM +0000, Joshua Ladd wrote:
>>>>>>>> After speaking with Igor Ivanov about this this morning, he
>>>>>>>> summarized his findings as follows:
>>>>>>>>
>>>>>>>> 1. Valgrind comes up clean.
>>>>>>> That's good to hear but unfortunate, since this really seems like a
>>>>>>> stomping-on-memory problem.
>>>>>>>
>>>>>>>> 2. The issue is not reproduced with a static build.
>>>>>>> This is a red herring. The variable itself contains garbage. The
>>>>>>> mbv_storage pointer looked like it was on the stack, the name was not
>>>>>>> valid, etc. Not sure how we got an mca_base_var_t into that state,
>>>>>>> since the only time we touch anything in them is in
>>>>>>> mca_base_var_finalize. That function cleans up all of the state, so
>>>>>>> two calls to it should be harmless.
>>>>>>>
>>>>>>>> 3. A bisection study reveals that problems first appear after
>>>>>>>> commit:
>>>>>>>>
>>>>>>>> https://svn.open-mpi.org/trac/ompi/changeset/28800/trunk/opal/mca/base/mca_base_var.c
>>>>>>> Possibly also a coincidence. That commit only 1) moves the group
>>>>>>> stuff into its own file, and 2) adds the mca_base_pvar interface.
>>>>>>> It's possible I messed something up in the rest of the code, but
>>>>>>> unlikely. I will take another look though.
>>>>>>>
>>>>>>> -Nathan
>>>>>>>
>>>>>>>> Josh
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Jeff Squyres (jsquyres)
>>>>>>>> Sent: Monday, December 16, 2013 12:15 PM
>>>>>>>> To: Open MPI Developers
>>>>>>>> Subject: Re: [OMPI devel] bug in mca framework?
>>>>>>>>
>>>>>>>> It might be worthwhile to run this through valgrind and see if
>>>>>>>> something is being freed incorrectly...?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Dec 16, 2013, at 12:11 PM, Nathan Hjelm <hjelmn_at_[hidden]> wrote:
>>>>>>>>
>>>>>>>>> I took a look at the stacktraces last week and could not identify
>>>>>>>>> where the bug is. I will dig deeper this week and see if I can come
>>>>>>>>> up with the correct fix.
>>>>>>>>> -Nathan
>>>>>>>>>
>>>>>>>>> On Mon, Dec 09, 2013 at 03:17:36PM +0200, Mike Dubman wrote:
>>>>>>>>>> Nathan,
>>>>>>>>>> Could you please comment on Igor's observations?
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> On Wed, Dec 4, 2013 at 4:44 PM, Igor Ivanov <igor.ivanov_at_[hidden]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On 04.12.2013 17:56, Jeff Squyres (jsquyres) wrote:
>>>>>>>>>>
>>>>>>>>>> On Dec 4, 2013, at 2:52 AM, Igor Ivanov <Igor.Ivanov_at_[hidden]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> It is the first mca variable of type string from btl/openib,
>>>>>>>>>> 'device_param_files'. Actually, you can disable it and get the
>>>>>>>>>> failure on the second one.
>>>>>>>>>>
>>>>>>>>>> Description of the case we see:
>>>>>>>>>> 1. openib mca variables are registered during startup, at the
>>>>>>>>>> component selection phase;
>>>>>>>>>> 2. but the winner is the cm component, and the openib mca variables
>>>>>>>>>> are deregistered as part of their mca group;
>>>>>>>>>> 3. the mca variables are not removed from the global mca array, but
>>>>>>>>>> they are marked as invalid and the memory for strings is freed;
>>>>>>>>>> 4. shmem needs openib for yoda and does bml initialization;
>>>>>>>>>> 5. openib mca variables are registered again using light mode, i.e.
>>>>>>>>>> they are looked up in the global array and their fields are
>>>>>>>>>> refreshed;
>>>>>>>>>>
>>>>>>>>>> Can you explain what you mean by step 5? I.e., what does "using
>>>>>>>>>> light mode" mean? Is the openib component register function
>>>>>>>>>> invoked again?
>>>>>>>>>>
>>>>>>>>>> That is correct, it is called twice. "Light mode" means that
>>>>>>>>>> mca_base_var_register() does not allocate the mca variable object
>>>>>>>>>> again; it looks the variable up in the global array and, on finding
>>>>>>>>>> it, updates fields in the mca_base_var_t structure (at least
>>>>>>>>>> mbv_storage).
>>>>>>>>>>
>>>>>>>>>> 6. for an unknown reason bml finalization does not clean up these
>>>>>>>>>> vars as is done in step 2;
>>>>>>>>>> 7. mca_btl_openib.so is unloaded;
>>>>>>>>>> 8. opal_finalize() destroys the mca variables from the global array,
>>>>>>>>>> sees the openib variable, and tries to destroy it using an
>>>>>>>>>> inaccessible address;
>>>>>>>>>>
>>>>>>>>>> So the code under discussion fixes step 6.
>>>>>>>>>>
>>>>>>>>>> Nathan: it sounds like an MCA var (and entire group) is registered,
>>>>>>>>>> unregistered, and then registered again. Does the MCA var system
>>>>>>>>>> get confused here when it tries to unregister the group a 2nd time?
>>>>>>>>>>
>>>>>>>>>> The issue probably relates to incorrect recognition of whether the
>>>>>>>>>> variable is valid or invalid during the second call of
>>>>>>>>>> mca_base_var_deregister().
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> devel mailing list
>>>>>>>>>> devel_at_[hidden]
>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>> --
>>>>>>>> Jeff Squyres
>>>>>>>> jsquyres_at_[hidden]
>>>>>>>> For corporate legal information go to:
>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Brian W. Barrett
>>>>>> Scalable System Software Group
>>>>>> Sandia National Laboratories
>>>>>>
>>>>
>>>
>>
>>
>
>

--
  Brian W. Barrett
  Scalable System Software Group
  Sandia National Laboratories