Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [RFC] mca_base_select()
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2008-05-09 11:35:11


OK, I think I understand the problem a bit better now. I attached a
patch that should fix this, but I want you to check it out before I
commit, just to make sure.

If you specify '-mca filter xml' on the command line then only the
'xml' component should be opened by mca_base_open. The problem was
that the selection logic used -1 as the lowest acceptable priority,
which conflicted with the priority that the 'xml' component sets for
itself. This patch sets the lowest acceptable value to INT32_MIN, which
should be well below any negative priority that a component would set
for itself.
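
To make the effect of the floor concrete, here is a minimal sketch of a
"highest priority wins" loop. It is not the code from the patch: the
struct, the select_best() helper and the strict ">" comparison are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an opened component and the priority it reports. */
struct component {
    const char *name;
    int32_t priority;
};

/* Return the index of the highest-priority component, or -1 if no component
 * reports a priority strictly above 'floor' (the initial "best so far"). */
static int select_best(const struct component *c, int n, int32_t floor)
{
    assert(c != NULL && n >= 0);
    int best = -1;
    int32_t best_priority = floor;
    for (int i = 0; i < n; i++) {
        if (c[i].priority > best_priority) {
            best_priority = c[i].priority;
            best = i;
        }
    }
    return best;
}
```

With a single opened component at priority -1, a floor of -1 selects
nothing, while a floor of INT32_MIN selects that component.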

Let me know if this works for you and I'll commit it.

Cheers,
Josh


On May 9, 2008, at 11:14 AM, Ralph Castain wrote:

> Sure - take a look at the hg repository Jeff and I are working on:
>
> http://www.open-mpi.org/hg/hgwebdir.cgi/rhc/channel
>
> The opal/mca/filter framework illustrates the problem. I have one
> component in there right now, with a default module defined in the
> base. That component must only be selected if the user asks for it.
> With the current select logic, I can't do this: a priority >= 0 is
> always selected automatically, and a priority < 0 is never
> selectable, even if specified.
>
> Thanks
> Ralph
>
>
>
> On 5/9/08 8:52 AM, "Josh Hursey" <jjhursey_at_[hidden]> wrote:
>
>> Ralph,
>>
>> Can you give me an example of a component that I can look at? It will
>> allow me to test the fix before committing, and to better understand
>> the problem.
>>
>> -- Josh
>>
>> On May 9, 2008, at 10:41 AM, Ralph Castain wrote:
>>
>>> I just hit a problem with this logic - should be a minor change.
>>>
>>> We have several frameworks where we have components that are only
>>> allowed to be selected if the user specifically requests them by
>>> stating -mca foo bar. Because it is possible for there to be no
>>> other components that want to be selected, and because it is
>>> permissible for no components to be selected for that framework, we
>>> set bar's priority to be -1.
>>>
>>> The new select logic will not allow a negative priority to be
>>> selected, even if the user specifically requested that component.
>>>
>>> If we set the priority to be 0, then the system will allow the
>>> component to be automatically selected. This is not allowed as it can
>>> lead to bad behavior.
>>>
>>> So what we need the select system to do is say "if someone specified
>>> a specific component, don't worry about the returned priority - just
>>> use it"
>>>
>>> Josh: could you please modify this?
>>>
>>> Thanks!
>>> Ralph
>>>
>>>
>>>
>>> On 5/8/08 7:04 PM, "Pak Lui" <Pak.Lui_at_[hidden]> wrote:
>>>
>>>> Thanks very much Josh! Will try it out soon.
>>>>
>>>> Josh Hursey wrote:
>>>>> Sorry about that. I didn't test that type of option. It should be
>>>>> working in r18418. Let me know if you see any more issues.
>>>>>
>>>>> -- Josh
>>>>>
>>>>> On May 8, 2008, at 6:04 PM, Pak Lui wrote:
>>>>>
>>>>>> I think I have a problem but I am not sure. I used to be able to
>>>>>> use the circumflex (^) to switch between the gridengine launcher
>>>>>> and the ssh launchers by doing something like this, e.g. -mca plm
>>>>>> ^gridengine, to exclude some of the plm components (and also in
>>>>>> ras). It doesn't seem like the 'negate' is in mca_base_component
>>>>>> anymore. I guess I just have to spell out which component I want
>>>>>> explicitly...
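
For readers unfamiliar with the '^' syntax mentioned above, the sketch
below shows one way an include/exclude value could be interpreted. The
helper names in_list() and component_allowed() are hypothetical and this
is not the parser in mca_base; it only assumes that a leading '^'
negates the whole comma-separated list. The component names used with it
are taken from the message above for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if 'name' is a whole entry in the comma-separated 'list'. */
static bool in_list(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;
    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t n = end ? (size_t)(end - p) : strlen(p);
        if (n == len && strncmp(p, name, n) == 0) {
            return true;
        }
        p += n;
        if (*p == ',') {
            p++;
        }
    }
    return false;
}

/* Decide whether component 'name' may be opened given a framework value such
 * as "ssh,gridengine" (include list) or "^gridengine" (exclude list).
 * A NULL or empty value places no restriction. */
static bool component_allowed(const char *value, const char *name)
{
    assert(name != NULL);
    if (value == NULL || value[0] == '\0') {
        return true;
    }
    if (value[0] == '^') {
        return !in_list(value + 1, name);  /* negated: all except the list */
    }
    return in_list(value, name);           /* plain: only the listed names */
}
```

Under these assumptions, "^gridengine" lets every component except
gridengine through, while "ssh" lets only ssh through.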
>>>>>>
>>>>>> Josh Hursey wrote:
>>>>>>> This has been committed in r18381
>>>>>>>
>>>>>>> Please let me know if you have any problems with this commit.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Josh
>>>>>>>
>>>>>>> On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
>>>>>>>
>>>>>>>> Awesome.
>>>>>>>>
>>>>>>>> The branch is updated to the latest trunk head. I encourage folks
>>>>>>>> to check out this repository and make sure that it builds on
>>>>>>>> their system. A normal build of the branch should be enough to
>>>>>>>> find out if there are any cut-n-paste problems (though I tried to
>>>>>>>> be careful, mistakes do happen).
>>>>>>>>
>>>>>>>> I haven't heard of any problems, so this is looking like it will
>>>>>>>> come in tomorrow after the teleconf. I'll ask again there to see
>>>>>>>> if there are any voices of concern.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Josh
>>>>>>>>
>>>>>>>> On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
>>>>>>>>
>>>>>>>>> This all sounds good to me!
>>>>>>>>>
>>>>>>>>> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
>>>>>>>>>
>>>>>>>>>> What: Add mca_base_select() and adjust frameworks &
>>>>>>>>>> components to use it.
>>>>>>>>>> Why: Consolidation of code for general goodness.
>>>>>>>>>> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
>>>>>>>>>> When: Code ready now. Documentation ready soon.
>>>>>>>>>> Timeout: May 6, 2008 (After teleconf) [1 week]
>>>>>>>>>>
>>>>>>>>>> Discussion:
>>>>>>>>>> -----------
>>>>>>>>>> For a number of years a few developers have been talking about
>>>>>>>>>> creating an MCA base component selection function. For various
>>>>>>>>>> reasons this was never implemented. Recently I decided to give
>>>>>>>>>> it a try.
>>>>>>>>>>
>>>>>>>>>> A base select function will allow Open MPI to provide completely
>>>>>>>>>> consistent selection behavior for many of its frameworks (18 of
>>>>>>>>>> 31, to be exact, at the moment). The primary goal of this work
>>>>>>>>>> is to improve code maintainability through code reuse. Other
>>>>>>>>>> benefits also result, such as a slightly smaller memory
>>>>>>>>>> footprint.
>>>>>>>>>>
>>>>>>>>>> The mca_base_select() function represents the most commonly used
>>>>>>>>>> logic for component selection: select the one component with the
>>>>>>>>>> highest priority and close all of the unselected components.
>>>>>>>>>> This function can be found at the path below in the branch:
>>>>>>>>>> opal/mca/base/mca_base_components_select.c
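
The "select the highest priority, close the rest" logic described above
can be sketched roughly as follows. This is not the contents of
mca_base_components_select.c: the component record, the simplified
query signature, select_one() and the three demo components are all
hypothetical and exist only so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record; not the real mca_base_component_t. */
struct component {
    const char *name;
    int (*query)(int *priority);  /* returns 0 if the component can run */
    void (*close)(void);          /* releases the component's resources */
};

/* Demo callbacks (illustrative only). */
static int closed_count = 0;
static void note_close(void) { closed_count++; }
static int query_low(int *p) { *p = 10; return 0; }
static int query_high(int *p) { *p = 50; return 0; }
static int query_decline(int *p) { (void)p; return -1; }

static const struct component demo[] = {
    { "low", query_low, note_close },
    { "high", query_high, note_close },
    { "declines", query_decline, note_close },
};

/* Query every component, keep the one with the highest priority, and close
 * all the others. Returns the winner, or NULL if none is available. */
static const struct component *select_one(const struct component *c, size_t n)
{
    assert(c != NULL || n == 0);
    const struct component *best = NULL;
    int32_t best_priority = INT32_MIN;

    for (size_t i = 0; i < n; i++) {
        int priority = 0;
        if (c[i].query(&priority) != 0) {
            continue;  /* component declined to run */
        }
        if (priority > best_priority) {
            best_priority = priority;
            best = &c[i];
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (&c[i] != best && c[i].close != NULL) {
            c[i].close();  /* close everything that was not selected */
        }
    }
    return best;
}
```

Calling select_one(demo, 3) returns the "high" entry and closes the
other two, including the one that declined.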
>>>>>>>>>>
>>>>>>>>>> To support this I had to formalize a query() function in the
>>>>>>>>>> mca_base_component_t of the form:
>>>>>>>>>> int mca_base_query_component_fn(mca_base_module_t **module, int *priority);
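
As an illustration of a component-side function with that shape, here is
a small sketch. The mca_base_module_t definition is a stand-in so the
example compiles alone, and query_fn_t, example_module,
example_component_query() and the priority value 20 are hypothetical
names and values, not taken from the branch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for Open MPI's module type so the sketch is self-contained. */
typedef struct mca_base_module_t {
    int placeholder;
} mca_base_module_t;

/* Hypothetical typedef for the query form quoted in the RFC. */
typedef int (*query_fn_t)(mca_base_module_t **module, int *priority);

static mca_base_module_t example_module = { 0 };

/* Hypothetical component query: hand back a module and a priority. */
static int example_component_query(mca_base_module_t **module, int *priority)
{
    assert(module != NULL && priority != NULL);
    *priority = 20;             /* arbitrary illustrative value */
    *module = &example_module;  /* module to use if this component wins */
    return 0;                   /* 0 stands in for a success code */
}
```

A selection routine would call such a function through a pointer of type
query_fn_t and compare the returned priorities.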
>>>>>>>>>>
>>>>>>>>>> This function is specified after the open and close component
>>>>>>>>>> functions in this structure so as to allow compatibility with
>>>>>>>>>> frameworks that do not use the base selection logic. Frameworks
>>>>>>>>>> that do *not* use this function are *not* affected by this
>>>>>>>>>> commit. However, every component in the frameworks that use the
>>>>>>>>>> mca_base_select function must adjust its component query
>>>>>>>>>> function to match the form specified above.
>>>>>>>>>>
>>>>>>>>>> 18 frameworks in Open MPI have been changed. I have updated all
>>>>>>>>>> of the components in the 18 frameworks available in the trunk on
>>>>>>>>>> my branch. The affected frameworks are:
>>>>>>>>>> - OPAL Carto
>>>>>>>>>> - OPAL crs
>>>>>>>>>> - OPAL maffinity
>>>>>>>>>> - OPAL memchecker
>>>>>>>>>> - OPAL paffinity
>>>>>>>>>> - ORTE errmgr
>>>>>>>>>> - ORTE ess
>>>>>>>>>> - ORTE Filem
>>>>>>>>>> - ORTE grpcomm
>>>>>>>>>> - ORTE odls
>>>>>>>>>> - ORTE plm
>>>>>>>>>> - ORTE ras
>>>>>>>>>> - ORTE rmaps
>>>>>>>>>> - ORTE routed
>>>>>>>>>> - ORTE snapc
>>>>>>>>>> - OMPI crcp
>>>>>>>>>> - OMPI dpm
>>>>>>>>>> - OMPI pubsub
>>>>>>>>>>
>>>>>>>>>> There was a question about the memory footprint change as a
>>>>>>>>>> result of this commit. I used 'pmap' to determine the process
>>>>>>>>>> memory footprint of a hello world MPI program. Static and shared
>>>>>>>>>> build numbers are below, along with variations on launching
>>>>>>>>>> locally and to a single node allocated by SLURM. All of this was
>>>>>>>>>> on Indiana University's Odin machine. We compare against the
>>>>>>>>>> trunk (r18276), representing the last SVN sync point of the
>>>>>>>>>> branch.
>>>>>>>>>>
>>>>>>>>>> Process (shared) | Trunk   | Branch  | Diff (Improvement)
>>>>>>>>>> -----------------+---------+---------+-------------------
>>>>>>>>>> mpirun (orted)   |  39976K |  36828K | 3148K
>>>>>>>>>> hello (0)        | 229288K | 229268K |   20K
>>>>>>>>>> hello (1)        | 229288K | 229268K |   20K
>>>>>>>>>> -----------------+---------+---------+-------------------
>>>>>>>>>> mpirun           |  40032K |  37924K | 2108K
>>>>>>>>>> orted            |  34720K |  34660K |   60K
>>>>>>>>>> hello (0)        | 228404K | 228384K |   20K
>>>>>>>>>> hello (1)        | 228404K | 228384K |   20K
>>>>>>>>>>
>>>>>>>>>> Process (static) | Trunk   | Branch  | Diff (Improvement)
>>>>>>>>>> -----------------+---------+---------+-------------------
>>>>>>>>>> mpirun (orted)   |  21384K |  21372K |   12K
>>>>>>>>>> hello (0)        | 194000K | 193980K |   20K
>>>>>>>>>> hello (1)        | 194000K | 193980K |   20K
>>>>>>>>>> -----------------+---------+---------+-------------------
>>>>>>>>>> mpirun           |  21384K |  21372K |   12K
>>>>>>>>>> orted            |  21208K |  21196K |   12K
>>>>>>>>>> hello (0)        | 193116K | 193096K |   20K
>>>>>>>>>> hello (1)        | 193116K | 193096K |   20K
>>>>>>>>>>
>>>>>>>>>> As you can see, there are some small memory footprint
>>>>>>>>>> improvements on my branch that result from this work. The size
>>>>>>>>>> of the Open MPI project shrinks a bit as well. This commit cuts
>>>>>>>>>> between 2,000 and 3,500 lines of code (depending on how you
>>>>>>>>>> count), so roughly a 1% code shrink.
>>>>>>>>>>
>>>>>>>>>> The branch is stable in all of the testing I have done, but there
>>>>>>>>>> are some platforms on which I cannot test. So please give this
>>>>>>>>>> branch a try and let me know if you find any problems.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Josh
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> devel mailing list
>>>>>>>>>> devel_at_[hidden]
>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jeff Squyres
>>>>>>>>> Cisco Systems
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> - Pak Lui
>>>>>> pak.lui_at_[hidden]
>>>>>
>>>>
>>>
>>>
>>
>
>