Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [RFC] mca_base_select()
From: Pak Lui (Pak.Lui_at_[hidden])
Date: 2008-05-08 21:04:35


Thanks very much Josh! Will try it out soon.

Josh Hursey wrote:
> Sorry about that. I didn't test that type of option. It should be
> working in r18418. Let me know if you see any more issues.
>
> -- Josh
>
> On May 8, 2008, at 6:04 PM, Pak Lui wrote:
>
>> I think I have a problem, but I am not sure. I used to be able to use the
>> circumflex (^) to switch between the gridengine launcher and the ssh
>> launchers by doing something like -mca plm ^gridengine to exclude
>> some of the plm components (and likewise in ras). It doesn't seem
>> like the 'negate' is in mca_base_component anymore. I guess I just have
>> to spell out which component I want explicitly...
>>
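
A minimal sketch in C of the include/exclude semantics described above, assuming that a leading circumflex negates the whole comma-separated list and that a plain list names the only components allowed. The functions sketch_list_contains() and sketch_component_allowed() are hypothetical names, not part of the MCA base, and this is not how Open MPI itself parses the parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `name` is a whole entry of the comma-separated `list`. */
bool sketch_list_contains(const char *list, const char *name)
{
    size_t name_len;
    const char *p = list;

    assert(list != NULL && name != NULL);
    name_len = strlen(name);

    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t len = (comma != NULL) ? (size_t)(comma - p) : strlen(p);

        if (len == name_len && strncmp(p, name, len) == 0) {
            return true;
        }
        if (comma == NULL) {
            break;              /* last entry examined */
        }
        p = comma + 1;          /* move to the next entry */
    }
    return false;
}

/*
 * Decide whether component `name` may be considered for selection.
 * Assumed semantics: an empty or missing spec allows everything,
 * "^a,b" allows everything except a and b, "a,b" allows only a and b.
 */
bool sketch_component_allowed(const char *spec, const char *name)
{
    if (spec == NULL || spec[0] == '\0') {
        return true;
    }
    if (spec[0] == '^') {
        return !sketch_list_contains(spec + 1, name);
    }
    return sketch_list_contains(spec, name);
}
```

Under these assumptions, sketch_component_allowed("^gridengine", "rsh") is true and sketch_component_allowed("^gridengine", "gridengine") is false; the component names are only illustrative.
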
>> Josh Hursey wrote:
>>> This has been committed in r18381
>>>
>>> Please let me know if you have any problems with this commit.
>>>
>>> Cheers,
>>> Josh
>>>
>>> On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
>>>
>>>> Awesome.
>>>>
>>>> The branch is updated to the latest trunk head. I encourage folks to
>>>> check out this repository and make sure that it builds on their
>>>> system. A normal build of the branch should be enough to find out if
>>>> there are any cut-n-paste problems (though I tried to be careful,
>>>> mistakes do happen).
>>>>
>>>> I haven't heard of any problems, so this looks like it will come in
>>>> tomorrow after the teleconf. I'll ask again there to see if there are
>>>> any voices of concern.
>>>>
>>>> Cheers,
>>>> Josh
>>>>
>>>> On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
>>>>
>>>>> This all sounds good to me!
>>>>>
>>>>> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
>>>>>
>>>>>> What: Add mca_base_select() and adjust frameworks & components
>>>>>> to use it.
>>>>>> Why: Consolidation of code for general goodness.
>>>>>> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
>>>>>> When: Code ready now. Documentation ready soon.
>>>>>> Timeout: May 6, 2008 (After teleconf) [1 week]
>>>>>>
>>>>>> Discussion:
>>>>>> -----------
>>>>>> For a number of years a few developers have been talking about
>>>>>> creating an MCA base component selection function. For various
>>>>>> reasons this was never implemented. Recently I decided to give it
>>>>>> a try.
>>>>>>
>>>>>> A base select function will allow Open MPI to provide completely
>>>>>> consistent selection behavior for many of its frameworks (18 of 31
>>>>>> to be exact at the moment). The primary goal of this work is to
>>>>>> improve code maintainability through code reuse. Other benefits
>>>>>> also result, such as a slightly smaller memory footprint.
>>>>>>
>>>>>> The mca_base_select() function represents the most commonly used
>>>>>> logic for component selection: select the one component with the
>>>>>> highest priority and close all of the unselected components. This
>>>>>> function can be found at the path below in the branch:
>>>>>> opal/mca/base/mca_base_components_select.c
>>>>>>
>>>>>> To support this I had to formalize a query() function in the
>>>>>> mca_base_component_t of the form:
>>>>>> int mca_base_query_component_fn(mca_base_module_t **module,
>>>>>>                                 int *priority);
>>>>>>
>>>>>> This function is specified after the open and close component
>>>>>> functions in this structure so as to allow compatibility with
>>>>>> frameworks that do not use the base selection logic. Frameworks
>>>>>> that do *not* use this function are *not* affected by this commit.
>>>>>> However, every component in the frameworks that use the
>>>>>> mca_base_select() function must adjust its component query
>>>>>> function to fit the form specified above.
>>>>>>
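
The selection logic and query() form described above can be sketched in C roughly as follows. This is not the code from opal/mca/base/mca_base_components_select.c: the sketch_* types and functions, the rule that the first component wins a priority tie, the return codes, and the three demo query functions are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real MCA module and component types. */
typedef struct sketch_module { const char *name; } sketch_module_t;
typedef int (*sketch_query_fn_t)(sketch_module_t **module, int *priority);
typedef int (*sketch_close_fn_t)(void);

typedef struct sketch_component {
    const char       *name;
    sketch_query_fn_t query;   /* NULL means "cannot be selected" */
    sketch_close_fn_t close;   /* NULL means "nothing to close" */
} sketch_component_t;

/* Demo components: two that offer a module and one that declines. */
sketch_module_t sketch_low_module  = { "low" };
sketch_module_t sketch_high_module = { "high" };
int sketch_closed_count = 0;

int sketch_low_query(sketch_module_t **module, int *priority)
{ *module = &sketch_low_module; *priority = 10; return 0; }

int sketch_high_query(sketch_module_t **module, int *priority)
{ *module = &sketch_high_module; *priority = 50; return 0; }

int sketch_declined_query(sketch_module_t **module, int *priority)
{ *module = NULL; *priority = 0; return -1; }

int sketch_count_close(void) { sketch_closed_count++; return 0; }

/*
 * Query every component, keep the one reporting the highest priority,
 * then close every component that was not selected.
 * Returns 0 if a component was selected, -1 otherwise.
 */
int sketch_select(sketch_component_t *components, size_t count,
                  sketch_module_t **best_module,
                  sketch_component_t **best_component)
{
    int best_priority = 0;
    size_t i;

    assert(best_module != NULL && best_component != NULL);
    *best_module = NULL;
    *best_component = NULL;

    for (i = 0; i < count; ++i) {
        sketch_module_t *module = NULL;
        int priority = 0;

        if (components[i].query == NULL) {
            continue;
        }
        if (components[i].query(&module, &priority) != 0 || module == NULL) {
            continue;           /* component declined to be selected */
        }
        if (*best_component == NULL || priority > best_priority) {
            best_priority = priority;
            *best_module = module;
            *best_component = &components[i];
        }
    }

    for (i = 0; i < count; ++i) {
        if (&components[i] != *best_component && components[i].close != NULL) {
            components[i].close();
        }
    }
    return (*best_component != NULL) ? 0 : -1;
}
```

Passing an array holding the three demo components to sketch_select() picks the one whose query reports priority 50 and calls the close function of the other two.
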
>>>>>> 18 frameworks in Open MPI have been changed. I have updated all of
>>>>>> the components in the 18 frameworks available in the trunk on my
>>>>>> branch. The affected frameworks are:
>>>>>> - OPAL carto
>>>>>> - OPAL crs
>>>>>> - OPAL maffinity
>>>>>> - OPAL memchecker
>>>>>> - OPAL paffinity
>>>>>> - ORTE errmgr
>>>>>> - ORTE ess
>>>>>> - ORTE filem
>>>>>> - ORTE grpcomm
>>>>>> - ORTE odls
>>>>>> - ORTE plm
>>>>>> - ORTE ras
>>>>>> - ORTE rmaps
>>>>>> - ORTE routed
>>>>>> - ORTE snapc
>>>>>> - OMPI crcp
>>>>>> - OMPI dpm
>>>>>> - OMPI pubsub
>>>>>>
>>>>>> There was a question of the memory footprint change as a result of
>>>>>> this commit. I used 'pmap' to determine the process memory
>>>>>> footprint of a hello world MPI program. Static and shared build
>>>>>> numbers are below, along with variations on launching locally and
>>>>>> to a single node allocated by SLURM. All of this was on Indiana
>>>>>> University's Odin machine. We compare against the trunk (r18276),
>>>>>> which represents the last SVN sync point of the branch.
>>>>>>
>>>>>> Process (shared) | Trunk   | Branch  | Diff (Improvement)
>>>>>> -----------------+---------+---------+-------------------
>>>>>> mpirun (orted)   |  39976K |  36828K | 3148K
>>>>>> hello (0)        | 229288K | 229268K |   20K
>>>>>> hello (1)        | 229288K | 229268K |   20K
>>>>>> -----------------+---------+---------+-------------------
>>>>>> mpirun           |  40032K |  37924K | 2108K
>>>>>> orted            |  34720K |  34660K |   60K
>>>>>> hello (0)        | 228404K | 228384K |   20K
>>>>>> hello (1)        | 228404K | 228384K |   20K
>>>>>>
>>>>>> Process (static) | Trunk   | Branch  | Diff (Improvement)
>>>>>> -----------------+---------+---------+-------------------
>>>>>> mpirun (orted)   |  21384K |  21372K |   12K
>>>>>> hello (0)        | 194000K | 193980K |   20K
>>>>>> hello (1)        | 194000K | 193980K |   20K
>>>>>> -----------------+---------+---------+-------------------
>>>>>> mpirun           |  21384K |  21372K |   12K
>>>>>> orted            |  21208K |  21196K |   12K
>>>>>> hello (0)        | 193116K | 193096K |   20K
>>>>>> hello (1)        | 193116K | 193096K |   20K
>>>>>>
>>>>>> As you can see there are some small memory footprint improvements
>>>>>> on my branch that result from this work. The size of the Open MPI
>>>>>> project shrinks a bit as well. This commit cuts between 2,000 and
>>>>>> 3,500 lines of code (depending on how you count), so about a 1%
>>>>>> code shrink.
>>>>>>
>>>>>> The branch is stable in all of the testing I have done, but there
>>>>>> are some platforms on which I cannot test. So please give this
>>>>>> branch a try and let me know if you find any problems.
>>>>>>
>>>>>> Cheers,
>>>>>> Josh
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> Cisco Systems
>>>>>
>>>
>>
>>
>> --
>>
>> - Pak Lui
>> pak.lui_at_[hidden]
>

-- 
- Pak Lui
pak.lui_at_[hidden]