Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [RFC] mca_base_select()
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2008-05-11 13:30:11


Hi,
I tried r18423 with the rank_file component and got a segfault.
(I increase the priority of the component if rmaps_rank_file_path exists.)

/home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun -np 4 -hostfile hostfile_ompi
-mca rmaps_rank_file_path rankfile -mca paffinity_base_verbose 5 ./mpi_p_SMD
-t bw -output 1 -order 1
[witch1:25456] mca:base:select: Querying component [linux]
[witch1:25456] mca:base:select: Query of component [linux] set priority to 10
[witch1:25456] mca:base:select: Selected component [linux]
[witch1:25456] *** Process received signal ***
[witch1:25456] Signal: Segmentation fault (11)
[witch1:25456] Signal code: Invalid permissions (2)
[witch1:25456] Failing at address: 0x2b2875530030
[witch1:25456] [ 0] /lib64/libpthread.so.0 [0x2b28759dfc10]
[witch1:25456] [ 1] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-pal.so.0 [0x2b28753e2bb6]
[witch1:25456] [ 2] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-pal.so.0 [0x2b28753e23b6]
[witch1:25456] [ 3] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-pal.so.0 [0x2b28753e22fd]
[witch1:25456] [ 4] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-rte.so.0(orte_util_encode_pidmap+0x2f4) [0x2b287527f412]
[witch1:25456] [ 5] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x989) [0x2b28752934f5]
[witch1:25456] [ 6] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x1a3) [0x2b287529e60b]
[witch1:25456] [ 7] /home/USERS/lenny/OMPI_ORTE_SMD/lib/openmpi/mca_plm_rsh.so [0x2b287612f788]
[witch1:25456] [ 8] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun [0x4032bf]
[witch1:25456] [ 9] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun [0x402b53]
[witch1:25456] [10] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b2875b06154]
[witch1:25456] [11] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun [0x402aa9]
[witch1:25456] *** End of error message ***
Segmentation fault
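
To make the priority change mentioned above concrete, here is a minimal sketch of a query function that raises its priority when a rank file path has been given. It is not the actual rank_file component code: demo_module_t, demo_rank_file_query() and both priority values are hypothetical, and it assumes "exists" means the parameter was set to a non-empty string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a framework module; not the real ORTE type. */
typedef struct demo_module { const char *name; } demo_module_t;

static demo_module_t demo_rank_file_module = { "rank_file" };

/* Illustrative priorities only; the values used in the real component
 * are not given in this thread. */
enum { DEMO_PRIORITY_DEFAULT = 0, DEMO_PRIORITY_RANKFILE = 100 };

/* Report the module and a priority. The priority is raised only when
 * the caller passes a non-empty rank file path. Returns 0 on success. */
int demo_rank_file_query(const char *rank_file_path,
                         demo_module_t **module, int *priority)
{
    assert(module != NULL && priority != NULL);

    *module = &demo_rank_file_module;
    if (rank_file_path != NULL && rank_file_path[0] != '\0') {
        *priority = DEMO_PRIORITY_RANKFILE;   /* path given: try to win selection */
    } else {
        *priority = DEMO_PRIORITY_DEFAULT;    /* no path: stay out of the way */
    }
    return 0;
}
```

With this shape, a selection routine that picks the highest priority would choose the component only when the path parameter is supplied.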

On Tue, May 6, 2008 at 9:09 PM, Josh Hursey <jjhursey_at_[hidden]> wrote:

> This has been committed in r18381
>
> Please let me know if you have any problems with this commit.
>
> Cheers,
> Josh
>
> On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
>
> > Awesome.
> >
> > The branch is updated to the latest trunk head. I encourage folks to
> > check out this repository and make sure that it builds on their
> > system. A normal build of the branch should be enough to find out if
> > there are any cut-n-paste problems (though I tried to be careful,
> > mistakes do happen).
> >
> > I haven't heard of any problems, so this is looking like it will
> > come in tomorrow after the teleconf. I'll ask again there to see if
> > there are any voices of concern.
> >
> > Cheers,
> > Josh
> >
> > On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
> >
> >> This all sounds good to me!
> >>
> >> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
> >>
> >>> What: Add mca_base_select() and adjust frameworks & components
> >>> to use it.
> >>> Why: Consolidation of code for general goodness.
> >>> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
> >>> When: Code ready now. Documentation ready soon.
> >>> Timeout: May 6, 2008 (After teleconf) [1 week]
> >>>
> >>> Discussion:
> >>> -----------
> >>> For a number of years a few developers have been talking about
> >>> creating an MCA base component selection function. For various
> >>> reasons this was never implemented. Recently I decided to give it
> >>> a try.
> >>>
> >>> A base select function will allow Open MPI to provide completely
> >>> consistent selection behavior for many of its frameworks (18 of 31,
> >>> to be exact, at the moment). The primary goal of this work is to
> >>> improve code maintainability through code reuse. Other benefits
> >>> also result, such as a slightly smaller memory footprint.
> >>>
> >>> The mca_base_select() function represents the most commonly used
> >>> logic for component selection: select the one component with the
> >>> highest priority and close all of the unselected components. This
> >>> function can be found at the path below in the branch:
> >>> opal/mca/base/mca_base_components_select.c
> >>>
> >>> To support this I had to formalize a query() function in the
> >>> mca_base_component_t of the form:
> >>> int mca_base_query_component_fn(mca_base_module_t **module,
> >>>                                 int *priority);
> >>>
> >>> This function is specified after the open and close component
> >>> functions in this structure so as to allow compatibility with
> >>> frameworks that do not use the base selection logic. Frameworks
> >>> that do *not* use this function are *not* affected by this commit.
> >>> However, every component in the frameworks that use the
> >>> mca_base_select() function must adjust its component query
> >>> function to fit the form specified above.
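
The selection rule described above (query every component, keep the one with the highest priority, close the rest) can be sketched roughly as follows. This is an illustration only, not the code in mca_base_components_select.c: the demo_* types and functions are hypothetical stand-ins, "closing" is reduced to setting a flag, and ties are assumed to go to the first component queried.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for mca_base_module_t and mca_base_component_t. */
typedef struct demo_module { const char *name; } demo_module_t;

/* Same shape as the query function quoted above. */
typedef int (*demo_query_fn)(demo_module_t **module, int *priority);

typedef struct demo_component {
    const char   *name;
    demo_query_fn query;   /* NULL means the component cannot be selected */
    int           closed;  /* set to 1 once the component is closed */
} demo_component_t;

/* Query each component, remember the highest priority seen, then close
 * everything that was not selected. Returns 0 on success and -1 if no
 * component offered a module. */
int demo_select(demo_component_t *comps, size_t n,
                demo_component_t **best_comp, demo_module_t **best_mod)
{
    int best_pri = 0;
    size_t i;

    assert(comps != NULL && best_comp != NULL && best_mod != NULL);
    *best_comp = NULL;
    *best_mod = NULL;

    for (i = 0; i < n; ++i) {
        demo_module_t *mod = NULL;
        int pri = 0;

        if (comps[i].query == NULL) continue;                  /* no query fn */
        if (comps[i].query(&mod, &pri) != 0 || mod == NULL) continue;
        if (*best_comp == NULL || pri > best_pri) {            /* first wins ties */
            best_pri = pri;
            *best_comp = &comps[i];
            *best_mod = mod;
        }
    }

    for (i = 0; i < n; ++i) {
        if (&comps[i] != *best_comp) comps[i].closed = 1;      /* close the losers */
    }
    return (*best_comp != NULL) ? 0 : -1;
}

/* Toy query functions used to exercise demo_select(). */
static demo_module_t demo_low_module  = { "low" };
static demo_module_t demo_high_module = { "high" };

int demo_low_query(demo_module_t **module, int *priority)
{
    *module = &demo_low_module;  *priority = 10;  return 0;
}

int demo_high_query(demo_module_t **module, int *priority)
{
    *module = &demo_high_module; *priority = 50;  return 0;
}

int demo_unusable_query(demo_module_t **module, int *priority)
{
    *module = NULL; *priority = 0; return -1;   /* declines to be selected */
}
```

Given the three toy components, demo_select() picks the one backed by demo_high_query() and marks the other two as closed.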
> >>>
> >>> 18 frameworks in Open MPI have been changed. I have updated all of
> >>> the components in the 18 frameworks available in the trunk on my
> >>> branch. The affected frameworks are:
> >>> - OPAL carto
> >>> - OPAL crs
> >>> - OPAL maffinity
> >>> - OPAL memchecker
> >>> - OPAL paffinity
> >>> - ORTE errmgr
> >>> - ORTE ess
> >>> - ORTE filem
> >>> - ORTE grpcomm
> >>> - ORTE odls
> >>> - ORTE plm
> >>> - ORTE ras
> >>> - ORTE rmaps
> >>> - ORTE routed
> >>> - ORTE snapc
> >>> - OMPI crcp
> >>> - OMPI dpm
> >>> - OMPI pubsub
> >>>
> >>> There was a question about how the memory footprint changes as a
> >>> result of this commit. I used 'pmap' to determine the process
> >>> memory footprint of a hello world MPI program. Static and shared
> >>> build numbers are below, along with variations on launching
> >>> locally and to a single node allocated by SLURM. All of this was
> >>> on Indiana University's Odin machine. We compare against the trunk
> >>> (r18276), which represents the last SVN sync point of the branch.
> >>>
> >>> Process (shared) | Trunk   | Branch  | Diff (Improvement)
> >>> -----------------+---------+---------+-------------------
> >>> mpirun (orted)   |  39976K |  36828K | 3148K
> >>> hello (0)        | 229288K | 229268K |   20K
> >>> hello (1)        | 229288K | 229268K |   20K
> >>> -----------------+---------+---------+-------------------
> >>> mpirun           |  40032K |  37924K | 2108K
> >>> orted            |  34720K |  34660K |   60K
> >>> hello (0)        | 228404K | 228384K |   20K
> >>> hello (1)        | 228404K | 228384K |   20K
> >>>
> >>> Process (static) | Trunk   | Branch  | Diff (Improvement)
> >>> -----------------+---------+---------+-------------------
> >>> mpirun (orted)   |  21384K |  21372K |   12K
> >>> hello (0)        | 194000K | 193980K |   20K
> >>> hello (1)        | 194000K | 193980K |   20K
> >>> -----------------+---------+---------+-------------------
> >>> mpirun           |  21384K |  21372K |   12K
> >>> orted            |  21208K |  21196K |   12K
> >>> hello (0)        | 193116K | 193096K |   20K
> >>> hello (1)        | 193116K | 193096K |   20K
> >>>
> >>> As you can see, there are some small memory footprint improvements
> >>> on my branch that result from this work. The size of the Open MPI
> >>> project shrinks a bit as well. This commit cuts between 2,000 and
> >>> 3,500 lines of code (depending on how you count), so about a 1%
> >>> code shrink.
> >>>
> >>> The branch is stable in all of the testing I have done, but there
> >>> are some platforms on which I cannot test. So please give this
> >>> branch a try and let me know if you find any problems.
> >>>
> >>> Cheers,
> >>> Josh
> >>>
> >>> _______________________________________________
> >>> devel mailing list
> >>> devel_at_[hidden]
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>
> >>
> >> --
> >> Jeff Squyres
> >> Cisco Systems