Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] poor btl sm latency
From: Matthias Jurenz (matthias.jurenz_at_[hidden])
Date: 2012-03-05 03:52:39


Here are the SM BTL parameters:

$ ompi_info --param btl sm
MCA btl: parameter "btl_base_verbose" (current value: <0>, data source:
default value) Verbosity level of the BTL framework
MCA btl: parameter "btl" (current value: <self,sm,openib>, data source: file
[/sw/atlas/libraries/openmpi/1.5.5rc3/x86_64/etc/openmpi-mca-params.conf])
Default selection set of components for the btl framework (<none> means use
all components that can be found)
MCA btl: information "btl_sm_have_knem_support" (value: <1>, data source:
default value) Whether this component supports the knem Linux kernel module or
not
MCA btl: parameter "btl_sm_use_knem" (current value: <-1>, data source:
default value) Whether knem support is desired or not (negative = try to
enable knem support, but continue even if it is not available, 0 = do not
enable knem support, positive = try to enable knem support and fail if it is
not available)
MCA btl: parameter "btl_sm_knem_dma_min" (current value: <0>, data source:
default value) Minimum message size (in bytes) to use the knem DMA mode;
ignored if knem does not support DMA mode (0 = do not use the knem DMA mode)
MCA btl: parameter "btl_sm_knem_max_simultaneous" (current value: <0>, data
source: default value) Max number of simultaneous ongoing knem operations to
support (0 = do everything synchronously, which probably gives the best large
message latency; >0 means to do all operations asynchronously, which supports
better overlap for simultaneous large message sends)
MCA btl: parameter "btl_sm_free_list_num" (current value: <8>, data source:
default value)
MCA btl: parameter "btl_sm_free_list_max" (current value: <-1>, data source:
default value)
MCA btl: parameter "btl_sm_free_list_inc" (current value: <64>, data source:
default value)
MCA btl: parameter "btl_sm_max_procs" (current value: <-1>, data source:
default value)
MCA btl: parameter "btl_sm_mpool" (current value: <sm>, data source: default
value)
MCA btl: parameter "btl_sm_fifo_size" (current value: <4096>, data source:
default value)
MCA btl: parameter "btl_sm_num_fifos" (current value: <1>, data source: default
value)
MCA btl: parameter "btl_sm_fifo_lazy_free" (current value: <120>, data source:
default value)
MCA btl: parameter "btl_sm_sm_extra_procs" (current value: <0>, data source:
default value)
MCA btl: parameter "btl_sm_exclusivity" (current value: <65535>, data source:
default value) BTL exclusivity (must be >= 0)
MCA btl: parameter "btl_sm_flags" (current value: <5>, data source: default
value) BTL bit flags (general flags: SEND=1, PUT=2, GET=4, SEND_INPLACE=8,
RDMA_MATCHED=64, HETEROGENEOUS_RDMA=256; flags only used by the "dr" PML
(ignored by others): ACK=16, CHECKSUM=32, RDMA_COMPLETION=128; flags only used
by the "bfo" PML (ignored by others): FAILOVER_SUPPORT=512)
MCA btl: parameter "btl_sm_rndv_eager_limit" (current value: <4096>, data
source: default value) Size (in bytes) of "phase 1" fragment sent for all
large messages (must be >= 0 and <= eager_limit)
MCA btl: parameter "btl_sm_eager_limit" (current value: <4096>, data source:
default value) Maximum size (in bytes) of "short" messages (must be >= 1).
MCA btl: parameter "btl_sm_max_send_size" (current value: <32768>, data
source: default value) Maximum size (in bytes) of a single "phase 2" fragment
of a long message when using the pipeline protocol (must be >= 1)
MCA btl: parameter "btl_sm_bandwidth" (current value: <9000>, data source:
default value) Approximate maximum bandwidth of interconnect(0 = auto-detect
value at run-time [not supported in all BTL modules], >= 1 = bandwidth in
Mbps)
MCA btl: parameter "btl_sm_latency" (current value: <1>, data source: default
value) Approximate latency of interconnect (must be >= 0)
MCA btl: parameter "btl_sm_priority" (current value: <0>, data source: default
value)
MCA btl: parameter "btl_base_warn_component_unused" (current value: <1>, data
source: default value) This parameter is used to turn on warning messages when
certain NICs are not used
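
For completeness: any of these parameters can be overridden per run on the
mpirun command line or set once in the openmpi-mca-params.conf file referenced
above. For example, to experiment with the fifo_lazy_free setting discussed
below (the value and the benchmark binary are only placeholders):

$ mpirun --mca btl_sm_fifo_lazy_free 1 -np 2 ./latency_benchmark

or, in openmpi-mca-params.conf:

btl_sm_fifo_lazy_free = 1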

Matthias

On Friday 02 March 2012 16:23:32 George Bosilca wrote:
> Please run "ompi_info --param btl sm" in your environment. The lazy_free
> parameter directs the internals of the SM BTL not to release the memory
> fragments used for communication until the lazy limit is reached. The
> default value was deemed reasonable a while back, when the default number
> of fragments was large. Lately there have been some patches to reduce the
> memory footprint of the SM BTL, and these might have lowered the available
> fragments to a point where the default value for lazy_free is now too
> large.
>
> george.
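
As a rough sketch of the batching George describes above (illustrative C only,
not the actual Open MPI SM BTL source; all type and function names are made
up):

#include <stddef.h>

typedef struct frag { struct frag *next; /* payload omitted */ } frag_t;

typedef struct {
    frag_t *free_list;       /* fragments available for new sends */
    frag_t *pending;         /* consumed fragments not yet returned */
    int     num_pending;
    int     fifo_lazy_free;  /* the btl_sm_fifo_lazy_free parameter */
} sm_fifo_t;

/* Instead of returning each consumed fragment to the free list right
 * away, hold it on a pending list and give the whole batch back once
 * the lazy limit is reached: fewer free-list operations per message,
 * but fewer fragments available to the sender in the meantime. */
static void frag_return_lazy(sm_fifo_t *fifo, frag_t *frag)
{
    frag->next = fifo->pending;
    fifo->pending = frag;
    if (++fifo->num_pending >= fifo->fifo_lazy_free) {
        while (fifo->pending != NULL) {
            frag_t *f = fifo->pending;
            fifo->pending = f->next;
            f->next = fifo->free_list;  /* back onto the free list */
            fifo->free_list = f;
        }
        fifo->num_pending = 0;
    }
}

With only a handful of fragments in total, holding back up to 120 of them
before recycling can starve the sender of free fragments, which would show up
exactly as extra small-message latency.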
>
> On Mar 2, 2012, at 10:08 , Matthias Jurenz wrote:
> > Thanks to the OTPO tool, I figured out that setting the MCA parameter
> > btl_sm_fifo_lazy_free to 1 (default: 120) improves the latency
> > significantly: 0.88µs
> >
> > But somehow I get the feeling that this doesn't eliminate the actual
> > problem...
> >
> > Matthias
> >
> > On Friday 02 March 2012 15:37:03 Matthias Jurenz wrote:
> >> On Friday 02 March 2012 14:58:45 Jeffrey Squyres wrote:
> >>> Ok. Good that there's no oversubscription bug, at least. :-)
> >>>
> >>> Did you see my off-list mail to you yesterday about building with an
> >>> external copy of hwloc 1.4 to see if that helps?
> >>
> >> Yes, I did - I replied as well. Our mail server seems to be somewhat
> >> busy today...
> >>
> >> Just for the record: Using hwloc-1.4 makes no difference.
> >>
> >> Matthias
> >>
> >>> On Mar 2, 2012, at 8:26 AM, Matthias Jurenz wrote:
> >>>> To rule out a possible bug in the LSF component, I rebuilt Open MPI
> >>>> without support for LSF (--without-lsf).
> >>>>
> >>>> -> It makes no difference - the latency is still bad: ~1.1µs.
> >>>>
> >>>> Matthias
> >>>>
> >>>> On Friday 02 March 2012 13:50:13 Matthias Jurenz wrote:
> >>>>> SORRY, it was obviously a big mistake on my part. :-(
> >>>>>
> >>>>> Open MPI 1.5.5 was built with LSF support, so when starting an LSF
> >>>>> job it's necessary to request at least as many tasks/cores as are
> >>>>> used for the subsequent mpirun command. That was not the case - I
> >>>>> forgot bsub's '-n' option to specify the number of tasks, so only
> >>>>> *one* task/core was requested.
> >>>>>
> >>>>> Open MPI 1.4.5 was built *without* LSF support, so the supposed
> >>>>> misbehavior could not happen with it.
> >>>>>
> >>>>> In short, there is no bug in Open MPI 1.5.x regarding the
> >>>>> detection of oversubscription. Sorry for any confusion!
> >>>>>
> >>>>> Matthias
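
For anyone who hits the same thing: with an LSF-enabled build, the bsub slot
request has to cover all ranks started by mpirun, e.g. (the sizes here are
placeholders):

$ bsub -n 16 mpirun -np 16 ./latency_benchmark

Otherwise Open MPI sees fewer allocated slots than running processes and
treats the node as oversubscribed.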
> >>>>>
> >>>>> On Tuesday 28 February 2012 13:36:56 Matthias Jurenz wrote:
> >>>>>> When using Open MPI v1.4.5 I get ~1.1µs. That's the same result as I
> >>>>>> get with Open MPI v1.5.x using mpi_yield_when_idle=0.
> >>>>>> So I think there is a bug in Open MPI (v1.5.4 and v1.5.5rc2)
> >>>>>> regarding the automatic performance mode selection.
> >>>>>>
> >>>>>> When enabling the degraded performance mode for Open MPI 1.4.5
> >>>>>> (mpi_yield_when_idle=1) I get latencies of ~1.8µs.
> >>>>>>
> >>>>>> Matthias
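
Either mode can also be forced explicitly instead of relying on the automatic
selection, which makes comparisons like the one above reproducible (the
benchmark binary is a placeholder):

$ mpirun --mca mpi_yield_when_idle 0 -np 2 ./latency_benchmark   # aggressive
$ mpirun --mca mpi_yield_when_idle 1 -np 2 ./latency_benchmark   # degraded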
> >>>>>>
> >>>>>> On Tuesday 28 February 2012 06:20:28 Christopher Samuel wrote:
> >>>>>>> On 13/02/12 22:11, Matthias Jurenz wrote:
> >>>>>>>> Do you have any idea? Please help!
> >>>>>>>
> >>>>>>> Do you see the same bad latency in the old branch (1.4.5) ?
> >>>>>>>
> >>>>>>> cheers,
> >>>>>>> Chris