
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Multirail + Open MPI 1.6.1 = very big latency for the first communication
From: TERRY DONTJE (terry.dontje_at_[hidden])
Date: 2012-11-01 06:35:37


IIRC, the first 16 or so messages over the openib BTL use the send/recv
API rather than RDMA, which is significantly faster. I am not sure how
1.5.3 and multi-rail affect this, but I believe preconnecting
short-circuits the cutover to RDMA for eager messages.
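
To check where the slow samples fall, it helps to record every round trip and report the first one separately rather than discarding it as warm-up. Below is a minimal sketch of such a harness. The names `time_round_trips` and `summarize` are hypothetical, `round_trip` stands for whatever callable performs one ping-pong (for example through an MPI binding), and the numbers in the example are synthetic placeholders, not measurements from any system.

```python
import statistics
import time


def time_round_trips(round_trip, count):
    """Time `count` calls of `round_trip`, keeping every sample (no warm-up skip)."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        round_trip()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples, warmup=1):
    """Report the first sample separately from the steady-state median."""
    if len(samples) <= warmup:
        raise ValueError("need more samples than warm-up iterations")
    steady = statistics.median(samples[warmup:])
    return {
        "first": samples[0],
        "steady_median": steady,
        "first_over_steady": samples[0] / steady,
    }


# Synthetic numbers for illustration only (not measurements): one slow
# first sample followed by uniform steady-state samples, in microseconds.
synthetic = [300.0] + [2.0] * 9
report = summarize(synthetic)
print(report["first_over_steady"])
```

Printing the per-sample list from `time_round_trips` as well would show whether only the first message is slow or whether the first 16 or so differ from the rest.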

--td

On 10/31/2012 3:36 PM, Paul Kapinos wrote:
> Hello all,
>
> Open MPI is clever and uses multiple IB adapters by default, if available.
> http://www.open-mpi.org/faq/?category=openfabrics#ofa-port-wireup
>
> Open MPI is lazy and establishes connections only if needed.
>
> Both are good.
>
> We have kinda special nodes: up to 16 sockets, 128 cores, 4 boards, 4
> IB cards. Multirail works!
>
> The crucial thing is that, starting with v1.6.1, the very first
> PingPong sample between two nodes takes a really long time, some
> 100x - 200x the usual latency. You cannot see this with the usual
> latency benchmarks(*) because they tend to omit the first samples as a
> "warmup phase", but we use a kinda self-written parallel test which
> clearly shows this (and left me musing for some days).
> If multirail is disabled (-mca btl_openib_max_btls 1), or if v1.5.3 is
> used, or if the MPI processes are preconnected
> (http://www.open-mpi.org/faq/?category=running#mpi-preconnect), there
> are no such huge latency outliers for the first sample.
>
> Well, we know about the warm-up and lazy connections.
>
> But 200x ?!
>
> Any comments on whether this is OK?
>
> Best,
>
> Paul Kapinos
>
> (*) E.g. HPCC explicitly says in
> http://icl.cs.utk.edu/hpcc/faq/index.html#132
> > Additional startup latencies are masked out by starting the
> > measurement after one non-measured ping-pong.
>
> P.S. Sorry for cross-posting to both Users and Developers, but my last
> questions to Users have had no reply yet, so I am trying to broadcast...