Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] Multirail + Open MPI 1.6.1 = very big latency for the first communication
From: TERRY DONTJE (terry.dontje_at_[hidden])
Date: 2012-11-01 06:35:37


IIRC, the first 16 or so messages over the openib BTL use the send/recv
API as opposed to RDMA, which is significantly faster. I am not sure
how 1.5.3 and multi-rail affect this, but I believe preconnecting
short-circuits the cutover to RDMA for eager messages.

--td

On 10/31/2012 3:36 PM, Paul Kapinos wrote:
> Hello all,
>
> Open MPI is clever and uses multiple IB adapters by default, if available.
> http://www.open-mpi.org/faq/?category=openfabrics#ofa-port-wireup
>
> Open MPI is lazy and establishes connections only when needed.
>
> Both are good.
>
> We have kinda special nodes: up to 16 sockets, 128 cores, 4 boards, 4
> IB cards. Multirail works!
>
> The crucial thing is that, starting with v1.6.1, the very first
> PingPong sample between two nodes takes a really long time: some
> 100x - 200x the usual latency. You cannot see this with the usual
> latency benchmarks(*) because they tend to omit the first samples as a
> "warmup phase", but we use a kinda self-written parallel test which
> clearly shows this (and kept me musing for some days).
> If multirail is disabled (-mca btl_openib_max_btls 1), or if v1.5.3 is
> used, or if the MPI processes are preconnected
> (http://www.open-mpi.org/faq/?category=running#mpi-preconnect), there
> are no such huge latency outliers for the first sample.
>
> Well, we know about the warm-up and lazy connections.
>
> But 200x ?!
>
> Any comments on whether this is OK?
>
> Best,
>
> Paul Kapinos
>
> (*) E.g. HPCC explicitly says in
> http://icl.cs.utk.edu/hpcc/faq/index.html#132
> > Additional startup latencies are masked out by starting the
> > measurement after one non-measured ping-pong.
>
> P.S. Sorry for cross-posting to both Users and Developers, but my last
> questions to Users have had no reply yet, so I am trying to broadcast...
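The outlier discussed above only shows up if the first sample is recorded rather than thrown away as warm-up. Below is a minimal sketch of that idea, not the test Paul used: the helper names time_samples and first_sample_ratio are hypothetical, and op stands in for one ping-pong round trip (in a real test it would wrap a matched MPI send/receive pair between two nodes, which is an assumption here).

```python
import statistics
import time


def time_samples(op, n):
    """Time n consecutive calls of op() and keep every sample, including the first.

    `op` is a placeholder for one ping-pong round trip (hypothetical; a real
    test would wrap an MPI send/recv pair here).
    """
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        op()
        samples.append(time.perf_counter() - t0)
    return samples


def first_sample_ratio(samples):
    """Return the first sample divided by the median of the remaining samples.

    A ratio near 1 means no first-message penalty; a ratio of 100 to 200
    would correspond to the outlier described in this thread.
    """
    if len(samples) < 2:
        raise ValueError("need the first sample plus at least one later sample")
    steady = statistics.median(samples[1:])
    if steady <= 0:
        raise ValueError("steady-state latency must be positive")
    return samples[0] / steady
```

Running the same measurement with and without -mca btl_openib_max_btls 1, or with the processes preconnected, and comparing the ratios would show whether the first-sample penalty is tied to multirail connection setup.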