Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Multirail + Open MPI 1.6.1 = very big latency for the first communication
From: George Bosilca (bosilca_at_[hidden])
Date: 2012-11-01 00:35:29


It depends on the protocol the OpenIB BTL uses to wire up the peers (OOB, UDCM, or RDMACM). In the worst case (OOB), the connection is established over TCP, which requires a handshake. Over TCP, a one-way message latency of 40 ms is standard, so the handshake takes at least 80 ms. Moreover, we only check the status of the sockets once in a while (to avoid impacting performance), so that polling delay has to be added to the handshake as well, plus the time to set up the local queues (which should be significantly smaller than the other costs). The connection time goes up pretty quickly!
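As a rough sketch of the arithmetic above (not a measurement, and not how Open MPI itself accounts for time), the lower bound on the first message can be written out as follows. The 40 ms one-way TCP latency is the figure quoted above; the polling delay and queue setup time are unknown, so they appear here as hypothetical parameters that default to zero.

```python
def first_message_estimate_ms(one_way_tcp_ms=40.0, poll_delay_ms=0.0, queue_setup_ms=0.0):
    """Lower-bound estimate (in ms) for the first message over a lazily
    established OOB connection.

    one_way_tcp_ms : one-way TCP latency (40 ms is the figure from the mail)
    poll_delay_ms  : extra wait because sockets are only polled occasionally
                     (hypothetical placeholder, actual value unknown)
    queue_setup_ms : time to set up the local queues
                     (hypothetical placeholder, expected to be small)
    """
    handshake_ms = 2 * one_way_tcp_ms  # one message each way over TCP
    return handshake_ms + poll_delay_ms + queue_setup_ms


print(first_message_estimate_ms())  # 80.0
```

Any non-zero polling delay or queue setup time only pushes the estimate above the 80 ms floor.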

  george.

On Oct 31, 2012, at 15:36 , Paul Kapinos <kapinos_at_[hidden]> wrote:

> Hello all,
>
> Open MPI is clever and uses multiple IB adapters by default, if available.
> http://www.open-mpi.org/faq/?category=openfabrics#ofa-port-wireup
>
> Open MPI is lazy and establishes connections only if needed.
>
> Both are good.
>
> We have kinda special nodes: up to 16 sockets, 128 cores, 4 boards, 4 IB cards. Multirail works!
>
> The crucial thing is that, starting with v1.6.1, the very first PingPong sample between two nodes takes a really long time: some 100x - 200x the usual latency. You cannot see this with the usual latency benchmarks(*) because they tend to omit the first samples as a "warmup phase", but we use a kinda self-written parallel test which clearly shows this (and left me musing for some days).
> If multirail is forbidden (-mca btl_openib_max_btls 1), or if v1.5.3 is used, or if the MPI processes are preconnected (http://www.open-mpi.org/faq/?category=running#mpi-preconnect), there are no such huge latency outliers for the first sample.
>
> Well, we know about the warm-up and lazy connections.
>
> But 200x ?!
>
> Any comments on whether this is OK?
>
> Best,
>
> Paul Kapinos
>
> (*) E.g. HPCC explicitly says in http://icl.cs.utk.edu/hpcc/faq/index.html#132
> > Additional startup latencies are masked out by starting the measurement after
> > one non-measured ping-pong.
>
> P.S. Sorry for cross-posting to both Users and Developers, but my last questions to Users have had no reply yet, so I am trying to broadcast...
>
>
> --
> Dipl.-Inform. Paul Kapinos - High Performance Computing,
> RWTH Aachen University, Center for Computing and Communication
> Seffenter Weg 23, D 52074 Aachen (Germany)
> Tel: +49 241/80-24915
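To illustrate why the usual benchmarks hide the effect described in the quoted message, here is a minimal sketch that summarizes a list of latency samples with and without the first ("warm-up") sample. The sample values are invented for illustration: a hypothetical 2 us steady-state latency and a first sample 200x larger, matching the upper end of the reported range. The function name `summarize` is likewise an assumption, not part of any benchmark suite.

```python
from statistics import mean


def summarize(samples_us, warmup=1):
    """Return (mean over all samples, mean after dropping `warmup` leading samples)."""
    if len(samples_us) <= warmup:
        raise ValueError("need more samples than warm-up iterations")
    return mean(samples_us), mean(samples_us[warmup:])


# Hypothetical data: the first sample pays the connection setup cost
# (200x the steady-state value), the remaining nine do not.
samples = [400.0] + [2.0] * 9

with_first, without_first = summarize(samples)
print(f"mean including first sample: {with_first} us")
print(f"mean excluding first sample: {without_first} us")
```

A benchmark that reports only the second figure shows no sign of the outlier, which is why a test that keeps the first sample is needed to expose it.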