
Subject: [OMPI devel] Multirail + Open MPI 1.6.1 = very big latency for the first communication
From: Paul Kapinos (kapinos_at_[hidden])
Date: 2012-10-31 15:36:59


Hello all,

Open MPI is clever and by default uses multiple IB adapters, if available.
http://www.open-mpi.org/faq/?category=openfabrics#ofa-port-wireup

Open MPI is lazy and establishes connections only when needed.

Both are good.
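(Side note: the relevant openib BTL parameters, e.g. btl_openib_max_btls, can be listed with ompi_info; the syntax below is what I believe is right for the 1.6 series, please check for your version:

   ompi_info --param btl openib | grep max_btls
)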

We have somewhat special nodes: up to 16 sockets, 128 cores, 4 boards, and 4 IB cards.
Multirail works!

The crucial thing is that, starting with v1.6.1, the very first PingPong sample
between two nodes takes a really long time, some 100x-200x the usual latency.
You cannot see this with the usual latency benchmarks(*), because they tend to
omit the first samples as a "warm-up phase", but we use a self-written parallel
test which clearly shows this (and which kept me puzzling for some days).
If multirail is forbidden (-mca btl_openib_max_btls 1), or v1.5.3 is used, or
the MPI processes are preconnected
(http://www.open-mpi.org/faq/?category=running#mpi-preconnect), there are no
such huge latency outliers for the first sample.
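For reference, the two workarounds amount to mpirun options along these lines
(the preconnect parameter name is the one I take from the FAQ page linked
above; please check the exact name for your version):

   # force a single openib BTL, i.e. no multirail
   mpirun -mca btl_openib_max_btls 1 ./a.out

   # establish all connections eagerly during startup instead of lazily
   mpirun -mca mpi_preconnect_mpi 1 ./a.out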

Well, we know about the warm-up and lazy connections.

But 200x ?!
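
For illustration, a minimal sketch of the kind of test that shows it (not our
actual code; message size and iteration count are arbitrary) would be:

   /* minimal sketch, not our actual test: two-rank ping-pong that prints
    * the first sample (which includes connection setup) instead of
    * discarding it as warm-up */
   #include <mpi.h>
   #include <stdio.h>

   int main(int argc, char **argv)
   {
       char buf[8] = "ping";
       int rank, i;
       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);

       for (i = 0; i < 1000; i++) {
           double t0 = MPI_Wtime(), dt;
           if (rank == 0) {
               MPI_Send(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
               MPI_Recv(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
           } else if (rank == 1) {
               MPI_Recv(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
               MPI_Send(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
           }
           dt = MPI_Wtime() - t0;
           if (rank == 0 && i < 2)  /* first (cold) vs. second (warm) sample */
               printf("sample %d: %.1f us (one way)\n", i, dt * 1e6 / 2.0);
       }
       MPI_Finalize();
       return 0;
   }

Run with two ranks placed on different nodes; the first printed sample shows
the connection setup cost, the second the steady-state latency.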

Any comments on whether this is expected behaviour?

Best,

Paul Kapinos

(*) E.g. HPCC explicitly says in http://icl.cs.utk.edu/hpcc/faq/index.html#132:
> Additional startup latencies are masked out by starting the measurement after
> one non-measured ping-pong.

P.S. Sorry for cross-posting to both the Users and Developers lists, but my last
questions on the Users list have not received a reply yet, so I am trying to broadcast...

-- 
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915