From: Troy Telford (ttelford_at_[hidden])
Date: 2005-11-04 18:45:59


(Using svn 'trunk' revision 7927 of Open MPI.)

I've found an interesting issue with Open MPI and the mvapi BTL MCA
component: most of the benchmarks I've tried (HPL, HPCC, Presta, IMB) do
not run properly once the number of processes is sufficiently large. The
threshold seems to be 65 processes in every case; with more than 65
processes, things get stuck (a rough reproducer sketch follows the
benchmark notes below):

IMB: Wedges itself before finishing its first test (PingPong, 0 bytes, 2
processes). Even when the number of processes is small enough to run, it
may not finish (error message in the attachment).
HPCC: Wedges itself after starting the PTRANS section of the benchmark
(but before obtaining any results).
HPL: Behaves similarly to IMB and HPCC; it doesn't finish even the
smallest of problem sizes.

Presta: the 'com' test almost completes; it only fails when matching rank
id pairs (and even then only with more than 65 processes).
                the 'allred' test behaves like IMB, HPCC, and HPL.
                the 'laten' test partially works (its misbehavior is similar to 'com').
                the 'globalop' test was a dog on 4 nodes (some 360-odd times slower on
mvapi than on mx); it will take a while to verify whether it tickles the
65-process issue or not.
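
For what it's worth, here is a sketch of the sort of minimal test I would
expect to tickle the same hang: a 0-byte ping-pong between rank pairs
followed by an MPI_Barrier, which is roughly where IMB wedges. The mpirun
line in the comment (and the '--mca btl mvapi,self' selection) is my
assumption about how to force the mvapi BTL, not something taken from the
failing benchmark runs.

/*
 * pingpong.c -- run with something like:
 *     mpirun -np 66 -machinefile machines ./pingpong
 * adding "--mca btl mvapi,self" if the mvapi BTL needs to be forced
 * (that MCA selection is an assumption on my part).
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, peer;
    char buf = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Pair rank 2k with rank 2k+1; even ranks send first. */
    peer = (rank % 2 == 0) ? rank + 1 : rank - 1;
    if (peer >= 0 && peer < size) {
        if (rank % 2 == 0) {
            MPI_Send(&buf, 0, MPI_BYTE, peer, 0, MPI_COMM_WORLD);
            MPI_Recv(&buf, 0, MPI_BYTE, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(&buf, 0, MPI_BYTE, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&buf, 0, MPI_BYTE, peer, 0, MPI_COMM_WORLD);
        }
    }

    /* If the ping-pongs complete, see whether the barrier also hangs. */
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0)
        printf("all %d processes made it through\n", size);
    MPI_Finalize();
    return 0;
}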

Note: The cluster I am testing on consists of dual-Opteron nodes; for
purposes of comparison, I modified the machines file to start one process
per node (a total of 50 nodes). This ran with no complications, so the
problem seems to be related to the process count, not the node count.
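
For reference, the machines-file change amounts to something like the
sketch below (the node names are made up, and I'm assuming the usual
Open MPI hostfile 'slots=' syntax; the real file just lists the 50 nodes):

# failing configuration: two processes per node
n001 slots=2
n002 slots=2
# ...and so on for all 50 nodes

# working configuration: one process per node
n001 slots=1
n002 slots=1
# ...and so on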

Note part two: the config.log is from a slightly newer version of Open MPI
(revision 7998; the difference from the trunk revision above is about 4-5
files, none of them having anything to do with mvapi). I really need to
start reaping the config.log before blasting it into oblivion.

Unfortunately, I don't have enough Myrinet hardware to test more than
4 nodes with GM or MX; sorry.