
Subject: Re: [OMPI users] Open MPI and DAPL 2.0.34 are incompatible?
From: Paul Kapinos (kapinos_at_[hidden])
Date: 2011-12-06 15:29:07


Good morning,

> We've never recommended the use of dapl on Linux.
> I think it might have worked at one time, but I don't think anyone bothered to maintain it.
>
> On Linux, you should probably use native verbs support, instead.

Well, we have been using 'Open MPI + openib' for some years now (we
started with Sun's ClusterTools and Open MPI 1.2.x; now we run
self-built 1.4.x and 1.5.x Open MPI).

The problem is that on our new, big, sexy cluster (some 1700 nodes
connected to a common QDR InfiniBand fabric), running MPI over DAPL
seems to be considerably faster than running over native IB. Yes, it is
puzzling.

But it is reproducible:
Intel MPI (over dapl)  => 100%
Open MPI (over openib) => 90% on about 4/5 of the machines (dual-socket Westmere)
Open MPI (over openib) => 45% on about 1/5 of the machines (quad-socket Nehalem)
Intel MPI (over ofa)   => the same values as Open MPI!

(Bandwidth in a PingPong test, e.g. the Intel MPI Benchmarks, plus two
other PingPong tests.)

The question of WHY native IB is slower than DAPL is a very good one
(do you have any ideas?). As said, it is reproducible: switching from
dapl to ofa in Intel MPI also switches the PingPong performance
accordingly.

(You may say "your test is wrong", but we tried three different
PingPong tests, and they all produce very similar values.)
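For completeness, the kind of runs we compared can be sketched roughly as follows. The hostnames and benchmark path are placeholders, not our actual setup, and I_MPI_FABRICS is the Intel MPI 4.x name of the fabric selector (older versions use I_MPI_DEVICE):

```shell
# Open MPI, forcing the native verbs (openib) BTL
# (node names and benchmark path are illustrative placeholders):
mpirun -np 2 -H node1,node2 --mca btl openib,sm,self ./IMB-MPI1 PingPong

# Intel MPI over DAPL, then over native verbs (ofa), same nodes:
mpirun -np 2 -hosts node1,node2 -genv I_MPI_FABRICS shm:dapl ./IMB-MPI1 PingPong
mpirun -np 2 -hosts node1,node2 -genv I_MPI_FABRICS shm:ofa  ./IMB-MPI1 PingPong
```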

The second question is how to teach Open MPI to use DAPL.

Meanwhile, I have compiled lots of versions (1.4.3, 1.4.4, 1.5.3,
1.5.4) against at least two DAPL versions, using the --with-udapl
option. The builds complete fine, but at startup the initialisation of
DAPL always fails (see the message below) and the communication falls
back to openib as usual.
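Roughly, what I do is the following (the prefix and DAPL path are placeholders for our actual install locations):

```shell
# Build Open MPI against a uDAPL installation (sketch; placeholder paths):
./configure --prefix=/opt/openmpi-1.5.4 --with-udapl=/path/to/dapl
make -j8 && make install

# At run time, try to select the udapl BTL explicitly instead of openib:
mpirun -np 2 --mca btl udapl,sm,self ./pingpong
```

It is at this point that the warning below appears and the run falls back to openib.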

Although the error message says it "may be an invalid entry in the
uDAPL Registry which is contained in the dat.conf file", this seems
very unlikely: with the same dat.conf, Intel MPI can use DAPL. (And
yes, Open MPI really uses the same dat.conf as Intel MPI, set via
DAT_OVERRIDE; checked and double-checked.)

--------------------------------------------------------------------------
WARNING: Failed to open "ofa-v2-mlx4_0-1u"
[DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
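For illustration, a typical dat.conf line for the provider named in the warning looks roughly like this (one line; the exact provider library name varies between OFED releases, so treat this as an illustrative sketch, not our actual entry):

```
ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 1" ""
```

If Intel MPI resolves this entry but Open MPI does not, the difference might lie in how the two libraries locate the DAT library at run time rather than in dat.conf itself.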

Because of the anticipated performance gain, we would be very keen on
using DAPL with Open MPI. Does anybody have an idea what could be wrong
and what to check?

> On Dec 2, 2011, at 1:21 PM, Paul Kapinos wrote:
>
>> Dear Open MPI developer,
>>
>> OFED 1.5.4 will contain DAPL 2.0.34.
>>
>> I tried to compile the newest release of Open MPI (1.5.4) with this DAPL release and I was not successful.
>>
>> Configuring with --with-udapl=/path/to/2.0.34/dapl
>> got the error "/path/to/2.0.34/dapl/include/dat/udat.h not found"
>> Looking into the include dir: there is no 'dat' subdir, but a 'dat2' one.
>>
>> Just for fun I also tried to move 'dat2' back to 'dat' (a dirty hack, I know :-). The configure stage was then successful, but the compilation failed. The headers seem to have really changed, not just moved.
>>
>> The question: are the Open MPI developers aware of these changes, and when will a version of Open MPI with support for DAPL 2.0.34 be available?
>>
>> (Background: we have some trouble with Intel MPI and the current DAPL which we do not have with DAPL 2.0.34, so our dream is to update as soon as possible.)
>>
>> Best wishes and a nice weekend,
>>
>> Paul
>>
>> http://www.openfabrics.org/downloads/OFED/release_notes/OFED_1.5.4_release_notes

-- 
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915