Subject: Re: [OMPI users] Performance question about OpenMPI and MVAPICH2 on IB
From: Craig Tierney (craig.tierney_at_[hidden])
Date: 2009-08-07 10:01:09


Terry Dontje wrote:
> Craig,
>
> Did your affinity script bind the processes per socket or linearly to
> cores? If the former, you'll want to look at using rankfiles and place
> the ranks based on sockets. We have found this especially useful if
> you are not running fully subscribed on your machines.
>
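As an illustration of the rankfile approach suggested above, a per-socket placement on one 2-socket node might look roughly like the following (the hostname, socket:core numbers, and the 4-rank layout are placeholders, and the exact syntax can vary between Open MPI releases):

    rank 0=node01 slot=0:0
    rank 1=node01 slot=1:0
    rank 2=node01 slot=0:1
    rank 3=node01 slot=1:1

    mpirun -np 4 -rf ./rankfile ./app

Spreading ranks across sockets first is what helps when a node is not fully subscribed, since each rank gets its own socket's cache and memory bandwidth.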
> Also, if you think the main issue is collectives performance, you may
> want to try using the hierarchical and SM collectives. However, be
> forewarned that we are still pounding out some errors in these
> modules. To enable them, add the following parameters: "--mca
> coll_hierarch_priority 100 --mca coll_sm_priority 100". We would be
> very interested in any results you get (failures, improvements,
> non-improvements).
>

Adding these two options causes the code to segfault at startup.
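For reference, enabling the two collectives modules amounts to appending the suggested parameters to an otherwise ordinary launch line, e.g. (the process count and executable name here are placeholders):

    mpirun -np 512 --mca coll_hierarch_priority 100 --mca coll_sm_priority 100 ./app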

Craig

> thanks,
>
> --td
>
>> Message: 4
>> Date: Thu, 06 Aug 2009 17:03:08 -0600
>> From: Craig Tierney <Craig.Tierney_at_[hidden]>
>> Subject: Re: [OMPI users] Performance question about OpenMPI and
>> MVAPICH2 on IB
>> To: Open MPI Users <users_at_[hidden]>
>> Message-ID: <4A7B612C.8070501_at_[hidden]>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> A followup....
>>
>> Part of the problem was affinity. I had written a script to do
>> processor and memory affinity (which works fine with MVAPICH2). It
>> is an idea that I got from TACC. However, the script didn't seem to
>> work correctly with OpenMPI (or I still have bugs).
>>
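A minimal sketch of this kind of binding wrapper, assuming the launcher exports a local-rank variable (OMPI_COMM_WORLD_LOCAL_RANK for Open MPI, MV2_COMM_WORLD_LOCAL_RANK for MVAPICH2) and that numactl is installed; the two-socket layout and the variable handling are assumptions, not the actual TACC script:

    #!/bin/sh
    # bind.sh: bind each local MPI rank to one socket and its local memory
    lrank=${OMPI_COMM_WORLD_LOCAL_RANK:-${MV2_COMM_WORLD_LOCAL_RANK:-0}}
    sockets=2                       # sockets per node (assumed)
    socket=$(( lrank % sockets ))   # spread local ranks across sockets
    exec numactl --cpunodebind=$socket --membind=$socket "$@"

It would be launched as "mpirun -np 8 ./bind.sh ./app". If the local-rank variable is not the one the MPI library actually exports, every rank silently falls back to socket 0, which is the kind of misbinding that hurts scaling.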
>> Setting --mca mpi_paffinity_alone 1 made things better. However,
>> the performance is still not as good as MVAPICH2's:
>>
>> Cores  MVAPICH2  OpenMPI
>> ------------------------
>>     8      17.3     17.3
>>    16      31.7     31.5
>>    32      62.9     62.8
>>    64     110.8    108.0
>>   128     219.2    201.4
>>   256     384.5    342.7
>>   512     687.2    537.6
>>
>> The performance numbers are in GFlops (so larger is better).
>>
>> The first few rows show that the executable is running at the
>> right speed. I verified that IB is being used by running OMB and
>> checking latency and bandwidth. Those numbers are what I
>> expect (3 GB/s, 1.5 µs for QDR).
>>
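For reference, the OMB (OSU Micro-Benchmarks) check amounts to running the point-to-point tests between two nodes, e.g. (hostnames are placeholders):

    mpirun -np 2 --host node01,node02 ./osu_latency
    mpirun -np 2 --host node01,node02 ./osu_bw

These report small-message latency and large-message bandwidth, which is where the 1.5 µs and 3 GB/s figures come from.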
>> However, the OpenMPI version is not scaling as well. Any
>> ideas on why that might be the case?
>>
>> Thanks,
>> Craig