> Date: Fri, 07 Aug 2009 07:12:45 -0600
> From: Craig Tierney <craig.tierney_at_[hidden]>
> Subject: Re: [OMPI users] Performance question about OpenMPI and
> MVAPICH2 on IB
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <4A7C284D.3040603_at_[hidden]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Terry Dontje wrote:
>> > Craig,
>> > Did your affinity script bind the processes per socket or linearly to
>> > cores? If the former, you'll want to look at using rankfiles and place
>> > the ranks based on sockets. We have found this especially useful if
>> > you are not running fully subscribed on your machines.
> The script binds them to sockets and also binds memory per node.
> It is smart enough that if the machine_file does not use all
> the cores (because the user reordered them) then the script will
> lay out the tasks evenly between the two sockets.
Ok, so you'll probably want to look at using rankfile (described in the
mpirun manpage) because mpi_paffinity_alone just does a linear binding
(rank 0 to cpu 0, rank 1 to cpu 1, ...).
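As a sketch of what the mpirun manpage describes (the hostnames hostA/hostB, the two-socket layout, and the binary name ./a.out are hypothetical placeholders), a rankfile that spreads ranks across sockets rather than packing cores linearly might look like:

```shell
# myrankfile -- "slot=socket:core" pins each rank to a socket/core pair.
# Ranks alternate between socket 0 and socket 1 on each host.
cat > myrankfile <<'EOF'
rank 0=hostA slot=0:0
rank 1=hostA slot=1:0
rank 2=hostB slot=0:0
rank 3=hostB slot=1:0
EOF

# Launch using the rankfile instead of mpi_paffinity_alone's linear binding.
mpirun -np 4 --rankfile myrankfile ./a.out
```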
>> > Also, if you think the main issue is collectives performance you may
>> > want to try using the hierarchical and SM collectives. However, be
>> > forewarned we are right now trying to pound out some errors with these
>> > modules. To enable them you add the following parameters "--mca
>> > coll_hierarch_priority 100 --mca coll_sm_priority 100". We would be
>> > very interested in any results you get (failures, improvements,
>> > non-improvements).
> I don't know why it is slow. OpenMPI is so flexible in how the
> stack can be tuned. But I also have hundreds of users running dozens
> of major codes, and what I need is a set of options that 'just work'
> in most cases.
> I will try the above options and get back to you.
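For reference, the suggested test of the hierarchical and SM collectives would be invoked along these lines (the binary name ./mycode and the process count are placeholders, not from the original message):

```shell
# Raise the priority of the hierarchical and shared-memory collective
# modules so they are selected over the defaults (experimental per above).
mpirun -np 16 \
    --mca coll_hierarch_priority 100 \
    --mca coll_sm_priority 100 \
    ./mycode
```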