
Open MPI User's Mailing List Archives


From: Warner Yuen (wyuen_at_[hidden])
Date: 2007-03-29 14:08:37


George,

Thanks for the tips. It looks like using "-bynode" as opposed to
"-byslot" is the best way to distribute processes when running Amber 9's
Sander module. I confirmed that with MPICH-MX as well. I didn't
realize that these settings were available. This really helps, because
I was getting bummed that I would have to keep various hostfiles
around, some with slots=XX and some with nothing but the hostname.
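
For reference, a minimal sketch of the single-hostfile setup (node names
here are placeholders; eight nodes of four slots cover the 32 processes):

    # hostfile -- one line per node
    node01 slots=4
    node02 slots=4
    # ... and so on through node08

    # round-robin ranks across nodes
    mpirun --hostfile hostfile -bynode -np 32 <my_application>
    # fill each node's four slots before moving to the next node
    mpirun --hostfile hostfile -byslot -np 32 <my_application>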

Just an FYI on the timings:

-bynode:
real 0m35.035s

-byslot:
real 0m44.856s

Warner Yuen
Scientific Computing Consultant

On Mar 29, 2007, at 9:00 AM, users-request_at_[hidden] wrote:

> Message: 1
> Date: Wed, 28 Mar 2007 12:19:15 -0400
> From: George Bosilca <bosilca_at_[hidden]>
> Subject: Re: [OMPI users] Odd behavior with slots=4
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <2A58CF38-0FC4-4289-85E1-315376540F63_at_[hidden]>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>
> There are multiple possible answers here. One is related to
> over-subscription of your cluster, but I expect that there are at least 4
> cores per node if you want to use the slots=4 option. The real
> question is what the communication pattern in this benchmark is, and
> how it matches the distribution of the processes you use.
>
> As a matter of fact, when you have XX processes per node and all
> of them try to send a message to a remote process (here remote
> means on another node), they will have to share the physical
> Myrinet link, which of course leads to lower global performance
> as XX increases (from 1 to 2 and then 4). And this is true regardless
> of how you use the MX driver (via the Open MPI MTL or BTL).
>
> Open MPI provides two options that allow you to distribute the processes
> based on different criteria. Try -bynode and -byslot to see whether
> this affects the overall performance.
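>
> As a rough sketch (assuming eight nodes with four slots each, to match
> your -np 32 runs), the two mappings of ranks 0-31 look like this:
>
>   -byslot: node1 gets ranks 0-3, node2 gets ranks 4-7, ...,
>            node8 gets ranks 28-31
>   -bynode: node1 gets ranks 0, 8, 16, 24; node2 gets ranks 1, 9, 17, 25;
>            ...; node8 gets ranks 7, 15, 23, 31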
>
> Thanks,
> george.
>
> On Mar 28, 2007, at 9:56 AM, Warner Yuen wrote:
>
>> I'm seeing curious performance when using Open MPI 1.2 to run Amber 9
>> on my Xserve Xeon 5100 cluster. Each cluster node is a dual-socket,
>> dual-core system. The cluster is also running Myrinet 2000 with MX.
>> I'm just running some tests with one of Amber's benchmarks.
>>
>> It seems that my hostfiles affect the performance of the
>> application. I tried variations of the hostfile to see what would
>> happen. I did a straight mpirun with no mca options set, using:
>> "mpirun -np 32"
>>
>> variation 1: hostname
>> real 0m35.391s
>>
>> variation 2: hostname slots=4
>> real 0m45.698s
>>
>> variation 3: hostname slots=2
>> real 0m38.761s
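>>
>> For concreteness (node names here are placeholders), each hostfile has
>> one line per node, e.g.:
>>
>>   variation 1:   node01
>>   variation 2:   node01 slots=4
>>   variation 3:   node01 slots=2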
>>
>>
>> It seems that the best performance I achieve is when I use
>> variation 1 with only the hostname and execute the command
>> "mpirun --hostfile hostfile -np 32 <my_application>". It's
>> shockingly about 13% better performance than if I use the hostfile
>> with a syntax of "hostname slots=4".
>>
>> I also tried variations in my mpirun command; here are the times:
>>
>> straight mpirun with no mca options
>> real 0m45.698s
>>
>> and....
>>
>> "-mca mpi_yield_when_idle 0"
>> real 0m44.912s
>>
>> and....
>>
>> "-mca mtl mx -mca pml cm"
>> real 0m45.002s
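>>
>> For reference, a sketch of the full command lines for these two runs
>> (same hostfile as above):
>>
>>   mpirun --hostfile hostfile -np 32 -mca mpi_yield_when_idle 0 <my_application>
>>   mpirun --hostfile hostfile -np 32 -mca mtl mx -mca pml cm <my_application>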