Thanks for the tips. It looks like using "-bynode" rather than "-byslot"
is the best way to distribute processes when running Amber 9's Sander
module. I confirmed that with MPICH-MX as well. I didn't realize these
settings were available. This really helps, because I was getting bummed
that I would have to keep various hostfiles around, some with slots=XX
and some with nothing but the hostname.
Just an FYI on the timings:
Scientific Computing Consultant
On Mar 29, 2007, at 9:00 AM, users-request_at_[hidden] wrote:
> Message: 1
> Date: Wed, 28 Mar 2007 12:19:15 -0400
> From: George Bosilca <bosilca_at_[hidden]>
> Subject: Re: [OMPI users] Odd behavior with slots=4
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <2A58CF38-0FC4-4289-85E1-315376540F63_at_[hidden]>
> There are multiple possible answers here. One is related to the
> oversubscription of your cluster, but I expect that you have at least 4
> cores per node if you are using the slots=4 option. The real question
> is: what is the communication pattern in this benchmark, and how does
> it match the distribution of the processes you use?
> For example, if you have XX processes per node and all of them try to
> send a message to a remote process (here "remote" means on another
> node), they will have to share the physical Myrinet link, which of
> course leads to lower global performance as XX increases (from 1 to 2
> and then 4). This is true regardless of how you use the MX driver (via
> the Open MPI MTL or BTL).
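To make the link-sharing argument concrete, here is a minimal sketch assuming an idealized fair share: every process on a node sends off-node at the same moment and the single link is split evenly among them. The bandwidth is normalized to 1.0 (arbitrary units); it is not a measured Myrinet 2000 figure, and the function name is hypothetical.

```python
# Idealized model (assumption): all senders on a node use the one physical
# link simultaneously and split its bandwidth evenly.

LINK_BW = 1.0  # normalized link bandwidth, arbitrary units (not a measurement)


def per_process_share(link_bw: float, senders_per_node: int) -> float:
    """Bandwidth each sender gets when they all share one link at once."""
    if senders_per_node < 1:
        raise ValueError("need at least one sender per node")
    return link_bw / senders_per_node


if __name__ == "__main__":
    for xx in (1, 2, 4):
        share = per_process_share(LINK_BW, xx)
        print(f"{xx} process(es) per node -> {share:.2f} of the link each")
```

Real traffic is burstier than this, so the actual slowdown depends on how often the processes on a node communicate off-node at the same time.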
> Open MPI provides two options that let you distribute the processes
> based on different criteria. Try -bynode and -byslot to see whether
> this affects the overall performance.
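The difference between the two options can be sketched as two rank-to-node mapping rules: -byslot fills every slot on one node before moving to the next, while -bynode deals ranks out round-robin, one per node. This is a simplified model, not Open MPI's actual mapper. The 16-node count and the host names are hypothetical, since the thread does not say how many nodes the hostfile lists. The wrap-around when slots run out is also a simplification.

```python
from collections import Counter


def map_byslot(nprocs, nodes, slots):
    """Fill all `slots` on one node before moving on (wraps if oversubscribed)."""
    return [nodes[(rank // slots) % len(nodes)] for rank in range(nprocs)]


def map_bynode(nprocs, nodes):
    """Rank 0 on node 0, rank 1 on node 1, ..., then wrap around."""
    return [nodes[rank % len(nodes)] for rank in range(nprocs)]


def procs_per_node(placement):
    """Count how many ranks landed on each node that received any."""
    return Counter(placement)


if __name__ == "__main__":
    nodes = [f"node{i:02d}" for i in range(16)]  # hypothetical host names
    byslot = procs_per_node(map_byslot(32, nodes, slots=4))
    bynode = procs_per_node(map_bynode(32, nodes))
    print("byslot:", len(byslot), "nodes used, max", max(byslot.values()), "procs/node")
    print("bynode:", len(bynode), "nodes used, max", max(bynode.values()), "procs/node")
```

Under these assumptions, -byslot packs 32 ranks onto 8 nodes with 4 per node, while -bynode spreads them across all 16 nodes with 2 per node, so fewer processes share each Myrinet link. If the hostfile held only 8 nodes, both rules would put 4 ranks on every node, and any difference would come from which ranks end up as neighbours.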
> On Mar 28, 2007, at 9:56 AM, Warner Yuen wrote:
>> I'm seeing curious performance when using Open MPI 1.2 to run Amber 9
>> on my Xserve Xeon 5100 cluster. Each cluster node is a dual-socket,
>> dual-core system. The cluster is also running Myrinet 2000 with MX.
>> I'm just running some tests with one of Amber's benchmarks.
>> It seems that my hostfiles affect the performance of the
>> application. I tried variations of the hostfile to see what would
>> happen. I did a straight mpirun with no MCA options set, using:
>> "mpirun -np 32"
>> variation 1: hostname
>> real 0m35.391s
>> variation 2: hostname slots=4
>> real 0m45.698s
>> variation 3: hostname slots=2
>> real 0m38.761s
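The three hostfile variations differ only in whether each line carries a slots=N count. Below is a minimal parser sketch for just those two line forms ("hostname" and "hostname slots=N"). Treating a bare hostname as 1 slot is an assumption here (the `default_slots` parameter), not something stated in the thread. Other hostfile keywords such as max_slots are deliberately rejected rather than guessed at, and the function name is hypothetical.

```python
import re

# Matches "hostname" or "hostname slots=N" only (other keywords are not handled).
HOST_LINE = re.compile(r"^\s*(\S+)(?:\s+slots\s*=\s*(\d+))?\s*$")


def parse_hostfile(text, default_slots=1):
    """Return (hostname, slots) pairs; default_slots is an assumed default."""
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]  # drop comments
        if not line.strip():
            continue  # skip blank lines
        m = HOST_LINE.match(line)
        if m is None:
            raise ValueError(f"unrecognized hostfile line: {raw!r}")
        slots = int(m.group(2)) if m.group(2) else default_slots
        hosts.append((m.group(1), slots))
    return hosts


if __name__ == "__main__":
    variation_1 = "node01\nnode02\n"                  # hypothetical host names
    variation_2 = "node01 slots=4\nnode02 slots=4\n"
    for name, text in (("variation 1", variation_1), ("variation 2", variation_2)):
        hosts = parse_hostfile(text)
        print(name, hosts, "total slots:", sum(s for _, s in hosts))
```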
>> It seems that I get the best performance when I use variation 1,
>> with only the hostname, and execute the command:
>> "mpirun --hostfile hostfile -np 32 <my_application>". It's
>> shockingly about 13% better performance than if I use the hostfile
>> with the syntax "hostname slots=4".
>> I also tried variations in my mpirun command; here are the times:
>> straight mpirun with no MCA options
>> real 0m45.698s
>> "-mca mpi_yield_when_idle 0"
>> real 0m44.912s
>> "-mca mtl mx -mca pml cm"
>> real 0m45.002s
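The wall-clock times above can be compared directly. This sketch converts the `time`-style "real" values into seconds and reports how much of the slower run's time the faster run saves. The helper names are hypothetical; the numbers are the ones reported in this thread.

```python
def to_seconds(t):
    """Convert a `time`-style value such as '0m35.391s' into seconds."""
    minutes, seconds = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def pct_less_time(fast, slow):
    """Percent of the slower run's wall-clock time saved by the faster run."""
    return 100.0 * (slow - fast) / slow


if __name__ == "__main__":
    baseline = to_seconds("0m45.698s")  # "hostname slots=4" / no MCA options
    runs = {
        "hostname only": "0m35.391s",
        "hostname slots=2": "0m38.761s",
        "-mca mpi_yield_when_idle 0": "0m44.912s",
        "-mca mtl mx -mca pml cm": "0m45.002s",
    }
    for label, t in runs.items():
        print(f"{label}: {pct_less_time(to_seconds(t), baseline):.1f}% less time than slots=4")
```

Computed this way, the hostname-only hostfile takes about 22.6% less time than slots=4, slots=2 about 15.2% less, and the two MCA option changes each stay under 2%.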