Thanks for the quick reply. I ran my tests with a hostfile with
I clearly misunderstood the role of the 'slots' parameter, because
when I removed it, OPENMPI slightly outperformed LAM, which I
assume it should. Thanks for the help.
Brian Barrett wrote:
>On Jan 4, 2006, at 4:24 PM, Tom Rosmond wrote:
>>I have been using LAM-MPI for many years on PC/Linux systems and
>>have been quite pleased with its performance. However, at the
>>urging of the
>>LAM-MPI website, I have decided to switch to OPENMPI. For much of my
>>preliminary testing I work on a single processor workstation (see
>>'config.log' and ompi_info.log files for some of the specifics of
>>my system). I
>>frequently run with more than one virtual mpi processor (i.e.
>>the real processor) to test my code. With LAM the runtime penalty
>>is usually insignificant for 2-4 virtual processors, but with
>>OPENMPI it has
>>been prohibitive. Below is a matrix of runtimes for a simple MPI
>>transpose code using mpi_sendrecv( I tried other variations of
>>non-blocking, synchronous/non-synchronous send/recv with similar
>> message size= 262144 bytes
>> LAM OPENMPI
>> 1 proc: .02575 secs .02513 secs
>> 2 proc: .04603 secs 10.069 secs
>> 4 proc: .04903 secs 35.422 secs
>>I am pretty sure that LAM exploits the fact that the virtual
>>processors are all
>>sharing the same memory, so communication is via memory and/or the
>>of the system, while my OPENMPI configuration doesn't exploit
>>this. Is this
>>a reasonable diagnosis of the dramatic difference in performance?
>>importantly, how to I reconfigure OPENMPI to match the LAM
>Based on the output of ompi_info, you should be using shared memory
>with Open MPI (as you are with LAM/MPI). What RPI are you using with
>LAM/MPI (just so we have some idea what you are comparing to)? And
>how are you running Open MPI (what command are you passing to mpirun,
>and if you include a hostfile, what is in that host file)?
>If you tell Open MPI via a hostfile that a machine has 2 cpus when it
>only has 1 and try to run 2 processes on it, you will run into severe
>performance issues. In that case, Open MPI will poll very quickly on
>the CPUs, not giving up the CPU when there is nothing to do. If Open
>MPI is told that there is only 1 cpu and you run 2 procs of the same
>job on that node, then it will be much better about giving up the
>CPU. That would be where I would start looking.
>If you have some test code you could share, I'd love to see it - it
>would help in duplicating your results and finding a solution...