Open MPI User's Mailing List Archives


From: Brian Barrett (brbarret_at_[hidden])
Date: 2006-01-04 16:34:57

On Jan 4, 2006, at 4:24 PM, Tom Rosmond wrote:

> I have been using LAM-MPI for many years on PC/Linux systems and have
> been quite pleased with its performance. However, at the urging of the
> LAM-MPI website, I have decided to switch to OPENMPI. For much of my
> preliminary testing I work on a single-processor workstation (see the
> attached 'config.log' and ompi_info.log files for some of the specifics
> of my system). I frequently run with more than one virtual MPI processor
> (i.e. oversubscribe the real processor) to test my code. With LAM the
> runtime penalty for this is usually insignificant for 2-4 virtual
> processors, but with OPENMPI it has been prohibitive. Below is a matrix
> of runtimes for a simple MPI matrix transpose code using mpi_sendrecv
> (I tried other variations of blocking/non-blocking,
> synchronous/non-synchronous send/recv with similar results).
>
> message size = 262144 bytes
>
>            LAM           OPENMPI
> 1 proc:    .02575 secs   .02513 secs
> 2 proc:    .04603 secs   10.069 secs
> 4 proc:    .04903 secs   35.422 secs
>
> I am pretty sure that LAM exploits the fact that the virtual processors
> are all sharing the same memory, so communication is via memory and/or
> the PCI bus of the system, while my OPENMPI configuration doesn't
> exploit this. Is this a reasonable diagnosis of the dramatic difference
> in performance? More importantly, how do I reconfigure OPENMPI to match
> the LAM performance?
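[For readers following along: the quoted timings come from Tom's own transpose code, which is not shown here. A minimal sketch of the kind of MPI_Sendrecv timing test he describes might look like the following; the even/odd rank pairing and buffer contents are illustrative assumptions, not his actual code.]

```c
/* Hypothetical sketch of an MPI_Sendrecv timing test: each rank
 * exchanges a 262144-byte message with a partner rank and reports
 * the elapsed time.  Compile with mpicc, run with mpirun. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 262144 / sizeof(double);   /* 262144-byte message */
    double *sendbuf = malloc(n * sizeof(double));
    double *recvbuf = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++)
        sendbuf[i] = (double)i;

    int partner = rank ^ 1;                  /* pair ranks 0-1, 2-3, ... */
    if (partner < size) {
        double t0 = MPI_Wtime();
        MPI_Sendrecv(sendbuf, n, MPI_DOUBLE, partner, 0,
                     recvbuf, n, MPI_DOUBLE, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        double t1 = MPI_Wtime();
        printf("rank %d: exchange took %f secs\n", rank, t1 - t0);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```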

Based on the output of ompi_info, you should be using shared memory
with Open MPI (as you are with LAM/MPI). What RPI are you using with
LAM/MPI (just so we have some idea what you are comparing to)? And
how are you running Open MPI (what command are you passing to mpirun,
and if you include a hostfile, what is in that host file)?

If you tell Open MPI via a hostfile that a machine has 2 CPUs when it
only has 1 and then try to run 2 processes on it, you will run into
severe performance problems: Open MPI will poll aggressively, never
giving up the CPU even when there is nothing to do. If Open MPI is told
that there is only 1 CPU and you run 2 processes of the same job on
that node, it will be much better about yielding the CPU. That is where
I would start looking.
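[Editor's note: the slot-count behavior Brian describes can be sketched with the commands below. The hostfile name and the `./transpose_test` executable are placeholders; `mpi_yield_when_idle` is the MCA parameter through which Open MPI exposes this idle-yield ("degraded") mode explicitly.]

```shell
# Hostfile declaring a single CPU slot on the local machine.
cat > myhosts <<EOF
localhost slots=1
EOF

# Oversubscribed run: 2 processes on 1 declared slot, so Open MPI
# runs in degraded mode and yields the CPU when idle.
mpirun --hostfile myhosts -np 2 ./transpose_test

# The yield behavior can also be requested explicitly:
mpirun --mca mpi_yield_when_idle 1 -np 2 ./transpose_test
```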

If you have some test code you could share, I'd love to see it - it
would help in duplicating your results and finding a solution...


   Brian Barrett
   Open MPI developer