> I am pretty sure that LAM exploits the fact that the virtual processors are
> all sharing the same memory, so communication is via memory and/or the PCI bus
> of the system, while my OPENMPI configuration doesn't exploit this. Is this
> a reasonable diagnosis of the dramatic difference in performance? More
It is more likely that OpenMPI is using shared memory and polling on it,
whereas LAM is using sockets, or at least blocking on something.
Polling is a bad thing when you oversubscribe processors. When you block on
a socket (or any OS handle), the process immediately yields the CPU and
is taken off the scheduler's run queue until the handle becomes ready. When
you poll waiting for a send or receive to complete, you burn CPU cycles, and
the scheduler waits for the current time quantum to expire before running
another process.
So, if you send a message between 2 processes sharing the same
processor, the latency will be on the order of half the scheduler
quantum (10 ms on Linux) if they are both polling. Things are much faster
when the processes are polling on different CPUs (1-2 us), but when you
don't have several processors, the blocking socket overhead (~20 us) is
far better than waiting out a scheduler quantum.
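
As a back-of-the-envelope sketch of the figures above: the function name and
constants below are made up for illustration, the numbers are only the rough
estimates quoted in this message (not measurements), and treating the blocking
cost as the same ~20 us whether or not the CPU is shared is a simplifying
assumption.

```python
# Rough one-way latency estimates in microseconds, using the figures quoted
# in the message above. Names are illustrative; nothing here is measured.

SCHEDULER_QUANTUM_US = 10_000  # ~10 ms scheduler quantum, as stated above
POLLING_OTHER_CPU_US = 2       # upper end of the 1-2 us estimate
BLOCKING_SOCKET_US = 20        # ~20 us blocking-socket overhead


def estimated_latency_us(polling: bool, same_cpu: bool) -> float:
    """Return a crude latency estimate for one message, in microseconds."""
    if polling and same_cpu:
        # The receiver only gets the CPU once the sender's time slice ends,
        # which on average is about half a quantum away.
        return SCHEDULER_QUANTUM_US / 2
    if polling:
        # Each process has its own CPU, so the poll sees the message quickly.
        return POLLING_OTHER_CPU_US
    # Blocking: the waiting process yields the CPU right away, so the cost is
    # roughly the socket overhead (assumed equal in both placements).
    return BLOCKING_SOCKET_US


if __name__ == "__main__":
    for polling in (True, False):
        for same_cpu in (True, False):
            label = ("polling" if polling else "blocking",
                     "same CPU" if same_cpu else "different CPUs")
            print(label, estimated_latency_us(polling, same_cpu), "us")
```

With these numbers, polling on a shared CPU comes out at about 5000 us against
about 20 us for blocking, which is the gap described above.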
> importantly, how do I reconfigure OPENMPI to match the LAM performance?
Try disabling the shared memory device in OpenMPI. Unfortunately, I have
no clue how to do it.
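
One possibility, offered as an untested sketch: OpenMPI selects its transports
through MCA parameters, and the `btl` parameter accepts a component list with
a leading `^` meaning "everything except these". The small launcher below
builds such a command line. It assumes the shared-memory component is named
`sm`, which depends on the OpenMPI version (`ompi_info` lists the components in
a given build), and `./my_mpi_app` is a hypothetical program name.

```python
def build_mpirun_command(num_procs, program, excluded_btls=("sm",)):
    """Build an mpirun command line that excludes the given BTL components."""
    cmd = ["mpirun"]
    if excluded_btls:
        # "--mca btl ^a,b" asks OpenMPI to use every BTL except a and b.
        cmd += ["--mca", "btl", "^" + ",".join(excluded_btls)]
    cmd += ["-np", str(num_procs), program]
    return cmd


if __name__ == "__main__":
    # ./my_mpi_app is a placeholder; substitute the real executable.
    command = build_mpirun_command(4, "./my_mpi_app")
    print(" ".join(command))
    # To actually launch (requires OpenMPI to be installed):
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Typed directly at a shell prompt, the same thing would be
`mpirun --mca btl ^sm -np 4 ./my_mpi_app`.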