Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-02-01 16:17:22

On Jan 11, 2007, at 6:59 AM, Wolfgang Wieser wrote:

> I'm in the process of selecting an MPI implementation to be
> used on a compute server cluster at the University of Munich.
> Since MPI_THREAD_MULTIPLE is a requirement, I went for OpenMPI.

Sorry for the delay in replying here -- all the OMPI developers are
crunching to meet our internal deadlines for the upcoming OMPI v1.2

Note that our MPI_THREAD_MULTIPLE support is haphazard at best. :-\
Multi-threaded support has been designed in from the very beginning,
but it has not risen high enough in priority yet to fully test and
debug MPI_THREAD_MULTIPLE support.

> The setup is a rack of boxes running Linux and connected with
> gigabit ethernet.
> However, there is a severe problem:
> Blocking functions like MPI_Probe() suck all CPU power.
> But as everybody knows, select(2), poll(2) and recently also
> epoll(2) were invented to give implementers a possibility to write
> programs with multiple IO channels without the need for busy waiting.
> So, I wonder if there is a way to have OpenMPI not make use of busy
> waiting but rather apply some kernel-level event selection function
> like the ones mentioned above.

The problem is that OMPI may have to poll several different types of
networks, including shared memory. So we fall back to a polling
approach, which sucks up lots of CPU. We pretty much assume that the
MPI process has free rein over the processor. For multi-threaded
scenarios, blocking progress threads are the plan, but as I mentioned
above, these are *very* loosely tested. I would not consider them

What you can do, however, is tell OMPI to poll in a less aggressive
mode -- meaning that we effectively call sched_yield() in every
iteration of the polling loop. You can do this by setting the
"mpi_yield_when_idle" MCA
parameter to 1. For example:

   shell$ mpirun --mca mpi_yield_when_idle 1 -np 4 a.out
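
To illustrate the idea, here is a minimal sketch of a polling loop that
yields when idle. It is not OMPI's actual progress code: the names
wait_for_event, check_fn and countdown_check are hypothetical, and the
only real API used is POSIX sched_yield(). Note that such a loop still
spins if no other process is runnable; yielding only helps when
something else wants the CPU.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true once the awaited event has arrived. */
typedef bool (*check_fn)(void *ctx);

/*
 * Spin until check() reports completion. When yield_when_idle is true,
 * call sched_yield() after every unsuccessful check so that other
 * runnable processes get a chance at the CPU.
 * Returns the number of unsuccessful checks.
 */
static unsigned long wait_for_event(check_fn check, void *ctx,
                                    bool yield_when_idle)
{
    assert(check != NULL);
    unsigned long idle_iterations = 0;
    while (!check(ctx)) {
        idle_iterations++;
        if (yield_when_idle) {
            sched_yield();
        }
    }
    return idle_iterations;
}

/* Toy event source for demonstration: completes after *remaining checks. */
static bool countdown_check(void *ctx)
{
    int *remaining = ctx;
    if (*remaining == 0) {
        return true;
    }
    (*remaining)--;
    return false;
}
```

For example, calling wait_for_event(countdown_check, &n, true) with
n set to 3 performs three idle iterations, yielding after each one.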

Additionally, OMPI developers are currently discussing whether to
allow blocking when only TCP is being used (e.g., when you disable
shared memory at run time). It's not yet clear whether this will be
included in v1.2, but if it is, it will take effect when you disable
shared memory. For example:

   shell$ mpirun --mca btl ^sm -np 4 a.out
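
For reference, blocking on a single socket-like descriptor instead of
spinning could look roughly like the sketch below. This only
illustrates a kernel-level wait with poll(2) and assumes a single file
descriptor; it says nothing about how OMPI would actually implement
blocking TCP progress. The helper name wait_readable is hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Sleep in the kernel until fd is readable or timeout_ms elapses
 * (a negative timeout_ms waits indefinitely). No CPU is burned
 * while waiting.
 * Returns 1 if data is readable, 0 on timeout, and -1 on error
 * (including interruption by a signal) or if only an error/hangup
 * condition was reported.
 */
static int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0) {
        return -1;
    }
    if (rc == 0) {
        return 0;
    }
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

The same call works on a connected TCP socket; with several kinds of
transport in play (e.g., shared memory) there is no single descriptor
to wait on, which is why the blocking approach only applies to the
TCP-only case.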

See the FAQ for more information about how to set MCA parameters, etc.

Jeff Squyres
Server Virtualization Business Unit
Cisco Systems