
Open MPI User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-10-09 02:08:30

Interesting idea.

One obvious solution would be to mpirun your controller tasks and, as
you mentioned, use MPI to communicate between them. Then you can use
MPI_COMM_SPAWN to launch the actual MPI job that you want to monitor.

However, this will only more-or-less work. OMPI currently polls
aggressively to make message-passing progress, so if you end up
oversubscribing nodes (because you filled up the cores on one node
with all the target MPI processes but also have one or more
controller processes running on the same node), the controllers and
the target processes will thrash each other and you'll get -- at best
-- unreliable/unrepeatable performance fraught with lots of race
conditions.
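A launcher could guard against this by checking, per node, that the total number of polling processes does not exceed the core count. The sketch below is a minimal, hypothetical illustration (the helper names are made up for this example). It assumes that every target process and every controller spins a full core while polling, and it uses sysconf(_SC_NPROCESSORS_ONLN), which is widely available (Linux, macOS) but is not a POSIX requirement.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: true if the processes planned for one node would
 * exceed its cores. Assumes each MPI process (target or controller)
 * busy-polls and therefore wants a full core to itself. */
bool node_is_oversubscribed(long target_procs, long controller_procs,
                            long cores)
{
    return target_procs + controller_procs > cores;
}

/* Hypothetical helper: cores currently online on this node, or -1 if the
 * value cannot be determined. _SC_NPROCESSORS_ONLN is a common extension
 * (Linux, macOS), not a POSIX requirement. */
long online_cores(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

For example, with one controller per target process on a 4-core node, node_is_oversubscribed(4, 4, 4) is true while node_is_oversubscribed(2, 2, 4) is false, so under these assumptions at most half the cores can host target processes.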

Another issue is that OMPI's MPI_COMM_SPAWN does not offer good
options for controlling process placement, so it might be a little
dicey to get processes to land exactly where you want them.

Alternatively, you could simply fork()/exec() your target process
locally from the controller. But the MPI spec does state that the
use of fork() within an MPI process is undefined. Indeed, if you
are using a high-speed network such as InfiniBand or Myrinet and you
call fork() after MPI_INIT, Bad Things(tm) will happen (we can
explain more if you care). But if you're only using TCP, you should
be fine.
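For the TCP-only case, the fork()/exec() approach needs nothing beyond POSIX. A minimal sketch follows; launch_target() and wait_for_target() are hypothetical names. The code itself makes no MPI calls, so whether it is safe to run inside a controller that has already called MPI_INIT still depends on the interconnect, as described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] as a child of the calling
 * (controller) process. Returns the child's pid, or -1 if fork() failed.
 * The controller keeps the pid so it can later monitor the target. */
pid_t launch_target(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: replace this process image with the target program. */
        execvp(argv[0], argv);
        _exit(127); /* only reached if execvp() failed */
    }
    return pid;
}

/* Hypothetical helper: wait for the target to finish. Returns its exit
 * status, or -1 on error or abnormal termination (e.g. killed by a signal). */
int wait_for_target(pid_t pid)
{
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Exit status 127 for a failed exec follows the usual shell convention, which lets the controller tell "target could not be started" apart from the target's own exit codes.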

Another option might be to mpirun your target MPI app, have it wait
in some kind of local barrier, and then mpirun your controllers on
the same machines. The controllers find/attach to your target
processes, release them from the local barrier, and then you're good
to go -- both your controllers and your target app are fully up and
running under MPI. You'll still have the spinning/performance issue,
though -- so you won't want to oversubscribe nodes.
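One simple way to implement such a local barrier is a named pipe (FIFO) per target process: the target blocks reading from it, and its controller releases it by writing a byte once it has attached. This is a minimal POSIX sketch of that idea, not something Open MPI provides. The function names and the FIFO path are hypothetical, and how the controller learns the path is left open.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Target side (hypothetical): create the FIFO at `path` and block until a
 * controller writes one byte to it. Returns 0 once released, -1 on error. */
int local_barrier_wait(const char *path)
{
    if (mkfifo(path, 0600) == -1 && errno != EEXIST)
        return -1;

    /* open() for reading blocks until a writer opens the other end. */
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    char byte;
    ssize_t n;
    do {
        n = read(fd, &byte, 1); /* blocks until the controller writes */
    } while (n == -1 && errno == EINTR);

    close(fd);
    unlink(path); /* the FIFO is single-use */
    return n == 1 ? 0 : -1;
}

/* Controller side (hypothetical): release the target waiting on `path`.
 * Retries every 10 ms while the target has not created the FIFO yet. */
int local_barrier_release(const char *path)
{
    const struct timespec delay = {0, 10 * 1000 * 1000};
    int fd;

    while ((fd = open(path, O_WRONLY)) == -1 && errno == ENOENT)
        nanosleep(&delay, NULL);
    if (fd == -1)
        return -1;

    ssize_t n = write(fd, "x", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}
```

The retry loop in local_barrier_release() covers the case where a controller starts before its target has reached the barrier; in the other order, the target's open() simply blocks until the controller arrives.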

Does this help?

On Oct 1, 2007, at 10:49 PM, Oleg Morajko wrote:

> Hello,
> In the context of my PhD research, I have been developing a run-
> time performance analyzer for MPI-based applications.
> My tool provides a controller process for each MPI task. In
> particular, when an MPI job is started, a special wrapper script is
> generated that first starts my controller processes, and then each
> controller spawns an actual MPI task (that performs MPI_Init etc.).
> I use dynamic instrumentation API (DynInst API) to control and
> instrument MPI tasks.
> The point is that I need my controller processes to communicate with
> each other; in particular, I need point-to-point communication
> between an arbitrary pair of controllers. So it seems reasonable to
> take advantage of MPI itself and use it for communication. However,
> I am not sure what the impact would be of calling MPI_Init and
> communicating from the controller processes, given that both the
> controllers and the actual MPI processes were started by the same
> mpirun invocation. In fact, I would need to ensure that the
> controllers have a separate MPI execution environment while the
> application has another one.
> Any suggestions on how to achieve that? Obviously another option is
> to use sockets to connect the controllers, but since MPI is already
> available, that seems like overkill.
> Thank you in advance for your help.
> Regards,
> --Oleg
> PhD student, Universitat Autonoma de Barcelona, Spain

Jeff Squyres
Cisco Systems