On 10/9/07, Jeff Squyres <firstname.lastname@example.org> wrote:
One obvious solution would be to mpirun your controller tasks and, as
you mentioned, use MPI to communicate between them. Then you can use
MPI_COMM_SPAWN to launch the actual MPI job that you want to monitor.
Well, yes, it certainly could be done, but it would not work in my scenario. As I said before,
I use a dynamic
instrumentation API (DynInst API) to control and instrument MPI tasks.
DynInst is a sort of debugger: it uses ptrace() on Linux to control processes. So I need to use the DynInst API
to create a controlled process (and not fork() it or MPI_Comm_spawn() it), or alternatively I could fork it and later
attach (with DynInst) to the running process in order to control it. In the latter case, however, I would lose control
over the first several seconds of execution.
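
To illustrate the difference between creating a process under control and attaching later, here is a minimal sketch using raw ptrace() on Linux. It is not DynInst code and does not show how DynInst does it internally; the function names launch_traced() and release_and_wait() are hypothetical. The point is only that a process created this way is stopped at exec, before it runs any of its own code, so nothing in the first seconds of execution is missed.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] under ptrace control.
 * Returns the child's pid, with the child stopped right after exec,
 * or -1 on failure. */
pid_t launch_traced(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced by the parent, then exec the target. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: a traced child is stopped with SIGTRAP on a successful exec. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return -1; /* child exited instead of stopping (e.g. exec failed) */

    /* A real tool would parse the binary and insert instrumentation here. */
    return pid;
}

/* Hypothetical helper: let the stopped child run freely and return its
 * exit code, or -1 if it did not exit normally. */
int release_and_wait(pid_t pid)
{
    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A controller would call launch_traced() with the MPI task's command line, do its startup work while the task is stopped, and then release it. In the attach scenario there is no such guaranteed stop point, which is why the beginning of the run escapes control.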
However, this will only more-or-less work. OMPI currently polls
aggressively to make message passing progress, so if you end up over-
subscribing nodes (because you filled up the cores on one node with
all the target MPI processes but also have 1 or more controller
processes running on the same node), they'll thrash each other and
you'll get -- at best -- unreliable/unrepeatable performance fraught
with lots of race conditions.
This is actually a less serious issue than it seems. The daemon itself is a very lightweight process. After executing the startup code (binary parsing, process creation and instrumentation), it lets the MPI process run without any additional overhead and then sits waiting for certain events, so the intrusion is normally less than 2%. The overhead of the instrumentation inserted into the MPI task is controlled with a threshold and, if the instrumentation is placed reasonably, stays low (e.g. not in a tight loop that executes many times, but on entry/exit of, let's say, MPI_xxx communication calls).
Another issue is that OMPI's MPI_COMM_SPAWN does not give good
options to allow specific process placement, so it might be a little
dicey to get processes to land exactly where you want them.
That is not an option, as the daemon and its task must sit on the same host. The best scenario is a dual-core host, with one CPU for the task and another for the daemon.
Alternatively, you could simply locally fork()/exec() your target
process from the controller. But the MPI spec does state that the
use of fork() is undefined within an MPI process. Indeed, if you
are using a high-speed network such as InfiniBand or Myrinet, calling
fork() after you call MPI_INIT, Bad Things(tm) will happen (we can
explain more if you care). But if you're only using TCP, you should
More or less, this is what I was doing. The daemon is started by mpirun, but it does not call MPI_Init itself; instead it DynInst-forks the MPI task, which calls MPI_Init. I tested this with Open MPI over TCP/IP and InfiniBand, and with MPICH and LAM/MPI (over TCP), and it worked.
Another option might be to mpirun your target MPI app, have it wait
in some kind of local barrier, and then mpirun your controllers on
the same machines. The controllers find/attach to your target
processes, release them from the local barrier, and then you're good
to go -- both your controllers and your target app are fully up and
running under MPI. You'll still have the spinning/performance issue,
though -- so you won't want to oversubscribe nodes.
Absolutely, this would be the attach scenario for the daemons, and they could use MPI. Nice idea.
Unfortunately, it would make the tool more complicated to use, and there would be no control over what happens during the first several seconds.
Does this help?
On Oct 1, 2007, at 10:49 PM, Oleg Morajko wrote:
> In the context of my PhD research, I have been developing a run-
> time performance analyzer for MPI-based applications.
> My tool provides a controller process for each MPI task. In
> particular, when an MPI job is started, a special wrapper script is
> generated that first starts my controller processes and next each
> controller spawns an actual MPI task (that performs MPI_Init etc.).
> I use dynamic instrumentation API (DynInst API) to control and
> instrument MPI tasks.
> The point is I need to intercommunicate my controller processes, in
> particular I need a point-to-point communication between arbitrary
> pair of controllers. So it seems reasonable to take advantage of
> MPI itself and use it for communication. However I am not sure what
> would be the impact of calling MPI_Init and communicating from
> controller processes taking into account both controllers and
> actual MPI processes were started with the same mpirun
> invocation. Actually I would need to ensure that controllers have a
> separate MPI execution environment while the application has another.
> Any suggestions how to achieve that? Obviously another option is to
> use sockets to intercommunicate controllers, but having MPI this
> seems to be overkill.
> Thank you in advance for your help.
> PhD student, Universitat Autonoma de Barcelona, Spain
> users mailing list