Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-10-20 15:26:25


On Thu, 2005-10-20 at 22:03 +0400, Konstantin Karganov wrote:
> > However, we're quite open to other approaches. Because of the nature of
> > our integration with a variety of different run-time environments, our
> > startup is not a shell script -- mpirun ("orterun" is its real name;
> > "mpirun" is a sym link to orterun) is a compiled executable.
> Surely, I saw that mpirun is the orterun executable :)
> And this means that to add some features I need to rebuild it (and some
> run-time libs probably) each time.

Correct.

> > What are the requirements of your debugger? Do you attempt to launch
> > the MPI processes yourself, or do you attach to them after they are
> > launched (which is what TotalView does)?
> It is supposed to attach GDB to each process after it has launched, so the
> TotalView interface goes well, except that its details are hardcoded in
> the source of orte/tools/orterun (as you may guess I don't have the
> executable named "totalview", etc.).

You and Chris G. raise a good point -- another parallel debugger vendor
has contacted me about the same issue (their debugger does not have an
executable named "totalview"). In off-list iterations with him, we
decided on some kind of format like:

        mpirun [--debugger <name>] --debug ..

The intent here is to make the common case easy for the user, but also
allow flexibility in which back-end debugger is invoked.

First -- the common case:

        mpirun --debug -np 4 a.out

This will invoke whatever back-end debugger the user has, with the proper
argv to get mpirun and "-np 4 a.out" passed back to it.

--debugger is a synonym for an MCA parameter, so it can be set in a
variety of ways (e.g., command line, environment variable, or in a
file). The string parameter for --debugger can specify multiple
different debuggers (and associated command lines -- with string
substitution -- to invoke those debuggers); OMPI's mpirun will search
for the first debugger that it can find in the current PATH and invoke
it. For example, we'll probably have a default value for --debugger
something like:

        "totalview mpirun -a @mpirun_args@ : fx2 mpirun -a @mpirun_args@"

and assume that the user invoked

        mpirun --debug -np 4 a.out

This would tell OMPI's mpirun to first search for "totalview" in the
current $PATH. If it finds it, mpirun will exec "totalview mpirun -a -np
4 a.out". If it doesn't, it then searches for "fx2" in the $PATH; if
that is found, mpirun will exec "fx2 mpirun -a -np 4 a.out".
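
As a rough illustration of the search-and-substitute behavior described above (this is not the planned orterun code, which has not been written yet), here is a minimal Python sketch. The function name pick_debugger, the " : " separator handling, and the injectable "which" parameter are assumptions made for the example; only the default string and the @mpirun_args@ token come from the proposal.

```python
import shlex
import shutil

# Default value taken from the proposal; each ":"-separated entry is one
# debugger command-line template.
DEFAULT_DEBUGGER_SPEC = (
    "totalview mpirun -a @mpirun_args@ : fx2 mpirun -a @mpirun_args@"
)


def pick_debugger(spec, mpirun_args, which=shutil.which):
    """Return the argv for the first debugger found in PATH, or None.

    `which` is injectable only so the lookup can be faked in tests
    (a hypothetical convenience, not part of the proposal).
    """
    for entry in spec.split(":"):
        template = shlex.split(entry)
        if not template:
            continue  # skip empty entries such as a trailing ":"
        if which(template[0]) is None:
            continue  # this debugger is not in PATH; try the next one
        argv = []
        for token in template:
            if token == "@mpirun_args@":
                argv.extend(mpirun_args)  # splice in the user's arguments
            else:
                argv.append(token)
        return argv
    return None


if __name__ == "__main__":
    # With neither debugger installed this prints None.
    print(pick_debugger(DEFAULT_DEBUGGER_SPEC, ["-np", "4", "a.out"]))
```

A real launcher would then replace itself with the chosen command (for example via os.execvp(argv[0], argv) in Python); the sketch stops at building argv so that it is safe to run.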

And, of course, anyone can override that default value (and we're open
to adding more -- TV and FX2 are the only ones that I'm aware of at the
moment).

Also, this only works well for cases where we want to exec a new
application to invoke the debugger. Specifically, using "--debug" to
start under TV and FX2 is simply syntactic sugar for invoking it
yourself, but we've found that users tend to like this.

This is the current plan (I haven't gotten around to implementing it yet
-- it's probably only 2-3 hours worth of work, but it hasn't been a high
priority yet). Comments?

> I'd like to know when and where the functions from
> orterun/totalview.{h,c} get called, whether I need to write my own file
> like this, etc. In other words, "the debugger adder reference
> manual" :)

Right now, there is no such manual -- we only added the TV stuff
according to what TV (and FX2 and DDT) require. These functions are
always invoked inside mpirun -- one is just before we actually launch
the processes and the other is right after we have confirmation that
they're all blocking inside MPI_INIT waiting for the debugger to attach.

Read the TV specifications about how they attach -- if you have a
different scheme, let's talk... As you probably know, OMPI is
fundamentally based upon a component architecture. We could open this
up to making the parallel debugging stuff be a component, and, as such,
do something totally different for different debuggers.

> Currently I launch gdb's on remote processes via ssh (as MPICH does), but
> probably it will be better to use orte framework capabilities for this.
> Don't know yet how.

Gotcha; not a bad idea. Might fit nicely into having support for your
debugger be a component...?

When making a new kind of component for OMPI, we always ask ourselves:
what, abstractly, does this thing need to do? Assume that we already
have controls that tell the MPI processes that they're being debugged
(or not). If they are, they'll need to wait upon some kind of
notification from the debugger indicating that it has attached before
continuing (right now, this is at the very, very end of MPI_INIT; they
wait for the value of a variable to change). Additionally, the debugger
needs to be able to discover the nodenames/PIDs of the MPI processes of
interest.

For basic attaching purposes, I think that these are the main points.
Any other ideas?
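
To make the "wait for the value of a variable to change" step concrete, here is a toy in-process model in Python. It is only an analogy: in the real case the variable lives in the MPI process and the debugger changes it after attaching, whereas here a second thread plays the debugger. The names DebugGate and debugger_attached, and the timeout, are invented for the example.

```python
import threading
import time


class DebugGate:
    """Toy stand-in for the flag an MPI process polls at the end of MPI_INIT."""

    def __init__(self):
        # Hypothetical variable name; the "debugger" sets it to 1.
        self.debugger_attached = 0

    def wait_for_debugger(self, poll_interval=0.01, timeout=5.0):
        """Poll until the flag changes; return False if the timeout expires."""
        deadline = time.monotonic() + timeout
        while self.debugger_attached == 0:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)
        return True


def run_demo():
    """One 'MPI process' thread blocks until the main thread releases it."""
    gate = DebugGate()
    results = []
    worker = threading.Thread(
        target=lambda: results.append(gate.wait_for_debugger())
    )
    worker.start()
    gate.debugger_attached = 1  # the debugger signals that it has attached
    worker.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

The timeout is a choice made for the sketch so that the demo cannot hang; the text above does not say whether the real wait is bounded.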

> In general, are there any ompi/orte architecture description docs, other
> than short schemes in your publications? It's too general there and too
> detailed in sources and doxygen docs. Some intermediate "how all this
> works together" doc is needed to assemble the whole picture...
> For me, I do not understand it completely.

The Open Run-Time Environment (ORTE) layer in OMPI is responsible for
all this kind of stuff -- it's all the things that happen before
MPI_INIT is ever reached (hence, "orterun"). There's a fairly
complicated dance that occurs to spawn a "job" (a collection of
individual processes).

I think the two main things you want are:

1. the information about the MPI processes in the ORTE job of interest
(are you interested in handling MPI-2 dynamic situations?). Right now,
this is only available in the totalview.c code in orterun (per the TV
specs). But as I mentioned, we could do something else.

2. how to launch your debugger agents out alongside the MPI processes of
interest. Since we have little/no documentation about the internals at
this point, I'm admittedly waving my hands here, but essentially you'll
call orte_rmgr.spawn(), very similar to the invocation in orterun.c.
75% of orterun.c is setting up the arguments to spawn() (not because the
arguments are complicated, but rather because we allow quite complex
command line argument forms to orterun); the remaining 25% is waiting
for the various notifications of completion from ORTE that the job is
dead. We might need a little extra logic here to ensure that your job
is literally launched alongside the processes of interest, but this is
certainly do-able.

> > Open MPI uses orterun as its launcher, not the first MPI process.
> > Hence, it is the one that TotalView gets its information from (in that
> > sense, it's similar to the MPICH model -- there is one coordinator; it's
> > just that it's orterun, not the first MPI process). Once orterun
> > receives notification that all the MPI processes have started, it gives
> > the nodename/PID information of each process to TotalView who then
> > launches its own debugger processes on those nodes and attaches to the
> > processes.
> Hm.. with MPICH I use the first gdb copy to get the info from the 0-th
> process and then continue to use it as a node debugger, here I'll have to
> use one more gdb to get the process table out of orterun process? And how
> to do this in a safe way?

In the current implementation, yes, you'll need another gdb (you have to
remember where this stuff came from -- TV's view of the world is to have
"one" master debugger that controls all the processes, so having a
separate "starter" process in addition to the MPI processes was no big
deal). We could do something different, though, such as dump out the
information to a file, or if you're actually integrated in as a
component, then you could get the information directly (i.e., via
API)...? The possibilities here are open.
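
If the "dump the information to a file" route were taken, the debugger side could be as simple as the following Python sketch. Everything about the file is an assumption: no such file exists today, and the "rank host pid" line format is invented here purely for illustration. The sketch only builds the ssh + gdb command lines (gdb's -p option attaches to a running PID); it does not run them.

```python
def parse_proc_table(text):
    """Parse hypothetical 'rank host pid' lines into (rank, host, pid) tuples."""
    procs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rank, host, pid = line.split()
        procs.append((int(rank), host, int(pid)))
    return procs


def gdb_attach_commands(procs):
    """Build one 'ssh <host> gdb -p <pid>' argv per process, ordered by rank."""
    return [
        ["ssh", host, "gdb", "-p", str(pid)]
        for _rank, host, pid in sorted(procs)
    ]


if __name__ == "__main__":
    # Made-up sample data; host names and PIDs are placeholders.
    sample = """
    # rank host pid
    1 node02 4711
    0 node01 4242
    """
    for argv in gdb_attach_commands(parse_proc_table(sample)):
        print(" ".join(argv))
```

Each resulting argv could be handed to something like subprocess.Popen, which matches the "launch gdb on remote processes via ssh" approach mentioned earlier in the thread.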

> > You probably get a "stopped" message when you try to bg orterun because
> > the shell thinks that it is waiting for input from stdin, because we
> > didn't close it.
> Actually this shouldn't matter. Many programs don't close stdin but
> nothing prevents them from running in background until they try to
> read input. The same "Hello world" application runs well with MPICH
> "mpirun -np 3 a.out &"
>
> Best regards,
> Konstantin.

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/