Open MPI User's Mailing List Archives

From: Konstantin Karganov (kostik_at_[hidden])
Date: 2005-10-20 13:03:03


> However, we're quite open to other approaches. Because of the nature of
> our integration with a variety of different run-time environments, our
> startup is not a shell script -- mpirun ("orterun" is its real name;
> "mpirun" is a sym link to orterun) is a compiled executable.
Sure, I noticed that mpirun is the orterun executable :)
This means that to add any feature I need to rebuild it (and probably
some run-time libraries) each time.
 
> What are the requirements of your debugger? Do you attempt to launch
> the MPI processes yourself, or do you attach to them after they are
> launched (which is what TotalView does)?
It is supposed to attach GDB to each process after it has been launched,
so the TotalView interface fits well, except that its details are
hardcoded in the source of orte/tools/orterun (as you may guess, I don't
have an executable named "totalview", etc.). I'd like to know when and
where the functions from orterun/totalview.{h,c} get called, whether I
need to write my own file like this, etc. In other words, "the how to
add a debugger reference manual" :)

Currently I launch gdb instances on the remote nodes via ssh (as MPICH
does), but it would probably be better to use the ORTE framework's
capabilities for this. I don't know how yet.

In general, are there any documents describing the OMPI/ORTE
architecture, other than the short diagrams in your publications? Those
are too general, while the sources and doxygen docs are too detailed.
Some intermediate "how all this works together" document is needed to
assemble the whole picture...
As for me, I do not fully understand it.

> Open MPI uses orterun as its launcher, not the first MPI process.
> Hence, it is the one that TotalView gets its information from (in that
> sense, it's similar to the MPICH model -- there is one coordinator; it's
> just that it's orterun, not the first MPI process). Once orterun
> receives notification that all the MPI processes have started, it gives
> the nodename/PID information of each process to TotalView who then
> launches its own debugger processes on those nodes and attaches to the
> processes.
Hmm... With MPICH I use the first gdb instance to get the info from
process 0 and then keep using it as that node's debugger. Here, will I
have to use one more gdb just to get the process table out of the
orterun process? And how can this be done safely?

> You probably get a "stopped" message when you try to bg orterun because
> the shell thinks that it is waiting for input from stdin, because we
> didn't close it.
Actually, this shouldn't matter. Many programs don't close stdin, yet
nothing prevents them from running in the background until they try to
read input. The same "Hello world" application runs fine in the
background with MPICH's "mpirun -np 3 a.out &".
 
Best regards,
Konstantin.