
Open MPI User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-01-08 15:07:57


On Jan 8, 2007, at 2:52 PM, Grobe, Gary L. ((JSC-EV))[ESCG] wrote:

> I was wondering if someone could send me the HACKING file so I can
> do a
> bit more with debugging on the snapshots. Our web proxy has webdav
> methods turned off (request methods fail) so that I can't get to the
> latest of the svn repos.

Bummer. :-( You are definitely falling victim to the fact that our
nightly snapshots have been less-than-stable recently. Sorry [again]
about that!

FWIW, there are two ways to browse the source in the repository
without an SVN checkout:

- you can just point a normal web browser to our SVN repository (I'm
pretty sure that doesn't use DAV, but I'm not 100% sure...), e.g.:
https://svn.open-mpi.org/svn/ompi/trunk/HACKING

- you can use our Trac SVN browser, e.g.:
https://svn.open-mpi.org/trac/ompi/browser/trunk/HACKING
(there's a link at the bottom to download each file without all the
HTML markup).
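If I'm right that the first option is a plain HTTP GET, a normal
command-line fetch should also make it through your proxy; a sketch
(assuming wget is available and the proxy permits GET requests):

```
$ wget https://svn.open-mpi.org/svn/ompi/trunk/HACKING
```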

>> Second thing. From one of your previous emails, I see that MX
>> is configured with 4 instance by node. Your running with
>> exactly 4 processes on the first 2 nodes. Weirds things might
>> happens ...
>
> Just curious about this comment. Are you referring to over
> subscribing?
> We run 4 processes on each node because we have 2 dual core cpu's on
> each node. Am I not understanding processor counts correctly?

I'll have to defer to Reese on this one...

>> PS: Is there any way you can attach to the processes with gdb
>> ? I would like to see the backtrace as showed by gdb in order
>> to be able to figure out what's wrong there.
>
> When I can get more detailed dbg, I'll send. Though I'm not clear on
> what executable is being searched for below.
>
> $ mpirun -dbg=gdb --prefix /usr/local/openmpi-1.2b3r13030 -x LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 5 --mca pml cm --mca mtl mx ./cpi

FWIW, note that "-dbg" is not a recognized Open MPI mpirun
command-line switch -- after all the debugging information, Open MPI
finally gets around to telling you:

> ------------------------------------------------------------------------
> Failed to find the following executable:
>
> Host: juggernaut
> Executable: -b
>
> Cannot continue.
> ------------------------------------------------------------------------

So nothing actually ran in this instance.

Our debugging entries in the FAQ
(http://www.open-mpi.org/faq/?category=debugging) are fairly
inadequate at the moment, but if you're running in an ssh
environment, you generally have two choices for attaching serial
debuggers:

1. Put a loop in your app that pauses until you can attach a
debugger. Perhaps something like this:

{
    int i = 0;
    printf("pid %d ready\n", getpid());
    while (0 == i) sleep(5);
}

Kludgey and horrible, but it works.
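Once the process prints its pid, you can break it out of that loop
from another shell; a sketch of the gdb session (assuming gdb is
installed on the node where the process is running, and <pid> is the
value that printf emitted):

```
$ gdb -p <pid>          # attach to the waiting MPI process
(gdb) set var i = 1     # flip the flag so the while loop exits
(gdb) continue          # let the process run under the debugger
```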

2. mpirun an xterm running gdb. You'll need to specifically use the
-d option to mpirun in order to keep the ssh sessions alive so they
can relay back your X information. Alternatively, set up your X
channels yourself (e.g., on a closed network it may be acceptable to
"xhost +" the nodes you're running on and manually set the DISPLAY
variable on the target nodes, perhaps via the -x option to mpirun),
in which case you would not need the -d option to mpirun.
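For option 2, the invocation looks something like this (a sketch,
not a tested command line -- "./cpi" and "./h1-3" are borrowed from
your command above, and it assumes X forwarding is working):

```
$ mpirun -d -np 4 --hostfile ./h1-3 xterm -e gdb ./cpi
```

Each rank then gets its own xterm with gdb already loaded on the
executable; type "run" in each window to start.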

Make sense?

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems