
Open MPI User's Mailing List Archives


From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2006-06-16 20:00:57

I'm afraid that I'm not familiar with the PG debugger, so I don't know
how it is supposed to be launched.

The intent with --debugger / --debug is that you could do a single
invocation of some command and it launches both the parallel debugger
and tells that debugger to launch your parallel MPI process (presumably
allowing the parallel debugger to attach to your parallel MPI process).
This is what fx2 and Totalview allow, for example.

As such, the "--debug" option is simply syntactic sugar for invoking
another [perhaps non-obvious] command. We figured it was simpler for
users to add "--debug" to the already-familiar mpirun command line than
to learn a new syntax for invoking a debugger (although both would
certainly work equally well).

As such, when OMPI's mpirun sees "--debug", it ends up exec'ing
something else -- the parallel debugger command. In the example that I
gave in,
mpirun looked for two things in your path: totalview and fx2.

For example, if you did this:

        mpirun --debug -np 4 a.out

If it found totalview, it would end up exec'ing:

        totalview @mpirun@ -a @mpirun_args@
which would get substituted to
        totalview mpirun -a -np 4 a.out

(note the additional "-a"), which is the totalview command line syntax to
launch their debugger and tell it to launch your parallel process. If
totalview is not found in your path, it'll look for fx2. If fx2 is
found, it'll invoke:

        fx2 @mpirun@ -a @mpirun_args@
which would get substituted to
        fx2 mpirun -a -np 4 a.out

You can see that fx2's syntax was probably influenced by totalview's.
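
To make the substitution concrete, here is a rough Python sketch of the
behavior described above. It is only an illustration, not OMPI's actual
code: the function name, the template list, and the "which" parameter are
all made up for this example, and the real default value of the
orte_base_user_debugger MCA parameter may differ.

```python
import shlex
import shutil

# Templates as described in this message (assumed, not copied from OMPI).
DEFAULT_TEMPLATES = [
    "totalview @mpirun@ -a @mpirun_args@",
    "fx2 @mpirun@ -a @mpirun_args@",
]


def expand_debugger_command(mpirun, mpirun_args,
                            templates=DEFAULT_TEMPLATES,
                            which=shutil.which):
    """Return the argv for the first template whose debugger is in PATH.

    Returns None if none of the debuggers can be found.
    """
    for template in templates:
        words = shlex.split(template)
        # The first word of each template names the debugger executable.
        if which(words[0]) is None:
            continue
        argv = []
        for word in words:
            if word == "@mpirun@":
                argv.append(mpirun)
            elif word == "@mpirun_args@":
                # The original mpirun arguments are spliced in one by one.
                argv.extend(mpirun_args)
            else:
                argv.append(word)
        return argv
    return None


if __name__ == "__main__":
    print(expand_debugger_command("mpirun", ["-np", "4", "a.out"]))
```

The "which" parameter exists only so the lookup can be replaced in a test;
by default it searches your PATH the same way the shell would.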

So what you need is the command line that tells pgdbg to do the same
thing -- launch your app and attach to it. You can then substitute that
into the "--debugger" option (using the @mpirun@ and @mpirun_args@
tokens), or set the MCA parameter "orte_base_user_debugger", and then
use --debug. For example, if the pgdbg syntax is similar to that of
totalview and fx2, then you could do the following:

        mpirun --debugger "pgdbg @mpirun@ -a @mpirun_args@" --debug -np 4 a.out
or (assuming tcsh)
        shell% setenv OMPI_MCA_orte_base_user_debugger "pgdbg @mpirun@ -a @mpirun_args@"
        shell% mpirun --debug -np 4 a.out

Make sense?

If you find a fixed format for pgdbg, we'd be happy to add it to the
default value of the orte_base_user_debugger MCA parameter.

Note that OMPI currently only supports the Totalview API for attaching
to MPI processes -- I don't know if pgdbg requires something else.

> -----Original Message-----
> From: users-bounces_at_[hidden]
> [mailto:users-bounces_at_[hidden]] On Behalf Of Caird, Andrew J
> Sent: Tuesday, June 13, 2006 4:38 PM
> To: users_at_[hidden]
> Subject: [OMPI users] OpenMPI, debugging, and Portland Group's pgdbg
> Hello all,
> I've read the thread "OpenMPI debugging support" and it looks like
> there is improved debugging support for debuggers other than TV in
> the 1.1 series.
> I'd like to use Portland Groups pgdbg. It's a parallel debugger,
> there's more information at
> From the previous thread on this topic, it looks to me like
> the plan for
> 1.1 and forward is to support the ability to launch the
> debugger "along
> side" the application. I don't know enough about either pgdbg or
> OpenMPI to know if this is the best plan, but assuming that it is, is
> there a way to see if it is happening?
> I've tried this two ways, the first way doesn't seem to attach to
> anything:
> ----------------------------------------------------------------------------
> [acaird_at_nyx-login ~]$ ompi_info | head -2
> Open MPI: 1.1a9r10177
> Open MPI SVN revision: r10177
> [acaird_at_nyx-login ~]$ mpirun --debugger pgdbg --debug -np 2 cpi
> PGDBG 6.1-3 x86-64 (Cluster, 64 CPU)
> Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
> Copyright 2000-2005, STMicroelectronics, Inc. All Rights Reserved.
> PGDBG cannot open a window; check the DISPLAY environment variable.
> Entering text mode.
> pgdbg> list
> ERROR: No current thread.
> pgdbg> quit
> ----------------------------------------------------------------------------
> and I've tried running the whole thing under pgdbg:
> ----------------------------------------------------------------------------
> [acaird_at_nyx-login ~]$ pgdbg mpirun -np 2 cpi -s pgdbgscript
> { lots of mca_* loaded by ld-linux messages }
> pgserv 8726: attach : attach 8720 fails
> ERROR: New Process (PID 8720, HOST localhost) ATTACH FAILED.
> ERROR: New Process (PID 8720, HOST localhost) IGNORED.
> ERROR: cannot read value at address 0x59BFE8.
> ERROR: cannot read value at address 0x59BFF0.
> ERROR: cannot read value at address 0x59BFF8.
> ERROR: New Process (PID 0, HOST unknown) IGNORED.
> ERROR: cannot read value at address 0x2A959BBEC8.
> ----------------------------------------------------------------------------
> and it hangs right there until I kill it. The two variables in this
> scenario are:
> PGRSH=ssh and the contents of pgdbgscript are:
> ----------------------------------------------------------------------------
> pgienv exe force
> pgienv mode process
> ignore 12
> run
> ----------------------------------------------------------------------------
> So, the short list of questions are:
> 1. Has anyone done this successfully before?
> 2. Am I making the right assumptions about how the debugger
> attaches to
> the processes?
> 3. Is this the expected behavior for this set of options to mpirun?
> 4. Does anyone have any suggestions for other things I might try?
> Thanks a lot.
> --andy