Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] debugger confusion
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-11-07 20:34:23

I can't speak to what is in ompi_debuggers.c as I believe Jeff wrote most of that. However, what is there has been tested and works with TotalView and a couple of other debuggers.

Best guess: from what I've seen, most debuggers don't seem to conform to what the MPI Forum has "accepted". It doesn't appear that the vendors and debugger developers pay too much attention to that document, possibly because it (a) came after the debuggers were developed, and (b) still doesn't seem to be widely adopted.

I'd suggest being a little careful about making changes without consulting, at the very least, people who use TotalView and "stat"; those are the ones most recently tested.

On Nov 7, 2011, at 5:59 PM, George Bosilca wrote:

> I was trying to understand how the debugger interface is supposed to work. I was confused before, and that feeling has not gone away.
> There is one thing that I really can't figure out, and I hope that somebody (Jeff/Ralph/Rolf, based on svn blame) can enlighten me: MPIR_debug_gate.
> In the document accepted by the MPI Forum we have the following definition:
>> MPIR_debug_gate is an integer variable that is set to 1 by the tool to notify the MPI
>> processes that the debugger has attached. An MPI process may use this variable as a
>> synchronization mechanism to prevent it from running away before the tool has time to
>> attach to the process.
>> An MPI implementation is not required to use the MPIR_debug_gate variable for synchronization. However, the MPI job control runtime system must prevent the created MPI
>> processes from running beyond the return from the applications call to MPI_INIT.
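A minimal sketch, in C, of the process side of the gate described in the passage quoted above. Only the variable name MPIR_debug_gate comes from the text; the helper wait_for_debug_gate and its bounded poll count are assumptions made for illustration, and this is not the code in ompi_debuggers.c.

```c
#include <stdbool.h>

/* Named in the MPIR document: the tool writes 1 here after attaching.
 * volatile, because the write comes from outside the program (the debugger). */
volatile int MPIR_debug_gate = 0;

/* Hypothetical helper: check the gate up to max_polls times.
 * Returns true once the gate is open, false if it never opened.
 * A real MPI process would normally loop without a bound and sleep
 * between checks instead of giving up. */
bool wait_for_debug_gate(unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; ++i) {
        if (MPIR_debug_gate != 0) {
            return true;
        }
    }
    return MPIR_debug_gate != 0;
}
```

The bound exists only so the sketch terminates when no tool is attached; a process that must not "run away" would keep waiting until the gate is set or the runtime releases it by other means.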
> In case it is not clear enough, in the section describing the startup process, we can find the following clarification:
>> If the symbol MPIR_partial_attach_ok is defined in the starter process, then this
>> informs the tool that the initial startup barrier is implemented by the MPI system,
>> and it is not necessary to set the MPIR_debug_gate variable in any of MPI processes.
>> However, if the symbol MPIR_partial_attach_ok is not defined in the starter process,
>> the tool must attach and set the MPIR_debug_gate variable to 1 in each MPI processes
>> to release them from the gate, even if the tool user has instructed the tool to not attach
>> to all of the MPI processes.
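The rule in the passage quoted above can be read as a small decision on the tool side. The sketch below is in C; gate_writes_required and its parameters are hypothetical names, and how a real tool looks up the symbol in the starter process is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: in how many MPI processes must the tool set
 * MPIR_debug_gate to 1 purely to release them from the gate?
 *
 * partial_attach_ok_defined: whether the symbol MPIR_partial_attach_ok
 *                            was found in the starter process.
 * total_procs:               number of MPI processes in the job. */
size_t gate_writes_required(bool partial_attach_ok_defined, size_t total_procs)
{
    if (partial_attach_ok_defined) {
        /* The MPI system implements the startup barrier itself,
         * so the tool does not need to touch the gate. */
        return 0;
    }
    /* Otherwise every process must be released, including those
     * the user did not ask to attach to. */
    return total_procs;
}
```

With the symbol defined, the result is 0 regardless of job size; without it, the result equals the number of processes even when the user wants to debug only a subset.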
> The starter process is, in our case, mpirun. In Open MPI, MPIR_partial_attach_ok is defined, so the tool will assume that we provide a means of synchronizing the processes that is not based on MPIR_debug_gate. Therefore only one behavior is acceptable based on the text above: the tool should not set MPIR_debug_gate=1.
> However, in ompi_debuggers.c, around line 226, we have an if that switches between the two acceptable behaviors (MPIR_debug_gate or our own mechanism) depending on whether or not we are standalone (slurmd or generic). As generic is the ess loaded in most cases, I can't figure out how this works if the MPIR specification document is to be trusted.
> george.