Open MPI Development Mailing List Archives

Subject: [OMPI devel] debugger confusion
From: George Bosilca (bosilca_at_[hidden])
Date: 2011-11-07 19:59:44

I have been trying to understand how the debugger interface is supposed to work. If I was confused before, that feeling has not gone away.

There is one thing that I really can't figure out, and I hope that somebody (Jeff/Ralph/Rolf based on svn blame) can enlighten me.

MPIR_debug_gate. In the document accepted by the MPI Forum we have the following definition:

> MPIR_debug_gate is an integer variable that is set to 1 by the tool to notify the MPI
> processes that the debugger has attached. An MPI process may use this variable as a
> synchronization mechanism to prevent it from running away before the tool has time to
> attach to the process.
> An MPI implementation is not required to use the MPIR_debug_gate variable for
> synchronization. However, the MPI job control runtime system must prevent the created MPI
> processes from running beyond the return from the applications call to MPI_INIT.

In case it is not clear enough, in the section describing the startup process, we can find the following clarification:

> If the symbol MPIR_partial_attach_ok is defined in the starter process, then this
> informs the tool that the initial startup barrier is implemented by the MPI system,
> and it is not necessary to set the MPIR_debug_gate variable in any of MPI processes.
> However, if the symbol MPIR_partial_attach_ok is not defined in the starter process,
> the tool must attach and set the MPIR_debug_gate variable to 1 in each MPI processes
> to release them from the gate, even if the tool user has instructed the tool to not attach
> to all of the MPI processes.

The starter process is, in our case, mpirun. In Open MPI, MPIR_partial_attach_ok is defined, so the tool will assume that we provide a means of synchronizing the processes that is not based on MPIR_debug_gate. Therefore only one behavior is acceptable according to the text above: the tool should not set MPIR_debug_gate to 1.
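
To make my reading of the quoted text concrete, here is a minimal sketch of the tool side. It is not taken from any real debugger: proc_view_t and tool_release_processes are hypothetical names, and the "gate" is modeled as a plain struct field rather than a write into a remote process.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one MPI process as seen by the tool.
 * debug_gate models that process's MPIR_debug_gate variable. */
typedef struct {
    int debug_gate;
} proc_view_t;

/* Hypothetical tool-side logic following the quoted startup text.
 * Returns the number of processes whose gate the tool wrote to. */
size_t tool_release_processes(bool partial_attach_ok_defined,
                              proc_view_t *procs, size_t nprocs)
{
    if (partial_attach_ok_defined) {
        /* The startup barrier is implemented by the MPI system,
         * so the tool leaves every gate untouched. */
        return 0;
    }

    /* Symbol not defined in the starter: the tool must release every
     * process, even those the user did not ask to attach to. */
    for (size_t i = 0; i < nprocs; ++i) {
        procs[i].debug_gate = 1;
    }
    return nprocs;
}
```

With the symbol defined, as it is in our mpirun, this sketch never touches a gate, so a process spinning on MPIR_debug_gate would never be released by the tool.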

However, in ompi_debuggers.c, around line 226, there is an if that switches between the two acceptable behaviors (MPIR_debug_gate or our own mechanism) depending on whether or not we are running standalone (slurmd or generic). Since generic is the ess loaded in most cases, I can't figure out how this works if the MPIR specification document is to be trusted.
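
For illustration only, the shape of the branch I am describing could be sketched as below. This is not the code from ompi_debuggers.c: choose_init_wait, poll_debug_gate, and the enum are hypothetical, and which branch maps to which mechanism is my assumption (standalone waits on the gate, otherwise the runtime's own mechanism is used). The poll is bounded so the sketch terminates; a real wait would presumably spin or sleep until released.

```c
#include <stdbool.h>

/* Variable named in the MPIR document; the tool writes 1 into it. */
volatile int MPIR_debug_gate = 0;

/* Hypothetical labels for the two acceptable behaviors. */
typedef enum {
    WAIT_ON_DEBUG_GATE,  /* process waits for the tool to set the gate */
    WAIT_ON_RUNTIME      /* process relies on the runtime's own mechanism */
} init_wait_t;

/* Assumed mapping: a standalone process has no runtime to hold it,
 * so it falls back to the gate. */
init_wait_t choose_init_wait(bool standalone)
{
    return standalone ? WAIT_ON_DEBUG_GATE : WAIT_ON_RUNTIME;
}

/* Bounded poll of the gate. Returns true if the gate was observed
 * to be nonzero within max_polls checks, false otherwise. */
bool poll_debug_gate(int max_polls)
{
    for (int i = 0; i < max_polls; ++i) {
        if (MPIR_debug_gate != 0) {
            return true;
        }
    }
    return false;
}
```

If the standalone branch is the common one and the tool never sets the gate because MPIR_partial_attach_ok is defined, poll_debug_gate would never return true, which is exactly the situation I can't reconcile with the specification.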