Open MPI Development Mailing List Archives

Subject: [OMPI devel] debugger confusion
From: George Bosilca (bosilca_at_[hidden])
Date: 2011-11-07 19:59:44

I was trying to understand how the debugger interface is supposed to work, and I am now at least as confused as I was before.

There is one thing that I really can't figure out, and I hope that somebody (Jeff, Ralph, or Rolf, based on svn blame) can enlighten me.

The issue is MPIR_debug_gate. The document accepted by the MPI Forum gives the following definition:

> MPIR_debug_gate is an integer variable that is set to 1 by the tool to notify the MPI
> processes that the debugger has attached. An MPI process may use this variable as a
> synchronization mechanism to prevent it from running away before the tool has time to
> attach to the process.
> An MPI implementation is not required to use the MPIR_debug_gate variable for
> synchronization. However, the MPI job control runtime system must prevent the created MPI
> processes from running beyond the return from the application's call to MPI_INIT.

In case that is not clear enough, the section describing the startup process adds the following clarification:

> If the symbol MPIR_partial_attach_ok is defined in the starter process, then this
> informs the tool that the initial startup barrier is implemented by the MPI system,
> and it is not necessary to set the MPIR_debug_gate variable in any of MPI processes.
> However, if the symbol MPIR_partial_attach_ok is not defined in the starter process,
> the tool must attach and set the MPIR_debug_gate variable to 1 in each MPI processes
> to release them from the gate, even if the tool user has instructed the tool to not attach
> to all of the MPI processes.

The starter process here is our mpirun. Open MPI defines MPIR_partial_attach_ok, so the tool will assume that we provide a means of synchronizing the processes that does not rely on MPIR_debug_gate. Therefore, based on the text above, only one behavior is acceptable: the tool should never set MPIR_debug_gate to 1.

However, in ompi_debuggers.c, around line 226, there is an if statement that switches between the two acceptable behaviors (MPIR_debug_gate or our own mechanism) depending on whether we are running standalone (slurmd or generic ESS) or not. As generic is the ESS component loaded in most cases, I can't figure out how this works if the MPIR specification document is to be trusted.