
Open MPI User's Mailing List Archives


From: Olesen, Mark (Mark.Olesen_at_[hidden])
Date: 2007-03-13 11:04:18

Hi Reuti (and others),

> And now the odd thing: the jobscript (with the mpirun) is gone on the
> head node of this parallel job, but all the spawned qrsh processes
> are still there:

I'm glad that someone else can almost reproduce my problem.
On the suspicion that my application was not ignoring SIGUSR1/SIGUSR2, I added a
signal handler that simply outputs "ignoring SIGUSR*". The shell script now
traps both signals as well:

    trap 'echo script usr1' USR1
    trap 'echo script usr2' USR2
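One detail that may matter for the jobscript traps: bash does not run a trap while it is waiting for a foreground command to finish, so if mpirun runs in the foreground, "script usr1" is only printed after mpirun returns. Below is a minimal sketch that assumes a bash jobscript and uses a hypothetical helper `run_job` (not from the original script). It starts the job in the background and blocks in `wait`, which a trapped signal interrupts immediately:

```shell
#!/bin/bash
# Sketch only: "run_job" is a hypothetical helper; in a real jobscript
# its arguments would be the mpirun command line.
run_job() {
    trap 'echo script usr1' USR1
    trap 'echo script usr2' USR2

    # bash defers a trap until the current foreground command finishes,
    # so start the job in the background and block in "wait" instead:
    # a trapped signal makes "wait" return at once and the trap runs.
    "$@" &
    local child=$! status

    # "wait" returns >128 when interrupted by a trapped signal, so keep
    # waiting until the child has actually exited and been reaped.
    while :; do
        wait "$child"
        status=$?
        kill -0 "$child" 2>/dev/null || break
    done
    echo "job exited with status $status"
    return "$status"
}

# Example (hypothetical command line): run_job mpirun -np 4 ./my_app
```

With this structure, sending USR1 to the script's own shell prints "script usr1" right away instead of after the job has ended.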

> So in the SGE case: usr1 should be caught by the mpirun (and not
> terminate it), which will notify the daemons to stop each one's child
> processes. This would simulate a real suspend, performed by OpenMPI.

Using qmod -sj to suspend the job (which sends the SIGUSR1 warning signal), I see
the same behaviour as before. Interestingly enough, I get two messages:

    mpirun: Forwarding signal 10 to job
    The daemon received a signal 10.
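Signal numbers are platform-specific, but on Linux/x86 signal 10 is SIGUSR1 and signal 12 is SIGUSR2, so these messages refer to the SGE warning signals. A quick way to check the mapping on a given node, assuming bash is available there, is the `kill -l` builtin:

```shell
#!/bin/bash
# Translate the numeric signals reported by mpirun into signal names.
# The numbering depends on the platform; check it on the compute nodes.
for sig in 10 12; do
    echo "signal $sig = SIG$(kill -l "$sig")"
done
```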

After these messages, only the sge-shepherd and mpirun are still alive; the job
and qrsh processes are gone. Some time later, the following message also appears:
    mpirun: Forwarding signal 12 to job

after which no processes are left *except* mpirun, which I have to
kill by hand.
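For reference, the SGE `qsub -notify` documentation states that a job receives SIGUSR1 several seconds before a pending SIGSTOP and SIGUSR2 several seconds before a pending SIGKILL, which would fit the USR1-then-USR2 sequence above. If that is what happens here, one workaround is to terminate the background job from the USR2 trap so nothing is left behind. This is a sketch with a hypothetical helper `run_with_cleanup`, and it assumes mpirun shuts its job down when it receives SIGTERM (not verified here):

```shell
#!/bin/bash
# Sketch only: "run_with_cleanup" is a hypothetical helper; pass it the
# mpirun command line. Assumes the job exits when sent SIGTERM.
run_with_cleanup() {
    "$@" &
    local child=$!

    # On the USR2 warning, ask the background job to terminate before
    # SGE follows up with SIGKILL.
    trap 'echo "script usr2: terminating $child"; kill -TERM "$child" 2>/dev/null' USR2

    # A trapped signal interrupts "wait"; re-enter it until the child
    # has exited and been reaped.
    while kill -0 "$child" 2>/dev/null; do
        wait "$child"
    done
}
```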

In case the configuration is a factor: the cluster machines are running a
stock SuSE 9.2 installation (Linux 2.6.8-24-smp and/or 2.6.8-24.16-smp).

The Open MPI configuration:
            ./configure \
                --prefix=$OPENMPI_ARCH_PATH \
                --enable-shared \
                --disable-static \
                --disable-mpi-f77 \
                --disable-mpi-f90 \
                --disable-mpi-profile

