Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted
From: Reuti (reuti_at_[hidden])
Date: 2011-04-23 08:20:34


Hi,

On 23.04.2011 at 04:31, Pablo Lopez Rios wrote:

> I'm having a bit of a problem with wrapping mpirun in a script. The script needs to run an MPI job in the background and tail -f the output. Pressing Ctrl+C should stop tail -f, and the MPI job should continue. However mpirun seems to detect the SIGINT that was meant for tail, and kills the job immediately. I've tried workarounds involving nohup, disown, trap, subshells (including calling the script from within itself), etc, to no avail.
>
> The problem is that this doesn't happen if I run the command directly instead, without mpirun. Attached is a script that reproduces the problem. It runs a simple counting script in the background which takes 10 seconds to run, and tails the output. If called with "nompi" as first argument, it will simply run bash -c "$SCRIPT" >& "$out" &, and with "mpi" it will do the same with 'mpirun -np 1' prepended. The output I get is:

What about:

( trap "" sigint; exec mpiexec ...) &

i.e. make the subshell ignore SIGINT and then replace it with mpiexec via exec, so that mpiexec starts with the signal already ignored. Then again, mpiexec may adjust the handling on its own again. You can check this in /proc/<pid>/status (the SigIgn and SigCgt lines).
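A minimal sketch of that check, assuming Linux and bash. The helper name ignores_sigint is made up for illustration, and sleep stands in for mpiexec so the snippet runs without an MPI installation. It reads the SigIgn bitmask from /proc/<pid>/status and tests the bit for SIGINT:

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if the process with the given PID ignores
# SIGINT, 1 if it does not, 2 if the mask could not be read (Linux only).
ignores_sigint() {
    local pid=$1 mask
    mask=$(awk '/^SigIgn:/ {print $2}' "/proc/$pid/status") || return 2
    [ -n "$mask" ] || return 2
    # SigIgn is a hex bitmask; SIGINT is signal 2, i.e. bit 1 (value 0x2).
    (( (0x$mask & 0x2) != 0 ))
}

# Stand-in for mpiexec: the subshell ignores SIGINT, then exec replaces it
# with sleep, which inherits the ignored disposition.
( trap "" INT; exec sleep 5 ) &
pid=$!
sleep 0.2   # give the subshell time to set the trap and exec

if ignores_sigint "$pid"; then
    echo "SIGINT ignored"
else
    echo "SIGINT not ignored"
fi
kill "$pid"
```

If the same check on the real mpiexec PID reports that SIGINT is not ignored, look at the SigCgt line in the same file. It uses the same bit layout and lists the signals for which the process has installed its own handler.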

-- Reuti

>
> $ ./ompi_bug.sh mpi
> mpi:
> 1
> 2
> 3
> 4
> ^C
> $ ./ompi_bug.sh nompi
> nompi:
> 1
> 2
> 3
> 4
> ^C
> $ cat output.*
> mpi:
> 1
> 2
> 3
> 4
> mpirun: killing job...
>
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 1222 on node pablomme exited on signal 0 (Unknown signal 0).
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
> nompi:
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> Done
>
>
> This convinces me that there is something strange with OpenMPI, since I expect no difference in signal handling when running a simple command with or without mpirun in the middle.
>
> I've tried looking for options to change this behaviour, but I don't seem to find any. Is there one, preferably in the form of an environment variable? Or is this a bug?
>
> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also v1.2.8 as distributed with OpenSUSE 11.3.
>
> Thanks,
> Pablo
> <ompi_bug.sh.gz>