
Open MPI Development Mailing List Archives


From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2006-06-02 08:53:06


Just curious -- what's difficult about this? SIGTSTP and SIGCONT can be
caught; is there something preventing us from sending "stop" and
"continue" messages (just like we send "die" messages)?
 
(If I had to guess, I think the user is asking because some other MPI
implementations implement this kind of behavior)
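
The distinction between catchable and uncatchable job-control signals can be checked directly. The following is a minimal sketch in Python (not Open MPI code; the helper name `can_install_handler` is made up for illustration). It assumes a POSIX system, where a handler can be installed for SIGTSTP and SIGCONT but the kernel refuses one for SIGSTOP.

```python
import signal


def can_install_handler(signum):
    """Return True if this process is allowed to install a handler for signum."""
    try:
        previous = signal.signal(signum, lambda received, frame: None)
    except OSError:
        # The kernel rejects handlers for SIGSTOP and SIGKILL (EINVAL).
        return False
    if previous is not None:
        signal.signal(signum, previous)  # put the original disposition back
    return True


if __name__ == "__main__":
    for signum in (signal.SIGTSTP, signal.SIGCONT, signal.SIGSTOP):
        print(signal.Signals(signum).name, can_install_handler(signum))
```

So a launcher could notice SIGTSTP (what the terminal sends on control-Z) and SIGCONT, but a SIGSTOP sent by a batch system would stop it without giving it any chance to react.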
 
Thanks!

________________________________

        From: devel-bounces_at_[hidden]
        [mailto:devel-bounces_at_[hidden]] On Behalf Of Ralph Castain
        Sent: Thursday, June 01, 2006 10:50 PM
        To: Open MPI Developers
        Subject: Re: [OMPI devel] SIGSTOP and SIGCONT on orted
        
        
        Actually, there were some implementation issues that might prevent
        this from working and were the reason we didn't implement it right
        away. We don't actually transmit the SIGTERM - we capture it in
        mpirun and then propagate our own "die" command to the remote
        processes and daemons. Fortunately, "die" is very easy to implement.

        Unfortunately, "stop" and "continue" are much harder to implement
        from inside of a process. We'll have to look at it, but this may not
        really be feasible.
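
The capture-and-relay pattern described above can be sketched as follows. This is a hedged illustration in Python, not how mpirun or orted are implemented: the two lookup tables, `command_for`, `apply_command`, and `demo` are all hypothetical names, and the messaging layer that would carry the command from the launcher to a remote daemon is left out entirely. It also shows one conceivable way around the "from inside of a process" difficulty, under the assumption that a daemon applies the command to its local children from outside with SIGSTOP and SIGCONT, rather than each process suspending itself.

```python
import os
import signal
import subprocess
import sys

# Launcher side: signals it can catch, mapped to the command it would relay.
SIGNAL_TO_COMMAND = {
    signal.SIGTERM: "die",
    signal.SIGTSTP: "stop",
    signal.SIGCONT: "continue",
}

# Daemon side: relayed commands mapped to the signal applied to local children.
COMMAND_TO_SIGNAL = {
    "die": signal.SIGTERM,
    "stop": signal.SIGSTOP,
    "continue": signal.SIGCONT,
}


def command_for(signum):
    """Launcher side: name the command to send instead of the raw signal."""
    return SIGNAL_TO_COMMAND[signum]


def apply_command(command, pids):
    """Daemon side: deliver the matching signal to every local child."""
    for pid in pids:
        os.kill(pid, COMMAND_TO_SIGNAL[command])


def demo():
    """Stop, resume, and finally terminate one sleeping child process."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    apply_command(command_for(signal.SIGTSTP), [child.pid])
    _, status = os.waitpid(child.pid, os.WUNTRACED)  # reports the stop
    stopped = os.WIFSTOPPED(status)

    apply_command(command_for(signal.SIGCONT), [child.pid])
    _, status = os.waitpid(child.pid, os.WCONTINUED)  # reports the resume
    continued = os.WIFCONTINUED(status)

    apply_command(command_for(signal.SIGTERM), [child.pid])
    child.wait()
    return stopped, continued, child.returncode


if __name__ == "__main__":
    print(demo())
```

The sketch only covers a single host. Whether the same idea is practical across nodes depends on the daemon staying responsive while its children are stopped, which is exactly the part this example does not address.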
        
        Ralph
        
        
        
        Jeff Squyres (jsquyres) wrote:

                The main reason that it doesn't work is because we didn't do
                anything to make it work. :-)

                Specifically, mpirun is not intercepting SIGSTOP and passing it on
                to the remote nodes. There is nothing in the design or architecture
                that would prevent this, but we just don't do it [yet].
                 
                
                  

                        -----Original Message-----
                        From: devel-bounces_at_[hidden]
                        [mailto:devel-bounces_at_[hidden]] On Behalf Of Pak Lui
                        Sent: Thursday, June 01, 2006 5:02 PM
                        To: devel_at_[hidden]
                        Subject: [OMPI devel] SIGSTOP and SIGCONT on orted
                        
                        Hi,
                        
                        I have a question on signals. Normally when I do a SIGTERM
                        (control-C) on mpirun, the signal seems to get handled in a way
                        that it broadcasts to the orted and processes on the execution
                        hosts. However, when I send a SIGSTOP to mpirun, mpirun seems to
                        have stopped, but the processes of the user executable continue
                        to run. I guess I could hook up the debugger to mpirun and orted
                        to see why they are handled differently, but I guess I am
                        anxious to hear about it here.
                        
                        I am trying to see the behavior of SIGSTOP and SIGCONT for the
                        suspension/resumption feature in N1GE. It'll try to use these
                        signals to stop and continue both mpirun and orted (and its
                        processes), but the signals (SIGSTOP and SIGCONT) don't seem to
                        get propagated to the remote orted.
                        
                        I can see there are some issues for implementing this feature
                        on N1GE because the 'qrsh' interface does not send the signal
                        to orted on the remote node, but only to 'mpirun'. I am trying
                        to see how to work around this.
                        
                        --
                        
                        Thanks,
                        
                        - Pak Lui
                        pak.lui_at_[hidden]
                        
                        _______________________________________________
                        devel mailing list
                        devel_at_[hidden]
                        http://www.open-mpi.org/mailman/listinfo.cgi/devel
                        
                            

                