Open MPI logo

MTT Devel Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all MTT Devel mailing list

Subject: Re: [MTT devel] [MTT svn] GIT: MTT branch master updated. 016088f2a0831b32ab5fd6f60f4cabe67e92e594
From: Mike Dubman (miked_at_[hidden])
Date: 2014-06-23 07:47:38


after patch, it killed child processes but kept mpirun ... itself.

before that patch - all processes were killed (and you are right, "mpirun
died right at the end of the timeout" was reported) but at least it left
the cluster in the clean state w/o leftovers.
now many "orphan" launchers are alive from previous invocations.

On Mon, Jun 23, 2014 at 2:18 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]
> wrote:

> There was actually quite a bit of testing before this was committed. This
> commit resolved a lot of hangs across multiple organizations.
>
> Can you be more specific as to what is happening?
>
> The prior code was killing child processes before mpirun itself, for
> example, which has led MTT to erroneously report that mpirun died right at
> the end of the timeout without being killed. This has been ongoing for
> many months, at a minimum.
>
>
>
>
> On Jun 23, 2014, at 4:37 AM, Mike Dubman <miked_at_[hidden]> wrote:
>
> > this commit does more harm then good.
> > we experience following:
> >
> > - some child processes still running after timeout and mtt killed the
> job.
> >
> > before this commit - it worked fine.
> > please revert and test more.
> >
> >
> >
> > On Sat, Jun 21, 2014 at 3:30 PM, MPI Team <mpiteam_at_[hidden]> wrote:
> > The branch, master has been updated
> > via 016088f2a0831b32ab5fd6f60f4cabe67e92e594 (commit)
> > via 7fb4c6a4c9d71be127ea53bd463178510577f71f (commit)
> > via 381ba177d835a54c3197d846f5a4edfc314efe27 (commit)
> > via cfdd29de2012eeb7592706f00dd07a52dd48cf6b (commit)
> > via 940030ca20eb1eaf256e898b83866c1cb83aca5c (commit)
> > from c99ed7c7b159a2cab58a251bd7c0dad8972ff901 (commit)
> >
> > Those revisions listed above that are new to this repository have
> > not appeared on any other notification email; so we list those
> > revisions in full, below.
> >
> > - Log -----------------------------------------------------------------
> >
> https://github.com/open-mpi/mtt/commit/016088f2a0831b32ab5fd6f60f4cabe67e92e594
> >
> > commit 016088f2a0831b32ab5fd6f60f4cabe67e92e594
> > Author: Jeff Squyres <jsquyres_at_[hidden]>
> > Date: Sat Jun 21 04:58:45 2014 -0700
> >
> > DoCommand: several fixes to kill_proc logic
> >
> > 1. Fix the kill(0, $pid) test to see if the process was still alive.
> >
> > 2. Rename _kill_proc() to _kill_proc_tree() to indicate that it's
> > really killing not only the PID in question, but also all of its
> > descendants.
> >
> > 3. In _kill_proc_tree(), change the order to kill the main PID first,
> > and ''then'' kill all the descendants.
> >
> > The main use case is when killing mpirun: if we kill mpirun's
> > descendants ''first'', mpirun will detect its childrens' deaths and
> > then cleanup and exit. Later, when MTT finally gets around to
> killing
> > mpirun, MTT will detect that mpirun is already dead and therefore
> emit
> > a confusing "mpirun died right at end of timeout" message. This is
> > misleading at best; it doesn't indicate what actually happened.
> >
> > However, if we kill mpirun first, it will take care of killing all of
> > its descendants. MTT will therefore emit the right messages about
> > killing mpirun. MTT will then redundantly try to kill a bunch of
> > now-nonexistent descendant processes of mpirun, but that's ok/safe.
> > We actually ''want'' this try-to-kill-mpirun's-descendants behavior
> to
> > handle the case when mpirun is misbehaving / not cleaning up its
> > descendants.
> >
> > 4. DoCommand() is used for more than launching mpirun, so pass down
> > $argv0 so that we can print the actual command name that is being
> > killed in various Verbose/Debug messages, not the hard-coded "mpirun"
> > string (which, in practice, was probably almost always correct, but
> > still...).
> > ---
> > lib/MTT/DoCommand.pm | 78
> ++++++++++++++++++++++++++++++++++++----------------
> > 1 file changed, 55 insertions(+), 23 deletions(-)
> >
> > diff --git a/lib/MTT/DoCommand.pm b/lib/MTT/DoCommand.pm
> > index 02cdb94..646ca31 100644
> > --- a/lib/MTT/DoCommand.pm
> > +++ b/lib/MTT/DoCommand.pm
> > @@ -2,7 +2,7 @@
> > #
> > # Copyright (c) 2005-2006 The Trustees of Indiana University.
> > # All rights reserved.
> > -# Copyright (c) 2006-2013 Cisco Systems, Inc. All rights reserved.
> > +# Copyright (c) 2006-2014 Cisco Systems, Inc. All rights reserved.
> > # Copyright (c) 2007-2008 Sun Microsystems, Inc. All rights reserved.
> > # Copyright (c) 2007-2012 High Performance Computing Center Stuttgart,
> > # University of Stuttgart. All rights reserved.
> > @@ -57,23 +57,27 @@ sub DoCommand {
> > ($time_arg, $no_execute) = @_;
> > }
> >
> > +# This function is called for killing both mpirun and each of its
> > +# descendants. We really only want to see verbose output for when we
> > +# kill mpirun itself, so only show output when the caller provides us
> > +# with a $argv0 value.
> > sub _kill_proc_one {
> > - my ($pid) = @_;
> > + my ($pid, $argv0) = @_;
> >
> > # How long to wait after each kill()
> > my $wait_time = 5;
> >
> > # See if the proc is alive first
> > - my $kid;
> > - kill(0, $pid);
> > - $kid = waitpid($pid, WNOHANG);
> > - return "mpirun died right at end of timeout (MTT did not have to
> kill it)"
> > - if (-1 == $kid);
> > + my $num_alive = kill(0, $pid);
> > + return "$argv0 died right at end of timeout (MTT did not have to
> kill it)"
> > + if (0 == $num_alive);
> >
> > # Try an easy kill
> > kill("TERM", $pid);
> > - Verbose("*** Killing mpirun with SIGTERM\n");
> > + Verbose("*** Killing $argv0 with SIGTERM\n")
> > + if (defined($argv0));
> > # Give mpirun some time to cleanup before we try to reap it.
> > + my $kid;
> > my $i = $wait_time;
> > while ($i > 0) {
> > sleep(1);
> > @@ -85,45 +89,74 @@ sub _kill_proc_one {
> > # process no longer exists (i.e., we get -1 back from
> > # waitpid), or we successfully killed it (i.e., we got the PID
> > # back from waitpid).
> > - return "MTT killed mpirun via SIGTERM" if (0 != $kid);
> > + return "MTT killed $argv0 via SIGTERM" if (0 != $kid);
> >
> > --$i;
> > }
> > - Verbose("** Kill TERM (after $wait_time seconds) didn't work!\n");
> > + Verbose("** Kill TERM (after $wait_time seconds) didn't work!\n")
> > + if (defined($argv0));
> >
> > # That didn't work either. Try SIGINT;
> > - Verbose("*** Killing mpirun with SIGINT\n");
> > + Verbose("*** Killing $argv0 with SIGINT\n")
> > + if (defined($argv0));
> > kill("INT", $pid);
> > my $i = $wait_time;
> > while ($i > 0) {
> > sleep(1);
> > $kid = waitpid($pid, WNOHANG);
> > - return "MTT killed mpirun via SIGINT" if (0 != $kid);
> > + return "MTT killed $argv0 via SIGINT" if (0 != $kid);
> > --$i;
> > }
> > - Verbose("** Kill INT (after $wait_time seconds) didn't work!\n");
> > + Verbose("** Kill INT (after $wait_time seconds) didn't work!\n")
> > + if (defined($argv0));
> >
> > # Ok, now we're mad. Be violent.
> > - Verbose("*** Now I'm mad. Killing mpirun with SIGKILL\n");
> > + Verbose("*** Now I'm mad. Killing $argv0 with SIGKILL\n")
> > + if (defined($argv0));
> > my $count = 0;
> > while (1) {
> > kill("KILL", $pid);
> > ++$count;
> > $kid = waitpid($pid, WNOHANG);
> > - return "MTT killed mpirun via $count SIGKILLs" if (0 != $kid);
> > - Verbose("** Kill KILL didn't work! Sleeping and trying
> again...\n");
> > + return "MTT killed $argv0 via $count SIGKILLs" if (0 != $kid);
> > + Verbose("** Kill KILL didn't work! Sleeping and trying
> again...\n")
> > + if (defined($argv0));
> > sleep(1);
> > }
> > }
> >
> >
> > -sub _kill_proc {
> > - my ($pid) = @_;
> > - # kill the group, take the names later
> > - foreach my $process (descendant_processes($pid)) {
> > +sub _kill_proc_tree {
> > + my ($pid, $argv0) = @_;
> > +
> > + # Find all descendent processes of the main PID
> > + my @children = descendant_processes($pid);
> > +
> > + # Kill the main PID. Note that _kill_proc_one() will give the
> > + # process time to react/cleanup, so there's no need for an
> > + # additional delay after it returns.
> > + my $ret = _kill_proc_one($pid, $argv0);
> > +
> > + # Now kill all the processes that descended from the base PID.
> > + #
> > + # Note: when mpirun is working properly (which is one of the
> > + # biggest use cases for DoCommand), this is redundant -- all the
> > + # children will already be dead (because killing mpirun will
> > + # ensure that all descendant processes are also killed).
> > + #
> > + # That being said, a) DoCommand() is used to launch more than just
> > + # mpirun, b) mpirun breaks sometimes and doesn't clean up its
> > + # descendants, and c) it's safe to call _kill_proc_one() on a PID
> > + # that is already dead.
> > + foreach my $process (@children) {
> > + Debug("=== killing child proc: $process->{pid},
> $process->{argv0}\n");
> > +
> > + # Ignore the return. For example, we don't care if the child
> > + # is already dead
> > _kill_proc_one($process->{pid});
> > }
> > - return _kill_proc_one($pid);
> > +
> > + return $ret;
> > }
> >
> >
> #--------------------------------------------------------------------------
> > @@ -338,7 +371,6 @@ sub Cmd {
> >
> > my $tokens = _quote_escape($cmd);
> >
> > -
> > my $pid;
> > if (! $no_execute) {
> >
> > @@ -568,7 +600,7 @@ sub Cmd {
> >
> > $done = 0;
> > }
> > - $timeout_message = _kill_proc($pid);
> > + $timeout_message = _kill_proc_tree($pid, ${$tokens}[0]);
> >
> > # We don't care about the exit status if we timed out
> > # -- fill it with a bogus value.
> >
> >
> https://github.com/open-mpi/mtt/commit/7fb4c6a4c9d71be127ea53bd463178510577f71f
> >
> > commit 7fb4c6a4c9d71be127ea53bd463178510577f71f
> > Author: Jeff Squyres <jsquyres_at_[hidden]>
> > Date: Sat Jun 21 04:42:19 2014 -0700
> >
> > DoCommand: replace obscure perl code with easier-to-grok version
> >
> > The descendant_processes() and flatten() codes were written in a
> > nicely efficient yet very difficult to understand manner. This
> commit
> > replaces these two subs with admittedly less efficient but much
> easier
> > to understand code.
> >
> > The new code also saves the first token of the command for each PID
> so
> > that it can be used in Debug/Verbose statements (e.g., identify the
> > command that is being killed, etc.).
> > ---
> > lib/MTT/DoCommand.pm | 63
> ++++++++++++++++++++++++++++++++++++++++------------
> > 1 file changed, 49 insertions(+), 14 deletions(-)
> >
> > diff --git a/lib/MTT/DoCommand.pm b/lib/MTT/DoCommand.pm
> > index c3be8a8..02cdb94 100644
> > --- a/lib/MTT/DoCommand.pm
> > +++ b/lib/MTT/DoCommand.pm
> > @@ -120,8 +120,8 @@ sub _kill_proc_one {
> > sub _kill_proc {
> > my ($pid) = @_;
> > # kill the group, take the names later
> > - foreach my $p (descendant_processes($pid)) {
> > - _kill_proc_one($p);
> > + foreach my $process (descendant_processes($pid)) {
> > + _kill_proc_one($process->{pid});
> > }
> > return _kill_proc_one($pid);
> > }
> > @@ -527,7 +527,8 @@ sub Cmd {
> > if (defined($timeout_backtrace_program) and
> !$got_backtrace) {
> > $backtrace = "";
> > if ( $timeout_before_backtrace_program ) {
> > - foreach my $p (descendant_processes($pid)) {
> > + foreach my $process
> (descendant_processes($pid)) {
> > + my $p = $process->{pid};
> > my $c = $timeout_before_backtrace_program;
> > $c =~ s/%PID%/$p/g;
> > Debug("_pre_backtrace cmd: $c\n");
> > @@ -538,7 +539,8 @@ sub Cmd {
> > $backtrace .=
> _get_backtrace($timeout_backtrace_program, $pid, $pre_pernode);
> >
> > if ( $timeout_after_backtrace_program ) {
> > - foreach my $p (descendant_processes($pid)) {
> > + foreach my $process
> (descendant_processes($pid)) {
> > + my $p = $process->{pid};
> > my $c = $timeout_after_backtrace_program;
> > $c =~ s/%PID%/$p/g;
> > Debug("_post_backtrace cmd: $c\n");
> > @@ -738,19 +740,50 @@ sub _do_email_timeout_notification {
> > close(TIMEOUT_SENTINEL_FILE);
> > }
> >
> > -sub flatten {
> > - map{ (ref($_) eq "ARRAY") ? map{flatten($_)}@$_ : $_ } @_;
> > +sub find_children {
> > + my ($pids, $start) = @_;
> > +
> > + my @ret;
> > + # Scan the PID list and see if any other PIDs list $start as their
> > + # PPID. If so, save that PID and then recursively look for *that*
> > + # PID's children.
> > + foreach my $pid (sort(keys(%{$pids}))) {
> > + if ($pids->{$pid}->{ppid} == $start) {
> > + push(@ret, {
> > + pid => $pid,
> > + argv0 => $pids->{$pid}->{argv0},
> > + });
> > +
> > + # Add this PID's descendants to the return array
> > + push(@ret, find_children($pids, $pid));
> > + }
> > + }
> > +
> > + return @ret;
> > }
> >
> > sub descendant_processes {
> > my ($base) = (@_, $$);
> > - my %parentage = (`ps -eo pid,ppid` =~ /\d+/g);
> > - my %reverse = map { ($_, [$_]) } %parentage;
> > - while (my ($pid,$ppid) = each %parentage){
> > - push @{$reverse{$ppid}}, $reverse{$pid};
> > + open(PS, "ps -eo pid,ppid,cmd|") || die "Can't open ps";
> > +
> > + # Skip the title line
> > + <PS>;
> > +
> > + # Read all the data lines
> > + my $pids;
> > + while (<PS>) {
> > + # Grab the PID, PPID, and first token of the command
> > + $_ =~ m/(\d+)\s+(\d+)\s+(\S+)/;
> > + $pids->{$1} = {
> > + pid => $1,
> > + ppid => $2,
> > + argv0 => $3,
> > + };
> > }
> > - shift @{$reverse{$base}};
> > - flatten($reverse{$base});
> > + close(PS);
> > +
> > + # Find all the descendants of the $base PID
> > + return find_children($pids, $base);
> > }
> >
> > sub _get_backtrace {
> > @@ -774,7 +807,8 @@ sub _get_backtrace {
> >
> > # Use ps -Af output to fetch the child pids,
> > # and grab a stack trace from each one
> > - foreach my $p (descendant_processes($pid)) {
> > + foreach my $process (descendant_processes($pid)) {
> > + my $p = $process->{pid};
> > $gdb_cmd = "gdb - $p --command=$gdb_command_filename
> --batch";
> > $ret .= "\n $gdb_cmd";
> > $ret .= "\n" . `$gdb_cmd`;
> > @@ -814,8 +848,9 @@ sub _get_backtrace {
> > }
> > }
> > Debug("Stacktrace: base name $return_basename\n");
> > - #foreach my $p (descendant_processes($pid))
> > + #foreach my $process (descendant_processes($pid))
> > #{
> > + # my $p = $process->{pid};
> > # my $gstack_cmd = "gstack $p";
> > # $ret .= "\n $gstack_cmd";
> > # $ret .= "\n" . `$gstack_cmd`;
> >
> >
> https://github.com/open-mpi/mtt/commit/381ba177d835a54c3197d846f5a4edfc314efe27
> >
> > commit 381ba177d835a54c3197d846f5a4edfc314efe27
> > Author: Jeff Squyres <jsquyres_at_[hidden]>
> > Date: Sat Jun 21 03:48:55 2014 -0700
> >
> > DoCommand: ensure to only kill the process once
> >
> > When the DoCommand timeout fires, MTT will kill the entire process
> > tree of the launched command. However, the stdout/stderr sockets may
> > still be draining, and therefore the main DoCommand loop may still
> > execute more iterations. In cases like these, ensure that we don't
> > try to backtrace/kill the process tree again.
> >
> > Also add a few more Debug statements for detailed output.
> > ---
> > lib/MTT/DoCommand.pm | 23 ++++++++++++++++-------
> > 1 file changed, 16 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/MTT/DoCommand.pm b/lib/MTT/DoCommand.pm
> > index 0e253f0..c3be8a8 100644
> > --- a/lib/MTT/DoCommand.pm
> > +++ b/lib/MTT/DoCommand.pm
> > @@ -471,6 +471,7 @@ sub Cmd {
> > my $len = sysread(OUTread, $data, 99999);
> > if (0 == $len) {
> > vec($rin, fileno(OUTread), 1) = 0;
> > + Debug("*** Child process stdout closed\n");
> > --$done;
> > } else {
> > _append($data, $print_timestamp ? localtime() : "",
> > @@ -485,6 +486,7 @@ sub Cmd {
> > my $len = sysread(ERRread, $data, 99999);
> > if (0 == $len) {
> > vec($rin, fileno(ERRread), 1) = 0;
> > + Debug("*** Child process stderr closed\n");
> > --$done;
> > } else {
> > _append($data, $print_timestamp ? localtime() : "",
> > @@ -493,11 +495,19 @@ sub Cmd {
> > }
> > }
> >
> > - # If we're running with a timeout, bail if we're past the end
> > - # time
> > - if (defined($end_time) && time() > $end_time) {
> > + # If we're running with a timeout, check to see if a) the
> > + # process is still running (i.e., stdout/stderr is still
> > + # open), and b) we're past the end time.
> > + if ($done > 0 && defined($end_time) && time() > $end_time) {
> > my $over = time() - $end_time;
> > - if ($over > $last_over) {
> > +
> > + # Note that we only want to backtrace/kill the process
> > + # *once*. Consider: it is possible that we kill the
> > + # process tree, but then stdout/stderr are still draining,
> > + # and therefore we loop around here again *even though the
> > + # process tree is already dead*. So put a little
> > + # do-this-only-once protection in here.
> > + if (!defined($timeout_message) && $over > $last_over) {
> > Verbose("*** Past timeout of $timeout seconds by $over
> seconds\n");
> >
> > # Handle timeout file
> > @@ -555,10 +565,8 @@ sub Cmd {
> > );
> >
> > $done = 0;
> > - $timeout_message = _kill_proc($pid);
> > - } else {
> > - $timeout_message = _kill_proc($pid);
> > }
> > + $timeout_message = _kill_proc($pid);
> >
> > # We don't care about the exit status if we timed out
> > # -- fill it with a bogus value.
> > @@ -576,6 +584,7 @@ sub Cmd {
> > }
> > }
> > }
> > + Debug("*** Child process now dead\n");
> > close OUTerr;
> > close OUTread
> > if (!$merge_output);
> >
> >
> https://github.com/open-mpi/mtt/commit/cfdd29de2012eeb7592706f00dd07a52dd48cf6b
> >
> > commit cfdd29de2012eeb7592706f00dd07a52dd48cf6b
> > Author: Jeff Squyres <jsquyres_at_[hidden]>
> > Date: Sat Jun 21 03:36:08 2014 -0700
> >
> > DoCommand: remove unused variables
> > ---
> > lib/MTT/DoCommand.pm | 2 --
> > 1 file changed, 2 deletions(-)
> >
> > diff --git a/lib/MTT/DoCommand.pm b/lib/MTT/DoCommand.pm
> > index 0740b28..0e253f0 100644
> > --- a/lib/MTT/DoCommand.pm
> > +++ b/lib/MTT/DoCommand.pm
> > @@ -445,7 +445,6 @@ sub Cmd {
> > $end_time = time() + $timeout;
> > Debug("Timeout: $timeout - $end_time (vs. now: " . time() .
> ")\n");
> > }
> > - my $killed_status = undef;
> > my $last_over = 0;
> > my $partial_out;
> > my $partial_err;
> > @@ -900,7 +899,6 @@ sub Win_Cmd {
> > $end_time = time() + $timeout;
> > Debug("Timeout: $timeout - $end_time (vs. now: " . time() .
> ")\n");
> > }
> > - my $killed_status = undef;
> > my $last_over = 0;
> >
> > while(<OUTread>) {
> >
> >
> https://github.com/open-mpi/mtt/commit/940030ca20eb1eaf256e898b83866c1cb83aca5c
> >
> > commit 940030ca20eb1eaf256e898b83866c1cb83aca5c
> > Author: Jeff Squyres <jsquyres_at_[hidden]>
> > Date: Fri Jun 20 17:13:08 2014 -0700
> >
> > Diskfree.pm: fix df handle leak
> >
> > This left a zombied "df" process in the process table. The Perl
> > runtime eventually cleaned it up, but it was weird to see a zombied
> df
> > in ps listings.
> >
> > There's no point in reporting this bug upstream; according to the
> > Changes file, v0.06 was published in 1998. There's been no activity
> > on http://search.cpan.org/dist/Filesys-DiskFree/DiskFree.pm since
> > v0.06.
> > ---
> > lib/Filesys/DiskFree.pm | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/Filesys/DiskFree.pm b/lib/Filesys/DiskFree.pm
> > index 1fa655d..41a27d7 100644
> > --- a/lib/Filesys/DiskFree.pm
> > +++ b/lib/Filesys/DiskFree.pm
> > @@ -124,8 +124,9 @@ sub df(){
> > $cmd=$self->command() or
> > croak "No df command known for format ".$self->{'FORMAT'};
> > open(HANDLE,"$cmd|") or croak("Cannot fork \"$cmd\": $!");
> > - return $self->load(\*HANDLE);
> > + my $ret = $self->load(\*HANDLE);
> > close(HANDLE) or croak("Cannot df $!");
> > + return $ret;
> > }
> >
> > sub load() {
> >
> > -----------------------------------------------------------------------
> >
> > Summary of changes:
> > lib/Filesys/DiskFree.pm | 3 +-
> > lib/MTT/DoCommand.pm | 162
> +++++++++++++++++++++++++++++++++++-------------
> > 2 files changed, 120 insertions(+), 45 deletions(-)
> >
> >
> > hooks/post-receive
> > --
> > MTT
> > _______________________________________________
> > mtt-svn mailing list
> > mtt-svn_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/mtt-svn
> >
> > _______________________________________________
> > mtt-devel mailing list
> > mtt-devel_at_[hidden]
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> > Searchable archives:
> http://www.open-mpi.org/community/lists/mtt-devel/2014/06/index.php
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> mtt-devel mailing list
> mtt-devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> Link to this post:
> http://www.open-mpi.org/community/lists/mtt-devel/2014/06/0627.php
>