Open MPI logo

MTT Devel Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all MTT Devel mailing list

Subject: Re: [MTT devel] [MTT svn] GIT: MTT branch master updated. 016088f2a0831b32ab5fd6f60f4cabe67e92e594
From: Mike Dubman (miked_at_[hidden])
Date: 2014-06-23 04:37:01


this commit does more harm then good.
we experience following:

- some child processes still running after timeout and mtt killed the job.

before this commit - it worked fine.
please revert and test more.

On Sat, Jun 21, 2014 at 3:30 PM, MPI Team <mpiteam_at_[hidden]> wrote:

> The branch, master has been updated
> via 016088f2a0831b32ab5fd6f60f4cabe67e92e594 (commit)
> via 7fb4c6a4c9d71be127ea53bd463178510577f71f (commit)
> via 381ba177d835a54c3197d846f5a4edfc314efe27 (commit)
> via cfdd29de2012eeb7592706f00dd07a52dd48cf6b (commit)
> via 940030ca20eb1eaf256e898b83866c1cb83aca5c (commit)
> from c99ed7c7b159a2cab58a251bd7c0dad8972ff901 (commit)
>
> Those revisions listed above that are new to this repository have
> not appeared on any other notification email; so we list those
> revisions in full, below.
>
> - Log -----------------------------------------------------------------
>
> https://github.com/open-mpi/mtt/commit/016088f2a0831b32ab5fd6f60f4cabe67e92e594
>
> commit 016088f2a0831b32ab5fd6f60f4cabe67e92e594
> Author: Jeff Squyres <jsquyres_at_[hidden]>
> Date: Sat Jun 21 04:58:45 2014 -0700
>
> DoCommand: several fixes to kill_proc logic
>
> 1. Fix the kill(0, $pid) test to see if the process was still alive.
>
> 2. Rename _kill_proc() to _kill_proc_tree() to indicate that it's
> really killing not only the PID in question, but also all of its
> descendants.
>
> 3. In _kill_proc_tree(), change the order to kill the main PID first,
> and ''then'' kill all the descendants.
>
> The main use case is when killing mpirun: if we kill mpirun's
> descendants ''first'', mpirun will detect its childrens' deaths and
> then cleanup and exit. Later, when MTT finally gets around to killing
> mpirun, MTT will detect that mpirun is already dead and therefore emit
> a confusing "mpirun died right at end of timeout" message. This is
> misleading at best; it doesn't indicate what actually happened.
>
> However, if we kill mpirun first, it will take care of killing all of
> its descendants. MTT will therefore emit the right messages about
> killing mpirun. MTT will then redundantly try to kill a bunch of
> now-nonexistent descendant processes of mpirun, but that's ok/safe.
> We actually ''want'' this try-to-kill-mpirun's-descendants behavior to
> handle the case when mpirun is misbehaving / not cleaning up its
> descendants.
>
> 4. DoCommand() is used for more than launching mpirun, so pass down
> $argv0 so that we can print the actual command name that is being
> killed in various Verbose/Debug messages, not the hard-coded "mpirun"
> string (which, in practice, was probably almost always correct, but
> still...).
> ---
> lib/MTT/DoCommand.pm | 78
> ++++++++++++++++++++++++++++++++++++----------------
> 1 file changed, 55 insertions(+), 23 deletions(-)
>
> diff --git a/lib/MTT/DoCommand.pm b/lib/MTT/DoCommand.pm
> index 02cdb94..646ca31 100644
> --- a/lib/MTT/DoCommand.pm
> +++ b/lib/MTT/DoCommand.pm
> @@ -2,7 +2,7 @@
> #
> # Copyright (c) 2005-2006 The Trustees of Indiana University.
> # All rights reserved.
> -# Copyright (c) 2006-2013 Cisco Systems, Inc. All rights reserved.
> +# Copyright (c) 2006-2014 Cisco Systems, Inc. All rights reserved.
> # Copyright (c) 2007-2008 Sun Microsystems, Inc. All rights reserved.
> # Copyright (c) 2007-2012 High Performance Computing Center Stuttgart,
> # University of Stuttgart. All rights reserved.
> @@ -57,23 +57,27 @@ sub DoCommand {
> ($time_arg, $no_execute) = @_;
> }
>
> +# This function is called for killing both mpirun and each of its
> +# descendants. We really only want to see verbose output for when we
> +# kill mpirun itself, so only show output when the caller provides us
> +# with a $argv0 value.
> sub _kill_proc_one {
> - my ($pid) = @_;
> + my ($pid, $argv0) = @_;
>
> # How long to wait after each kill()
> my $wait_time = 5;
>
> # See if the proc is alive first
> - my $kid;
> - kill(0, $pid);
> - $kid = waitpid($pid, WNOHANG);
> - return "mpirun died right at end of timeout (MTT did not have to kill
> it)"
> - if (-1 == $kid);
> + my $num_alive = kill(0, $pid);
> + return "$argv0 died right at end of timeout (MTT did not have to kill
> it)"
> + if (0 == $num_alive);
>
> # Try an easy kill
> kill("TERM", $pid);
> - Verbose("*** Killing mpirun with SIGTERM\n");
> + Verbose("*** Killing $argv0 with SIGTERM\n")
> + if (defined($argv0));
> # Give mpirun some time to cleanup before we try to reap it.
> + my $kid;
> my $i = $wait_time;
> while ($i > 0) {
> sleep(1);
> @@ -85,45 +89,74 @@ sub _kill_proc_one {
> # process no longer exists (i.e., we get -1 back from
> # waitpid), or we successfully killed it (i.e., we got the PID
> # back from waitpid).
> - return "MTT killed mpirun via SIGTERM" if (0 != $kid);
> + return "MTT killed $argv0 via SIGTERM" if (0 != $kid);
>
> --$i;
> }
> - Verbose("** Kill TERM (after $wait_time seconds) didn't work!\n");
> + Verbose("** Kill TERM (after $wait_time seconds) didn't work!\n")
> + if (defined($argv0));
>
> # That didn't work either. Try SIGINT;
> - Verbose("*** Killing mpirun with SIGINT\n");
> + Verbose("*** Killing $argv0 with SIGINT\n")
> + if (defined($argv0));
> kill("INT", $pid);
> my $i = $wait_time;
> while ($i > 0) {
> sleep(1);
> $kid = waitpid($pid, WNOHANG);
> - return "MTT killed mpirun via SIGINT" if (0 != $kid);
> + return "MTT killed $argv0 via SIGINT" if (0 != $kid);
> --$i;
> }
> - Verbose("** Kill INT (after $wait_time seconds) didn't work!\n");
> + Verbose("** Kill INT (after $wait_time seconds) didn't work!\n")
> + if (defined($argv0));
>
> # Ok, now we're mad. Be violent.
> - Verbose("*** Now I'm mad. Killing mpirun with SIGKILL\n");
> + Verbose("*** Now I'm mad. Killing $argv0 with SIGKILL\n")
> + if (defined($argv0));
> my $count = 0;
> while (1) {
> kill("KILL", $pid);
> ++$count;
> $kid = waitpid($pid, WNOHANG);
> - return "MTT killed mpirun via $count SIGKILLs" if (0 != $kid);
> - Verbose("** Kill KILL didn't work! Sleeping and trying
> again...\n");
> + return "MTT killed $argv0 via $count SIGKILLs" if (0 != $kid);
> + Verbose("** Kill KILL didn't work! Sleeping and trying
> again...\n")
> + if (defined($argv0));
> sleep(1);
> }
> }
>
>
> -sub _kill_proc {
> - my ($pid) = @_;
> - # kill the group, take the names later
> - foreach my $process (descendant_processes($pid)) {
> +sub _kill_proc_tree {
> + my ($pid, $argv0) = @_;
> +
> + # Find all descendent processes of the main PID
> + my @children = descendant_processes($pid);
> +
> + # Kill the main PID. Note that _kill_proc_one() will give the
> + # process time to react/cleanup, so there's no need for an
> + # additional delay after it returns.
> + my $ret = _kill_proc_one($pid, $argv0);
> +
> + # Now kill all the processes that descended from the base PID.
> + #
> + # Note: when mpirun is working properly (which is one of the
> + # biggest use cases for DoCommand), this is redundant -- all the
> + # children will already be dead (because killing mpirun will
> + # ensure that all descendant processes are also killed).
> + #
> + # That being said, a) DoCommand() is used to launch more than just
> + # mpirun, b) mpirun breaks sometimes and doesn't clean up its
> + # descendants, and c) it's safe to call _kill_proc_one() on a PID
> + # that is already dead.
> + foreach my $process (@children) {
> + Debug("=== killing child proc: $process->{pid},
> $process->{argv0}\n");
> +
> + # Ignore the return. For example, we don't care if the child
> + # is already dead
> _kill_proc_one($process->{pid});
> }
> - return _kill_proc_one($pid);
> +
> + return $ret;
> }
>
>
> #--------------------------------------------------------------------------
> @@ -338,7 +371,6 @@ sub Cmd {
>
> my $tokens = _quote_escape($cmd);
>
> -
> my $pid;
> if (! $no_execute) {
>
> @@ -568,7 +600,7 @@ sub Cmd {
>
> $done = 0;
> }
> - $timeout_message = _kill_proc($pid);
> + $timeout_message = _kill_proc_tree($pid, ${$tokens}[0]);
>
> # We don't care about the exit status if we timed out
> # -- fill it with a bogus value.
>
>
> https://github.com/open-mpi/mtt/commit/7fb4c6a4c9d71be127ea53bd463178510577f71f
>
> commit 7fb4c6a4c9d71be127ea53bd463178510577f71f
> Author: Jeff Squyres <jsquyres_at_[hidden]>
> Date: Sat Jun 21 04:42:19 2014 -0700
>
> DoCommand: replace obscure perl code with easier-to-grok version
>
> The descendant_processes() and flatten() codes were written in a
> nicely efficient yet very difficult to understand manner. This commit
> replaces these two subs with admittedly less efficient but much easier
> to understand code.
>
> The new code also saves the first token of the command for each PID so
> that it can be used in Debug/Verbose statements (e.g., identify the
> command that is being killed, etc.).
> ---
> lib/MTT/DoCommand.pm | 63
> ++++++++++++++++++++++++++++++++++++++++------------
> 1 file changed, 49 insertions(+), 14 deletions(-)
>
> diff --git a/lib/MTT/DoCommand.pm b/lib/MTT/DoCommand.pm
> index c3be8a8..02cdb94 100644
> --- a/lib/MTT/DoCommand.pm
> +++ b/lib/MTT/DoCommand.pm
> @@ -120,8 +120,8 @@ sub _kill_proc_one {
> sub _kill_proc {
> my ($pid) = @_;
> # kill the group, take the names later
> - foreach my $p (descendant_processes($pid)) {
> - _kill_proc_one($p);
> + foreach my $process (descendant_processes($pid)) {
> + _kill_proc_one($process->{pid});
> }
> return _kill_proc_one($pid);
> }
> @@ -527,7 +527,8 @@ sub Cmd {
> if (defined($timeout_backtrace_program) and
> !$got_backtrace) {
> $backtrace = "";
> if ( $timeout_before_backtrace_program ) {
> - foreach my $p (descendant_processes($pid)) {
> + foreach my $process (descendant_processes($pid)) {
> + my $p = $process->{pid};
> my $c = $timeout_before_backtrace_program;
> $c =~ s/%PID%/$p/g;
> Debug("_pre_backtrace cmd: $c\n");
> @@ -538,7 +539,8 @@ sub Cmd {
> $backtrace .=
> _get_backtrace($timeout_backtrace_program, $pid, $pre_pernode);
>
> if ( $timeout_after_backtrace_program ) {
> - foreach my $p (descendant_processes($pid)) {
> + foreach my $process (descendant_processes($pid)) {
> + my $p = $process->{pid};
> my $c = $timeout_after_backtrace_program;
> $c =~ s/%PID%/$p/g;
> Debug("_post_backtrace cmd: $c\n");
> @@ -738,19 +740,50 @@ sub _do_email_timeout_notification {
> close(TIMEOUT_SENTINEL_FILE);
> }
>
> -sub flatten {
> - map{ (ref($_) eq "ARRAY") ? map{flatten($_)}@$_ : $_ } @_;
> +sub find_children {
> + my ($pids, $start) = @_;
> +
> + my @ret;
> + # Scan the PID list and see if any other PIDs list $start as their
> + # PPID. If so, save that PID and then recursively look for *that*
> + # PID's children.
> + foreach my $pid (sort(keys(%{$pids}))) {
> + if ($pids->{$pid}->{ppid} == $start) {
> + push(@ret, {
> + pid => $pid,
> + argv0 => $pids->{$pid}->{argv0},
> + });
> +
> + # Add this PID's descendants to the return array
> + push(@ret, find_children($pids, $pid));
> + }
> + }
> +
> + return @ret;
> }
>
> sub descendant_processes {
> my ($base) = (@_, $$);
> - my %parentage = (`ps -eo pid,ppid` =~ /\d+/g);
> - my %reverse = map { ($_, [$_]) } %parentage;
> - while (my ($pid,$ppid) = each %parentage){
> - push @{$reverse{$ppid}}, $reverse{$pid};
> + open(PS, "ps -eo pid,ppid,cmd|") || die "Can't open ps";
> +
> + # Skip the title line
> + <PS>;
> +
> + # Read all the data lines
> + my $pids;
> + while (<PS>) {
> + # Grab the PID, PPID, and first token of the command
> + $_ =~ m/(\d+)\s+(\d+)\s+(\S+)/;
> + $pids->{$1} = {
> + pid => $1,
> + ppid => $2,
> + argv0 => $3,
> + };
> }
> - shift @{$reverse{$base}};
> - flatten($reverse{$base});
> + close(PS);
> +
> + # Find all the descendants of the $base PID
> + return find_children($pids, $base);
> }
>
> sub _get_backtrace {
> @@ -774,7 +807,8 @@ sub _get_backtrace {
>
> # Use ps -Af output to fetch the child pids,
> # and grab a stack trace from each one
> - foreach my $p (descendant_processes($pid)) {
> + foreach my $process (descendant_processes($pid)) {
> + my $p = $process->{pid};
> $gdb_cmd = "gdb - $p --command=$gdb_command_filename
> --batch";
> $ret .= "\n $gdb_cmd";
> $ret .= "\n" . `$gdb_cmd`;
> @@ -814,8 +848,9 @@ sub _get_backtrace {
> }
> }
> Debug("Stacktrace: base name $return_basename\n");
> - #foreach my $p (descendant_processes($pid))
> + #foreach my $process (descendant_processes($pid))
> #{
> + # my $p = $process->{pid};
> # my $gstack_cmd = "gstack $p";
> # $ret .= "\n $gstack_cmd";
> # $ret .= "\n" . `$gstack_cmd`;
>
>
> https://github.com/open-mpi/mtt/commit/381ba177d835a54c3197d846f5a4edfc314efe27
>
> commit 381ba177d835a54c3197d846f5a4edfc314efe27
> Author: Jeff Squyres <jsquyres_at_[hidden]>
> Date: Sat Jun 21 03:48:55 2014 -0700
>
> DoCommand: ensure to only kill the process once
>
> When the DoCommand timeout fires, MTT will kill the entire process
> tree of the launched command. However, the stdout/stderr sockets may
> still be draining, and therefore the main DoCommand loop may still
> execute more iterations. In cases like these, ensure that we don't
> try to backtrace/kill the process tree again.
>
> Also add a few more Debug statements for detailed output.
> ---
> lib/MTT/DoCommand.pm | 23 ++++++++++++++++-------
> 1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/lib/MTT/DoCommand.pm b/lib/MTT/DoCommand.pm
> index 0e253f0..c3be8a8 100644
> --- a/lib/MTT/DoCommand.pm
> +++ b/lib/MTT/DoCommand.pm
> @@ -471,6 +471,7 @@ sub Cmd {
> my $len = sysread(OUTread, $data, 99999);
> if (0 == $len) {
> vec($rin, fileno(OUTread), 1) = 0;
> + Debug("*** Child process stdout closed\n");
> --$done;
> } else {
> _append($data, $print_timestamp ? localtime() : "",
> @@ -485,6 +486,7 @@ sub Cmd {
> my $len = sysread(ERRread, $data, 99999);
> if (0 == $len) {
> vec($rin, fileno(ERRread), 1) = 0;
> + Debug("*** Child process stderr closed\n");
> --$done;
> } else {
> _append($data, $print_timestamp ? localtime() : "",
> @@ -493,11 +495,19 @@ sub Cmd {
> }
> }
>
> - # If we're running with a timeout, bail if we're past the end
> - # time
> - if (defined($end_time) && time() > $end_time) {
> + # If we're running with a timeout, check to see if a) the
> + # process is still running (i.e., stdout/stderr is still
> + # open), and b) we're past the end time.
> + if ($done > 0 && defined($end_time) && time() > $end_time) {
> my $over = time() - $end_time;
> - if ($over > $last_over) {
> +
> + # Note that we only want to backtrace/kill the process
> + # *once*. Consider: it is possible that we kill the
> + # process tree, but then stdout/stderr are still draining,
> + # and therefore we loop around here again *even though the
> + # process tree is already dead*. So put a little
> + # do-this-only-once protection in here.
> + if (!defined($timeout_message) && $over > $last_over) {
> Verbose("*** Past timeout of $timeout seconds by $over
> seconds\n");
>
> # Handle timeout file
> @@ -555,10 +565,8 @@ sub Cmd {
> );
>
> $done = 0;
> - $timeout_message = _kill_proc($pid);
> - } else {
> - $timeout_message = _kill_proc($pid);
> }
> + $timeout_message = _kill_proc($pid);
>
> # We don't care about the exit status if we timed out
> # -- fill it with a bogus value.
> @@ -576,6 +584,7 @@ sub Cmd {
> }
> }
> }
> + Debug("*** Child process now dead\n");
> close OUTerr;
> close OUTread
> if (!$merge_output);
>
>
> https://github.com/open-mpi/mtt/commit/cfdd29de2012eeb7592706f00dd07a52dd48cf6b
>
> commit cfdd29de2012eeb7592706f00dd07a52dd48cf6b
> Author: Jeff Squyres <jsquyres_at_[hidden]>
> Date: Sat Jun 21 03:36:08 2014 -0700
>
> DoCommand: remove unused variables
> ---
> lib/MTT/DoCommand.pm | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/lib/MTT/DoCommand.pm b/lib/MTT/DoCommand.pm
> index 0740b28..0e253f0 100644
> --- a/lib/MTT/DoCommand.pm
> +++ b/lib/MTT/DoCommand.pm
> @@ -445,7 +445,6 @@ sub Cmd {
> $end_time = time() + $timeout;
> Debug("Timeout: $timeout - $end_time (vs. now: " . time() .
> ")\n");
> }
> - my $killed_status = undef;
> my $last_over = 0;
> my $partial_out;
> my $partial_err;
> @@ -900,7 +899,6 @@ sub Win_Cmd {
> $end_time = time() + $timeout;
> Debug("Timeout: $timeout - $end_time (vs. now: " . time() .
> ")\n");
> }
> - my $killed_status = undef;
> my $last_over = 0;
>
> while(<OUTread>) {
>
>
> https://github.com/open-mpi/mtt/commit/940030ca20eb1eaf256e898b83866c1cb83aca5c
>
> commit 940030ca20eb1eaf256e898b83866c1cb83aca5c
> Author: Jeff Squyres <jsquyres_at_[hidden]>
> Date: Fri Jun 20 17:13:08 2014 -0700
>
> Diskfree.pm: fix df handle leak
>
> This left a zombied "df" process in the process table. The Perl
> runtime eventually cleaned it up, but it was weird to see a zombied df
> in ps listings.
>
> There's no point in reporting this bug upstream; according to the
> Changes file, v0.06 was published in 1998. There's been no activity
> on http://search.cpan.org/dist/Filesys-DiskFree/DiskFree.pm since
> v0.06.
> ---
> lib/Filesys/DiskFree.pm | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/Filesys/DiskFree.pm b/lib/Filesys/DiskFree.pm
> index 1fa655d..41a27d7 100644
> --- a/lib/Filesys/DiskFree.pm
> +++ b/lib/Filesys/DiskFree.pm
> @@ -124,8 +124,9 @@ sub df(){
> $cmd=$self->command() or
> croak "No df command known for format ".$self->{'FORMAT'};
> open(HANDLE,"$cmd|") or croak("Cannot fork \"$cmd\": $!");
> - return $self->load(\*HANDLE);
> + my $ret = $self->load(\*HANDLE);
> close(HANDLE) or croak("Cannot df $!");
> + return $ret;
> }
>
> sub load() {
>
> -----------------------------------------------------------------------
>
> Summary of changes:
> lib/Filesys/DiskFree.pm | 3 +-
> lib/MTT/DoCommand.pm | 162
> +++++++++++++++++++++++++++++++++++-------------
> 2 files changed, 120 insertions(+), 45 deletions(-)
>
>
> hooks/post-receive
> --
> MTT
> _______________________________________________
> mtt-svn mailing list
> mtt-svn_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-svn
>