From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-10-31 09:41:26


Yes; $done is a little odd and probably misnamed. It's the number of
file descriptors still open from the child process. When it reaches
0, we're done, so setting it to 0 at the drain timeout is what forces
the read loop to exit.
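
For illustration only, here is a minimal sketch of that pattern. It is
not the actual DoCommand.pm code: the run_with_drain() helper, its
arguments, and the $open_fds name (standing in for $done) are all
hypothetical, and it assumes a plain open3()/select loop.

```perl
use strict;
use warnings;
use IO::Select;
use IPC::Open3;
use Symbol qw(gensym);

# Hypothetical helper (not part of MTT): run @cmd, collect its output,
# kill it after $timeout seconds, and give up on draining its output
# $drain_timeout seconds after that.
sub run_with_drain {
    my ($timeout, $drain_timeout, @cmd) = @_;

    my $err = gensym;    # open3 needs a real glob for stderr
    my $pid = open3(my $in, my $out, $err, @cmd);
    close $in;

    my $sel      = IO::Select->new($out, $err);
    my $open_fds = 2;    # plays the role of $done: fds still open
    my $end_time = time() + $timeout;
    my ($output, $timed_out, $killed) = ("", 0, 0);

    while ($open_fds > 0) {
        for my $fh ($sel->can_read(1)) {
            my $n = sysread($fh, my $buf, 4096);
            if ($n) {
                $output .= $buf;
            } else {
                # EOF or read error: this descriptor is finished
                $sel->remove($fh);
                close $fh;
                --$open_fds;
            }
        }

        my $over = time() - $end_time;
        if ($over > 0) {
            $timed_out = 1;
            kill("KILL", $pid) unless $killed++;
            # Past the drain timeout: pretend every fd is closed so
            # the loop exits even if something still holds the pipes
            $open_fds = 0 if $over > $drain_timeout;
        }
    }

    waitpid($pid, 0);
    return { output => $output, timed_out => $timed_out, status => $? };
}
```

In this sketch, a call such as run_with_drain(10, 5, "some_command")
would return as soon as both pipes hit EOF, or at the latest a few
seconds after the drain timeout has passed.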

On Oct 31, 2006, at 9:39 AM, Ethan Mallove wrote:

> I've run with these changes and they seem to work (I did
> need to change the INI param "module" to "specify_module",
> from the previous commit). Just one question (see below).
>
>
> On Sun, Oct/29/2006 08:36:04AM, jsquyres_at_[hidden] wrote:
>> Author: jsquyres
>> Date: 2006-10-29 08:35:58 EST (Sun, 29 Oct 2006)
>> New Revision: 403
>>
>> Modified:
>> trunk/CHANGES
>> trunk/lib/MTT/DoCommand.pm
>> trunk/lib/MTT/Globals.pm
>> trunk/samples/ompi-core-template.ini
>>
>> Log:
>> * Add textwrap to Global defaults
>> * Add new global: drain_timeout
>> * In DoCommand, after the timeout, we'll wait drain_timeout more
>> seconds to get any final output and then unconditionally move on.
>> * Add some Verbose statements to catch when kill() does not seem to
>> be working. Have not nailed this down yet; want to see some output
>> from when it occurs.
>>
>>
>> Modified: trunk/CHANGES
>> =====================================================================
>> =========
>> --- trunk/CHANGES (original)
>> +++ trunk/CHANGES 2006-10-29 08:35:58 EST (Sun, 29 Oct 2006)
>> @@ -1,2 +1,5 @@
>> To announce to OMPI core testers:
>>
>> +- added new fields to MTT section to ini file
>> + - textwrap
>> + - drain_timeout
>>
>> Modified: trunk/lib/MTT/DoCommand.pm
>> =====================================================================
>> =========
>> --- trunk/lib/MTT/DoCommand.pm (original)
>> +++ trunk/lib/MTT/DoCommand.pm 2006-10-29 08:35:58 EST (Sun, 29
>> Oct 2006)
>> @@ -32,6 +32,7 @@
>> if ($kid != 0) {
>> return $?;
>> }
>> + Verbose("** Kill TERM didn't work!\n");
>>
>> # Nope, that didn't work. Sleep a few seconds and try again.
>> sleep(2);
>> @@ -39,6 +40,7 @@
>> if ($kid != 0) {
>> return $?;
>> }
>> + Verbose("** Kill TERM (more waiting) didn't work!\n");
>>
>> # That didn't work either. Try SIGINT;
>> kill("INT", $pid);
>> @@ -46,6 +48,7 @@
>> if ($kid != 0) {
>> return $?;
>> }
>> + Verbose("** Kill INT didn't work!\n");
>>
>> # Nope, that didn't work. Sleep a few seconds and try again.
>> sleep(2);
>> @@ -53,6 +56,7 @@
>> if ($kid != 0) {
>> return $?;
>> }
>> + Verbose("** Kill INT (more waiting) didn't work!\n");
>>
>> # Ok, now we're mad. Be violent.
>> while (1) {
>> @@ -61,13 +65,7 @@
>> if ($kid != 0) {
>> return $?;
>> }
>> - sleep(1);
>> -
>> - kill("KILL", $pid);
>> - $kid = waitpid($pid, WNOHANG);
>> - if ($kid != 0) {
>> - return $?;
>> - }
>> + Verbose("** Kill KILL didn't work!\n");
>> sleep(1);
>> }
>> }
>> @@ -278,7 +276,7 @@
>> if (defined($end_time) && time() > $end_time) {
>> my $over = time() - $end_time;
>> if ($over > $last_over) {
>> - Debug("*** Past timeout by $over seconds\n");
>> + Verbose("*** Past timeout by $over seconds\n");
>> my $st = _kill_proc($pid);
>> if (!defined($killed_status)) {
>> $killed_status = $st;
>> @@ -286,6 +284,12 @@
>> $ret->{timed_out} = 1;
>> }
>> $last_over = $over;
>> +
>> + # See if we're over the drain_timeout
>> + if ($over > $MTT::Globals::Values->{drain_timeout}) {
>> + Verbose("*** Past drain timeout; quitting\n");
>> + $done = 0;
>> + }
>
>
> I would have thought if we're "quitting" here, then $done =
> 1.
>
> -Ethan
>
>
>> }
>> }
>> close OUTerr;
>>
>> Modified: trunk/lib/MTT/Globals.pm
>> =====================================================================
>> =========
>> --- trunk/lib/MTT/Globals.pm (original)
>> +++ trunk/lib/MTT/Globals.pm 2006-10-29 08:35:58 EST (Sun, 29 Oct
>> 2006)
>> @@ -26,6 +26,8 @@
>> hostfile => undef,
>> hostlist => undef,
>> max_np => undef,
>> + textwrap => 76,
>> + drain_timeout => 5,
>> };
>>
>> # Reset $Globals per a specific ini file
>> @@ -68,6 +70,13 @@
>> if ($val) {
>> $Values->{textwrap} = $val;
>> }
>> +
>> + # Output display preference
>> +
>> + my $val = MTT::Values::Value($ini, "MTT", "drain_timeout");
>> + if ($val) {
>> + $Values->{drain_timeout} = $val;
>> + }
>> }
>>
>>
>>
>> Modified: trunk/samples/ompi-core-template.ini
>> =====================================================================
>> =========
>> --- trunk/samples/ompi-core-template.ini (original)
>> +++ trunk/samples/ompi-core-template.ini 2006-10-29 08:35:58 EST
>> (Sun, 29 Oct 2006)
>> @@ -91,9 +91,15 @@
>> # returned by &env_max_procs(), you can fill in an integer here.
>> max_np =
>>
>> -# Output display preference
>> +# OMPI Core: Output display preference; the default width at which MTT
>> +# output will wrap.
>> textwrap = 76
>>
>> +# OMPI Core: After the timeout for a command has passed, wait this
>> +# many additional seconds to drain all output, and then kill it with
>> +# extreme prejudice.
>> +drain_timeout = 5
>> +
>>
>> #====================================================================
>> ==
>> # MPI get phase
>>
>> #====================================================================
>> ==
>> _______________________________________________
>> mtt-svn mailing list
>> mtt-svn_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-svn

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems