
MTT Devel Mailing List Archives


Subject: Re: [MTT devel] MTT email timeout notification feature
From: Ethan Mallove (ethan.mallove_at_[hidden])
Date: 2009-06-25 11:23:35

In r1300, I incorporated your notes, except for #2 and #3. I dug
around for a CPAN module for stack traces but couldn't find anything.
Maybe MTT could bundle a stripped-down version of GDB for the sole
purpose of gathering stack traces? Or is there some other open-source
"stack trace grabber" tool out there?
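One way to approach notes #2 and #3 below, without bundling GDB, is to keep a small ordered table of debugger invocations and use the first tool found on the PATH. The sketch below is a hypothetical illustration in Python, not MTT code (MTT itself is Perl). `BACKTRACE_COMMANDS`, `backtrace_command`, and `grab_backtrace` are made-up names. The two entries are ordinary invocations of real tools: GDB in batch mode attached to a PID, and `pstack`, which ships with Solaris.

```python
import shutil
import subprocess
from typing import Callable, List, Optional

# Hypothetical, ordered table of backtrace tools; "{pid}" is replaced with
# the hung process's PID. These are not MTT INI settings.
BACKTRACE_COMMANDS = [
    ("gdb", ["gdb", "-batch", "-nx", "-p", "{pid}", "-ex", "thread apply all bt"]),
    ("pstack", ["pstack", "{pid}"]),  # Solaris and some Linux distributions
]


def backtrace_command(
    pid: int, available: Callable[[str], Optional[str]] = shutil.which
) -> Optional[List[str]]:
    """Return argv for the first backtrace tool that `available` finds, or None."""
    for tool, template in BACKTRACE_COMMANDS:
        if available(tool):
            return [arg.replace("{pid}", str(pid)) for arg in template]
    return None


def grab_backtrace(pid: int, timeout: float = 60.0) -> Optional[str]:
    """Run the chosen tool and return its stdout (None if no tool was found).

    Raises subprocess.TimeoutExpired if the debugger itself hangs.
    """
    argv = backtrace_command(pid)
    if argv is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.stdout
```

Supporting another platform would then mean adding a table row rather than changing the timeout logic. The captured text is also a plain string, so it could be stored in the database as note #3 suggests.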


On Mon, Jun/22/2009 07:37:16AM, Jeff Squyres wrote:
> Actually, I think this would be fine for the trunk. Some random notes:
> 1. It might be nice to move this logic out of the docommand sub itself and
> into its own sub.
> 2. It would also be good to generalize the ps and gdb commands for systems
> where those variants are not relevant.
> 3. It might even be good to develop the backtrace functionality more
> generally: backtraces would be really good to capture in the database.
> 4. How about having a[n optional] timeout with the sentinel file? That is,
> MTT would send a mail, then wait another timeout (e.g., 1 hour), and if the
> sentinel file still exists, MTT would remove the file and keep going.
> On Jun 19, 2009, at 2:47 PM, Ethan Mallove wrote:
>> Folks,
>> I came up with a feature that does not seem quite appropriate for the
>> MTT trunk but might still be useful to someone other than me. I have
>> posted a note about it on the MTT wiki:
>> Here's the text of the wiki page:
>> We (Sun) were trying to track down a hang in an MPI test that we were
>> seeing in our MTT runs and that was difficult to reproduce manually. The
>> problem is that MTT kills the hanging process before a developer has a
>> chance to investigate it. To address this, I patched an MTT client (see
>> the attached patch file) to send a notification email containing the
>> mpirun command line and a GDB backtrace for the hanging test. The client
>> also touches a predefined sentinel file; removing that file later forces
>> MTT to move on and continue testing. Here are the INI parameters that
>> activate the timeout email notification:
>> * {{{docommand_timeout_sentinel_file}}}
>> * {{{docommand_timeout_email_recipient}}}
>> Example usage:
>> {{{
>> $ client/mtt --scratch /foo/bar --file foo.ini \
>>     docommand_timeout_sentinel_file=/tmp/mtt-timeout-sentinel-file-\&random_string\(10\) \
>>     docommand_timeout_email_recipient=fred.flintsone_at_[hidden],barney.rubble_at_[hidden]
>> }}}
>> -Ethan
>> _______________________________________________
>> mtt-devel mailing list
>> mtt-devel_at_[hidden]
> --
> Jeff Squyres
> Cisco Systems
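Note #4 in Jeff's reply (an optional second timeout on the sentinel file) amounts to a polling loop. Below is a minimal sketch in Python, assuming the client simply checks for the file at a fixed interval. The function name `wait_for_sentinel` and its `"removed"`/`"timed_out"` return values are hypothetical and not part of the patch. In MTT's Perl, the same loop would use `-e`, `sleep`, and `unlink`.

```python
import os
import time
from typing import Optional


def wait_for_sentinel(
    path: str, timeout: Optional[float] = None, poll_interval: float = 5.0
) -> str:
    """Block while the sentinel file exists.

    Returns "removed" once someone deletes the file (the developer is done),
    or "timed_out" if the optional timeout expires first, in which case the
    file is deleted here so that testing can continue.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while os.path.exists(path):
        if deadline is not None and time.monotonic() >= deadline:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass  # the developer removed it at the same moment
            return "timed_out"
        # Sleep one poll interval, but never past the deadline.
        if deadline is None:
            time.sleep(poll_interval)
        else:
            time.sleep(max(0.0, min(poll_interval, deadline - time.monotonic())))
    return "removed"
```

With `timeout=None`, the loop keeps the patch's current behavior of waiting indefinitely for a human. Passing a value such as `3600` gives the "e.g., 1 hour" fallback from note #4.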