Further to the mail linked below: padb can perform diagnostics on
parallel jobs, including collecting backtraces from hung jobs, and it
integrates well into automated testing environments.
The attached patch is a minimal change that should enable this
functionality in MTT. However, I don't have access to a working MTT
installation, so I have not been able to test it.
http://www.open-mpi.org/community/lists/mtt-devel/2009/06/0415.php
This requires a HEAD version of padb, at least r273, which allows padb
to accept the PID of mpirun rather than a job ID assigned by the
underlying resource manager.
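As a rough illustration of how a test harness could use this, here is a minimal Python sketch that runs a launcher command with a timeout. If the job appears hung, it runs a diagnostic command built from the launcher's PID while the job is still alive, then kills the launcher. This is not the attached patch and not how MTT itself is implemented. `run_with_hang_diagnostics` and `diag_cmd_for_pid` are hypothetical names, and the real padb command line for selecting a job by mpirun PID is deliberately left out (see the padb documentation for r273 or later).

```python
import subprocess
import sys


def run_with_hang_diagnostics(launch_cmd, diag_cmd_for_pid, timeout_s):
    """Hypothetical helper: run launch_cmd (e.g. an mpirun command line).

    If it has not exited within timeout_s seconds, treat it as hung:
    build a diagnostic command from the launcher's PID (this is where a
    padb invocation would go), capture its stdout, then kill the launcher.

    Returns (timed_out, diagnostic_stdout_or_None).
    """
    proc = subprocess.Popen(launch_cmd)
    try:
        proc.wait(timeout=timeout_s)
        return False, None
    except subprocess.TimeoutExpired:
        # Inspect the job before killing it, so the backtraces are live.
        diag = subprocess.run(
            diag_cmd_for_pid(proc.pid),
            capture_output=True,
            text=True,
        )
        # SIGKILL on POSIX; a real harness may prefer proc.terminate() first.
        proc.kill()
        proc.wait()
        return True, diag.stdout


if __name__ == "__main__":
    # Stand-ins: a sleeping Python process plays the hung "mpirun", and a
    # Python one-liner that prints the PID plays the diagnostic tool.
    hung_job = [sys.executable, "-c", "import time; time.sleep(30)"]
    fake_diag = lambda pid: [
        sys.executable, "-c", "import sys; print('inspecting', sys.argv[1])", str(pid)
    ]
    timed_out, report = run_with_hang_diagnostics(hung_job, fake_diag, 1.0)
    print(timed_out, report)
```

Passing the diagnostic command as a function of the PID keeps the harness independent of any particular tool's options. Swapping in padb would only mean returning the appropriate argv for the mpirun PID.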
Yours,
Ashley
--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk