MTT Devel Mailing List Archives

Subject: [MTT devel] Analysis of hung jobs.
From: Ashley Pittman (ashley_at_[hidden])
Date: 2009-10-06 05:23:48


Further to the mail linked below, padb can perform diagnostics on hung
jobs, including collecting backtraces, and it integrates well into
automated testing environments.

The attached patch is a minimal change that should enable this
functionality. However, I don't have access to a working MTT
installation to test it.

http://www.open-mpi.org/community/lists/mtt-devel/2009/06/0415.php

This will require a HEAD version of padb, at least r273, so that it can
accept the pid of mpirun rather than a jobid assigned by the underlying
resource manager.

Yours,

Ashley

-- 
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk