MTT Devel Mailing List Archives

Subject: Re: [MTT devel] Analysis of hung jobs.
From: Ashley Pittman (ashley_at_[hidden])
Date: 2009-10-08 10:18:07


On Thu, 2009-10-08 at 09:51 -0400, Ethan Mallove wrote:

> $ padb --verbose --debug=all --config-option rmgr=mpirun --full-report=6336
> ...
> full job report for job 6336
>
> Attaching to job 6336
> mpirun resource manager requires pdsh to be installed
> Use of uninitialized value in printf at padb line 729.
> Use of uninitialized value in printf at padb line 729.
> DEBUG (verbose): 0: There are 0 processes over 0 hosts
> Fatal problem setting up the resource manager: mpirun
>
> I assume it's referring to the below "pdsh"?
>
> http://sourceforge.net/projects/pdsh

Yes. You'll also need to be able to ssh freely from the node where
padb/pdsh is running to all compute nodes. On Debian I had to add
"export PDSH_RCMD_TYPE=ssh" to my .bashrc to tell pdsh to use ssh
rather than rsh.
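
A minimal sketch of that setup, assuming a bash shell. The variable
pdsh_status and the hostname node01 are only illustrative and not part
of padb or pdsh:

```shell
# Tell pdsh to use ssh instead of rsh (the line added to ~/.bashrc).
export PDSH_RCMD_TYPE=ssh

# Check that pdsh is on PATH before running padb with rmgr=mpirun.
if command -v pdsh >/dev/null 2>&1; then
    pdsh_status="found"
else
    pdsh_status="missing"
fi
echo "pdsh: $pdsh_status (PDSH_RCMD_TYPE=$PDSH_RCMD_TYPE)"

# Optional manual check against one compute node (hypothetical hostname):
#   pdsh -w node01 hostname
```

If the manual check prompts for a password, passwordless ssh to the
compute nodes still needs to be set up.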

Could you update to r283 as well? The "mpirun" resource manager is new,
and I discovered this morning that it didn't like digits in hostnames.
As an added benefit, it won't use pdsh or ssh if all processes are local.

Ashley,

-- 
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk