MTT Devel Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [MTT devel] Analysis of hung jobs.
From: Ashley Pittman (ashley_at_[hidden])
Date: 2009-10-08 10:18:07


On Thu, 2009-10-08 at 09:51 -0400, Ethan Mallove wrote:

> $ padb --verbose --debug=all --config-option rmgr=mpirun --full-report=6336
> ...
> full job report for job 6336
>
> Attaching to job 6336
> mpirun resource manager requires pdsh to be installed
> Use of uninitialized value in printf at padb line 729.
> Use of uninitialized value in printf at padb line 729.
> DEBUG (verbose): 0: There are 0 processes over 0 hosts
> Fatal problem setting up the resource manager: mpirun
>
> I assume it's referring to the below "pdsh"?
>
> http://sourceforge.net/projects/pdsh

Yes. You'll also need to be able to ssh freely from the node where
padb/pdsh is running to all compute nodes. On Debian I had to add
"export PDSH_RCMD_TYPE=ssh" to my .bashrc to tell it to use ssh
rather than rsh.
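
A minimal sketch of that setup, assuming a POSIX-style shell such as bash.
The export line is the one described above; check_pdsh is a hypothetical
helper added here for illustration and is not part of padb or pdsh.

```shell
# Tell pdsh to use ssh rather than rsh (the line added to ~/.bashrc above).
export PDSH_RCMD_TYPE=ssh

# Hypothetical helper: report whether pdsh is on PATH before running padb.
check_pdsh() {
    if command -v pdsh >/dev/null 2>&1; then
        echo "pdsh found, rcmd type: ${PDSH_RCMD_TYPE}"
    else
        echo "pdsh not found; install it before using rmgr=mpirun on remote hosts"
        return 1
    fi
}
```

Running check_pdsh before padb gives a quick hint as to whether the
"requires pdsh to be installed" message is likely to appear. It does not
verify that passwordless ssh to the compute nodes actually works.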

Could you update to r283 as well? The "mpirun" resource manager is new,
and I discovered this morning that it didn't like digits in hostnames.
As an added benefit, it won't use pdsh or ssh if all processes are local.

Ashley,

-- 
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk