On Thu, Oct/08/2009 03:18:07PM, Ashley Pittman wrote:
> On Thu, 2009-10-08 at 09:51 -0400, Ethan Mallove wrote:
>
> > $ padb --verbose --debug=all --config-option rmgr=mpirun --full-report=6336
> > ...
> > full job report for job 6336
> >
> > Attaching to job 6336
> > mpirun resource manager requires pdsh to be installed
> > Use of uninitialized value in printf at padb line 729.
> > Use of uninitialized value in printf at padb line 729.
> > DEBUG (verbose): 0: There are 0 processes over 0 hosts
> > Fatal problem setting up the resource manager: mpirun
> >
> > I assume it's referring to the below "pdsh"?
> >
> > http://sourceforge.net/projects/pdsh
>
> Yes. You'll also need to be able to ssh freely from the node where
> padb/pdsh is running to all compute nodes. On Debian I had to add
> "export PDSH_RCMD_TYPE=ssh" to my .bashrc to tell pdsh to use ssh
> rather than rsh.
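For reference, here is a minimal sketch of that setup. It assumes a bash login shell and a pdsh build that includes the ssh rcmd module. Running `pdsh -L` lists the modules that are actually loaded, and `-R ssh` selects the module for a single invocation.

```shell
# Add this line to ~/.bashrc so pdsh uses ssh instead of its default rsh.
# Assumes pdsh was built with the ssh rcmd module (check with: pdsh -L).
export PDSH_RCMD_TYPE=ssh
```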
>
> Could you also update to r283? The "mpirun" resource manager is new,
> and I discovered this morning that it didn't like digits in hostnames.
> As an added benefit, it won't use pdsh or ssh if all processes are local.
It looks like padb is passing a bad option to pdsh?
$ padb --debug=all --verbose --config-option rmgr=mpirun --full-report=24303
...
padb version 3.n (Revision 283)
full job report for job 24303
Attaching to job 24303
Use of uninitialized value in string ne at padb line 2720.
Job has 1 process(es)
Job spans 0 host(s)
DEBUG (verbose): 0: There are 1 processes over 0 hosts
DEBUG (verbose): 0: Remote process data available on frontend
DEBUG (show_cmd): 0: pdsh -w padb --inner --outer="burl-ct-v20z-0:52314"
einner: pdsh: illegal option -- -
einner: Usage: pdsh [-options] command ...
einner: -S return largest of remote command return values
einner: -h output usage menu and quit
einner: -V output version information and quit
einner: -q list the option settings and quit
einner: -b disable ^C status feature (batch mode)
einner: -d enable extra debug information from ^C status
einner: -l user execute remote commands as user
einner: -t seconds set connect timeout (default is 10 sec)
einner: -u seconds set command timeout (no default)
einner: -f n use fanout of n nodes
einner: -w host,host,... set target node list on command line
einner: -x host,host,... set node exclusion list on command line
einner: -R name set rcmd module to name
einner: -N disable hostname: labels on output lines
einner: -L list info on all loaded modules and exit
einner: available rcmd modules: rsh,exec (default: rsh)
Unexpected EOF from Inner stdout (connecting)
Unexpected EOF from Inner stderr (connecting)
Unexpected exit from parallel command (state=connecting)
result from parallel command is 256 (state=connecting)
Bad exit code from parallel command (exit_code=1)
DEBUG (verbose): 5: Completed command
-Ethan
>
> Ashley,
>
> --
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
>
>
> _______________________________________________
> mtt-devel mailing list
> mtt-devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel