Open MPI Development Mailing List Archives

From: Andrew Friedley (afriedle_at_[hidden])
Date: 2007-08-30 11:46:36


George Bosilca wrote:
> Until then you should be using the latest command "tv8 mpirun -a -np
> 2 -bynode `pwd`/NPmpi". The `pwd` is really important for some
> reason, otherwise TotalView is unable to find the executable. The
> problem is that the name of the process will be "./NPmpi" and
> TotalView does not have access to the path where the executable was
> launched (at least that's the reason I think).
>

Thanks George. That works, with one catch: when I'm asked on
startup whether I want to stop the parallel job (and hit yes), TotalView
waits forever trying to connect to a remote server. I see this on the
xterm (shortened in a few places):

Launching TotalView Debugger Servers with command:
srun --jobid=0 -N1 -n1 -w`awk -F. 'BEGIN {ORS=","} {if (NR==1) ORS="";
print $1}' $PWD/TVT1Pa4Fjm` -l --input=none
/usr/global/tools/totalview.8.1.0-1/linux-x86-64/bin/tvdsvr
-callback_host atlas34 -callback_ports atlas31:16382 -set_pws
47319a24:4688a7a2 -verbosity info -working_directory $PWD/NetPIPE_3.6.2
srun: error: Invalid numeric value "0" for jobid.

I got around this by hitting cancel in the 'waiting to connect' dialog,
then setting my SLURM job ID manually in file -> preferences -> bulk
launch -> command in place of the %J filler, and restarting. Is there a
better workaround for this?
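
For reference, here is a minimal sketch of one way to look up the job ID that goes in place of %J. It assumes TotalView is started from inside a SLURM allocation, where SLURM exports SLURM_JOB_ID (and the older alias SLURM_JOBID). The function name resolve_jobid is hypothetical; it is not part of SLURM or TotalView.

```shell
# Hypothetical helper: print the SLURM job ID of the current allocation.
# Assumes SLURM_JOB_ID (or the older SLURM_JOBID) is set by SLURM.
resolve_jobid() {
    jobid="${SLURM_JOB_ID:-${SLURM_JOBID:-}}"
    case "$jobid" in
        # Reject an empty value, a literal 0, or anything non-numeric,
        # since srun refuses "0" as a job ID.
        ''|0|*[!0-9]*)
            echo "no valid SLURM job ID in the environment" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$jobid"
}
```

Running `resolve_jobid` in the shell that holds the allocation prints the number to enter in the bulk launch command. If it fails, the shell is probably not inside an allocation, which may also be why %J expanded to 0.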

Andrew