Thanks.

I have to submit every job on my cluster through a #PBS script.
I cannot run a job directly from the command line on my cluster.

This is my script:
--------------------------------------
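#!/bin/bash
#PBS -N jobname
#PBS -l walltime=00:08:00,nodes=1
#PBS -q queuename
COMMAND=/mypath/myprog
NCORES=5   # MPI processes per entry in the PBS node file

# Start in the directory the job was submitted from
cd "$PBS_O_WORKDIR"
# Count the entries in the node file that PBS provides for this job
NODES=$(wc -l < "$PBS_NODEFILE")
NPROC=$(( NCORES * NODES ))

mpirun -np "$NPROC" --mca btl self,sm,openib "$COMMAND"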

-------------------------------------------

Where in the script should I put the gdb command (gdb --batch -ex 'bt full' -ex 'info reg' -pid ZOMBIE_PID)?
And how do I get ZOMBIE_PID from within the script?

Any help is appreciated.

Thanks

Oct. 25 2010


Date: Mon, 25 Oct 2010 19:24:35 +0200
From: jed@59a2.org
To: users@open-mpi.org
Subject: Re: [OMPI users] Open MPI program cannot complete

On Mon, Oct 25, 2010 at 19:07, Jack Bryan <dtustudy68@hotmail.com> wrote:
> I need to use a #PBS parallel job script to submit a job on the MPI cluster.

Is it not possible to reproduce locally? Most clusters have a way to submit an interactive job (which would let you start this thing and then inspect individual processes). Ashley's Padb suggestion will certainly be better in a non-interactive environment.

> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid ZOMBIE_PID) in the script?

Is control returning to your script after rank 0 has exited? In that case, you can just put this on the next line.

> How to get the ZOMBIE_PID?

"ps" from the command line, or getpid() from C code.

Jed
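
The two answers above (run gdb on the line after mpirun, and look up the PID with "ps") can be combined in the job script. What follows is only a minimal sketch, assuming a single-node job (as with nodes=1 in the script above), that gdb and ps are available on the compute node, and that the leftover processes still carry the executable's name. The function names find_leftover_pids and dump_backtraces are hypothetical helpers made up for illustration; they are not part of Open MPI or PBS.

```shell
#!/bin/bash
# Hypothetical helper: print the PIDs of this user's processes whose
# command name is exactly "$1" (one PID per line), using ps.
find_leftover_pids() {
    ps -u "$(id -u)" -o pid= -o comm= | awk -v name="$1" '$2 == name { print $1 }'
}

# Hypothetical helper: attach gdb in batch mode to every leftover process
# of the given executable and print a full backtrace plus the registers.
dump_backtraces() {
    local name pid
    name=$(basename "$1")
    for pid in $(find_leftover_pids "$name"); do
        echo "=== backtrace for PID $pid ($name) ==="
        gdb --batch -ex 'bt full' -ex 'info reg' -pid "$pid"
    done
}

# Intended use in the job script, on the line right after mpirun:
#   dump_backtraces "$COMMAND"
```

With these definitions placed before the mpirun line, calling dump_backtraces "$COMMAND" right after it would write the traces into the job's output file, provided control really does return to the script while the stuck processes are still alive. On Linux, ps typically reports at most 15 characters of the command name, so a longer executable name would need a different match.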
