Open MPI User's Mailing List Archives

Subject: [OMPI users] torque pbs behaviour...
From: Jody Klymak (jklymak_at_[hidden])
Date: 2009-08-10 15:43:03


Hi All,

I've been trying to get Torque PBS to work with Open MPI on my OS X 10.5.7
cluster (after finding that Xgrid was pretty flaky about connections). I
*think* this is an MPI problem (possibly caused by operator error!).

If I submit an Open MPI job through qsub with a script containing:

#PBS -l nodes=2:ppn=8

mpirun MyProg

PBS reserves both nodes (checked via "pbsnodes -a" and the job output),
but mpirun runs the whole job on the second of the two.

If I run the same job without qsub (i.e. using ssh):

mpirun -n 16 -host xserve01,xserve02 MyProg

it runs fine on all the nodes.

My /var/spool/torque/server_priv/nodes file looks like:

xserve01.local np=8
xserve02.local np=8

Any idea what could be going wrong, or how to debug this properly? There
is nothing suspicious in the server or mom logs.
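
A minimal diagnostic sketch for this kind of problem is a job script that compares what Torque allocated with where mpirun actually places processes. The helper name count_slots is hypothetical, and hostname is only a stand-in for MyProg. The ompi_info check assumes that an Open MPI build with Torque support lists components named "tm". If nothing is listed, mpirun may not be reading the Torque allocation at all.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=8

# Hypothetical helper: print "host slots" pairs from a Torque node file,
# which contains one line per allocated slot.
count_slots() {
    awk 'NF { n[$1]++ } END { for (h in n) print h, n[h] }' "$1" | sort
}

# Run the diagnostics only inside a Torque job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Slots allocated by Torque:"
    count_slots "$PBS_NODEFILE"

    echo "Torque (tm) components in this Open MPI build, if any:"
    ompi_info | grep tm

    echo "Hosts mpirun actually launches on (hostname stands in for MyProg):"
    mpirun hostname | sort | uniq -c
fi
```

If the first listing shows 8 slots on each of the two hosts but the last one shows a single host, the allocation is fine and the problem is on the mpirun side.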

Thanks for any help,

Jody

--
Jody Klymak
http://web.uvic.ca/~jklymak/