Thanks for the tips, Gus. I'll definitely try some of these, particularly the nodes:ppn syntax, and report back.

Right now, I'm upgrading the Intel Compilers and rebuilding Open MPI.


On Fri, Jun 1, 2012 at 2:39 PM, Gus Correa <gus@ldeo.columbia.edu> wrote:
The [Torque/PBS] syntax '-l procs=48' is somewhat troublesome
and may not be understood by the scheduler. [It doesn't
work correctly with Maui, which is what we have here.  I have read
that it works with pbs_sched and with Moab,
but that's hearsay.]
This issue comes up very often on the Torque mailing
list.

Have you tried instead this alternate syntax?

'-l nodes=2:ppn=24'

[I am assuming here that your
nodes have 24 cores, i.e. 24 'ppn', each]

Then in the script:
mpiexec -np 48 ./your_program


Also, in your PBS script you could print
the contents of the file named by $PBS_NODEFILE,
to see which nodes were actually allocated:

cat $PBS_NODEFILE
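
With nodes=2:ppn=24 the node file normally has one line per allocated core, so counting repeated host names shows how many slots each node contributes. A minimal sketch, assuming a POSIX shell; the function name summarize_nodefile is made up for illustration and is not part of Torque or Open MPI:

```shell
# Summarize a Torque node file: one "host slots=N" line per host,
# followed by the total number of lines (slots) in the file.
# summarize_nodefile is a hypothetical helper, not a Torque command.
summarize_nodefile() {
    nodefile="$1"
    # Group identical host names and count how often each appears.
    sort "$nodefile" | uniq -c | awk '{ print $2 " slots=" $1 }'
    # The total should match the value given to -np.
    echo "total slots=$(wc -l < "$nodefile" | tr -d ' ')"
}

# Torque sets PBS_NODEFILE inside a job; outside a job nothing is printed.
if [ -n "${PBS_NODEFILE:-}" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

If the total is not 48, or only one host shows up, the scheduler did not grant what the resource request asked for.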


A simple troubleshooting test is to launch 'hostname'
with mpirun, which shows where each process lands:

mpirun -np 48 hostname
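
The suggestions so far can be collected into one small job script. This is only a sketch: the file name job.pbs, the job name, and the 10-minute walltime are assumptions, and ./your_program stands in for the real executable.

```shell
# Write a minimal Torque job script to job.pbs (hypothetical file name).
# The quoted 'EOF' keeps $VARIABLES unexpanded until the job runs.
cat > job.pbs <<'EOF'
#!/bin/bash
#PBS -l nodes=2:ppn=24
#PBS -l walltime=00:10:00
#PBS -N mpi_layout_test

# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Show which hosts Torque allocated.
cat "$PBS_NODEFILE"

# Check where the 48 processes actually land.
mpirun -np 48 hostname

# Run the real program.
mpiexec -np 48 ./your_program
EOF

echo "wrote job.pbs; submit it with: qsub job.pbs"
```

Running the hostname test before the real program keeps the evidence of the process layout in the same job output file.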

Finally, are you sure that the Open MPI you are using was
compiled with Torque support?
If not, I wonder if options like '-bynode' would work at all.
Jeff may correct me if I am wrong, but if your
Open MPI lacks Torque support,
you may need to pass $PBS_NODEFILE
to mpirun as the hostfile.
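
One way to check is to look for the Torque ("tm") components in the output of ompi_info. A rough sketch, assuming ompi_info prints lines such as "MCA ras: tm" or "MCA plm: tm" when Torque support is built in (the exact layout may differ between Open MPI versions); has_tm_support is a made-up helper name:

```shell
# Succeeds if the text on stdin lists a Torque (tm) component.
# has_tm_support is a hypothetical helper; feed it `ompi_info` output.
has_tm_support() {
    grep -Eq 'MCA (ras|plm): tm'
}

# Only run the check when ompi_info is actually on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_support; then
        echo "Open MPI reports Torque (tm) support"
    else
        echo "no tm component found; pass the node file explicitly, e.g.:"
        echo "  mpirun -np 48 -hostfile \$PBS_NODEFILE ./your_program"
    fi
fi
```

Make sure the ompi_info being run belongs to the same Open MPI installation as the mpirun used in the job.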



--
Edmund Sumbar
University of Alberta
+1 780 492 9360