Thanks for the tips, Gus. I'll definitely try some of these, particularly the nodes:ppn syntax, and report back.
Right now, I'm upgrading the Intel Compilers and rebuilding Open MPI.
The [Torque/PBS] syntax '-l procs=48' is somewhat troublesome,
and may not be understood by the scheduler. [It doesn't
work correctly with Maui, which is what we have here. I have read
that it works with pbs_sched and with Moab,
but that's hearsay.]
This issue comes up very often on the Torque mailing
list.
Have you tried this alternate syntax instead?
'-l nodes=2:ppn=24'
[I am assuming here that each of your
nodes has 24 cores, i.e. 'ppn=24'.]
Then in the script:
mpiexec -np 48 ./your_program
Also, in your PBS script you could print
the contents of PBS_NODEFILE:
cat $PBS_NODEFILE
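
Beyond a plain 'cat', it can help to see how many slots each host received, since the node file normally lists a host once per assigned core. The lines below are only a sketch of that idea; the function names summarize_nodefile and count_slots are made up for illustration and are not part of Torque or Open MPI.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this example).

# Print one line per host with the number of times it appears in the file.
summarize_nodefile() {
    sort "$1" | uniq -c | awk '{ print $2 ": " $1 " slots" }'
}

# Print the total number of lines (slots) in the file.
count_slots() {
    awk 'END { print NR }' "$1"
}

# PBS_NODEFILE is set by Torque inside a job; outside a job this is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    summarize_nodefile "$PBS_NODEFILE"
    echo "total slots: $(count_slots "$PBS_NODEFILE")"
fi
```

With '-l nodes=2:ppn=24' one would expect two hosts with 24 slots each and a total of 48. If the total is smaller, the scheduler did not grant what the script asked for.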
A simple troubleshooting test is to launch 'hostname'
with mpirun:
mpirun -np 48 hostname
Finally, are you sure that the Open MPI you are using was
compiled with Torque support?
If not, I wonder whether options like '-bynode' would work at all.
Jeff may correct me if I am wrong, but if your
Open MPI lacks Torque support,
you may need to pass $PBS_NODEFILE to mpirun
as your hostfile.
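
One way to check for Torque support is to look for the 'tm' components in the output of ompi_info. This is a rough sketch only: has_tm_support is a made-up helper name, and the pattern assumes the usual 'MCA ras: tm' / 'MCA plm: tm' lines that ompi_info prints when Open MPI was built against Torque.

```shell
#!/bin/sh
# Hypothetical helper: reads an ompi_info listing on stdin and prints
# "yes" if a tm (Torque) ras or plm component is listed, "no" otherwise.
has_tm_support() {
    if grep -Eq 'MCA (ras|plm): tm'; then
        echo yes
    else
        echo no
    fi
}

# Only query ompi_info if it is actually in the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    echo "Torque (tm) support: $(ompi_info | has_tm_support)"
fi
```

If this reports "no", the fallback mentioned above would look something like 'mpirun -hostfile $PBS_NODEFILE -np 48 ./your_program' inside the job script.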