Bas hit the nail on the head: when using OpenMPI's mpirun under
Torque TM, one apparently *must* omit the "-machinefile $PBS_NODEFILE"
option and specify only "-np 2", presumably because TM already knows
which machines are under its control.
This behavior is new to me: is this a feature or a bug in OpenMPI?
At the very least, one would expect mpirun to behave more gracefully
when both -np and -machinefile are specified.
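Until that is settled, the workaround can be wrapped in a small shell
function. This is only a sketch under a few assumptions: OpenMPI was
built with Torque (TM) support, a non-empty $PBS_NODEFILE is taken as
a sign that we are running inside a Torque job, and the function name
mpi_cmdline and its host-file fallback argument are hypothetical. It
prints the command line rather than running it:

```shell
# Hypothetical helper: print an mpiexec command line.
# Usage: mpi_cmdline NP HOSTFILE PROGRAM [ARGS...]
mpi_cmdline() {
  np=$1
  hostfile=$2
  shift 2
  if [ -n "${PBS_NODEFILE:-}" ]; then
    # Inside a Torque job: let TM supply the node list, pass only -np.
    printf 'mpiexec -np %s %s\n' "$np" "$*"
  else
    # Outside Torque: fall back to an explicit machine file.
    printf 'mpiexec -np %s -machinefile %s %s\n' "$np" "$hostfile" "$*"
  fi
}
```

Inside a job script one would then run something like
`$(mpi_cmdline 2 hosts ./a.out)`, which expands to
"mpiexec -np 2 ./a.out" under Torque.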
Bas van der Vlies wrote:
> You must use the following command:
> mpiexec -np 2 ./a.out
> whello, i am 0 of 2
> whello, i am 1 of 2
> all is well that ends well
> $ mpiexec -np 2 -machinefile $PBS_NODEFILE ./a.out
> [ib-r6n19.irc.sara.nl:04999] pls:tm: failed to poll for a spawned proc,
> return status = 17002
> [ib-r6n19.irc.sara.nl:04999] [0,0,0] ORTE_ERROR_LOG: In errno in file
> rmgr_urm.c at line 462
> [ib-r6n19.irc.sara.nl:04999] mpiexec: spawn failed with errno=-11