
Open MPI Development Mailing List Archives


Subject: [OMPI devel] openmpi-1.8 - hangup using more than 4 nodes under managed state by Torque
From: tmishima_at_[hidden]
Date: 2014-04-01 05:23:26


Hi Ralph,

I saw another hangup with openmpi-1.8 when I used more than 4 nodes
(having 8 cores each) under a Torque-managed state. Although I'm not
sure you can reproduce it with SLURM, at least with Torque it can be
reproduced in this way:

[mishima_at_manage ~]$ qsub -I -l nodes=4:ppn=8
qsub: waiting for job 8726.manage.cluster to start
qsub: job 8726.manage.cluster ready

[mishima_at_node09 ~]$ mpirun -np 65 ~/mis/openmpi/demos/myprog
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 65 slots
that were requested by the application:
  /home/mishima/mis/openmpi/demos/myprog

Either request fewer slots for your application, or make more slots
available
for use.
--------------------------------------------------------------------------
<<< HANG HERE!! >>>
Abort is in progress...hit ctrl-c again within 5 seconds to forcibly
terminate
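For reference, the slot count behind that error can be checked before
launching: under Torque, $PBS_NODEFILE lists one line per allocated slot,
so nodes=4:ppn=8 yields 32 slots, fewer than the 65 ranks requested. This
is only an illustrative sketch; the hostnames and the temporary nodefile
below simulate such an allocation rather than read a real one.

```shell
# Simulate the nodefile Torque would write for nodes=4:ppn=8
# (in a real job, use "$PBS_NODEFILE" instead of this temp file).
nodefile=$(mktemp)
for n in node09 node10 node11 node12; do
  i=0
  while [ $i -lt 8 ]; do echo "$n"; i=$((i + 1)); done
done > "$nodefile"

# One line per slot, so the allocation provides 4 x 8 = 32 slots.
slots=$(wc -l < "$nodefile" | tr -d ' ')
echo "allocated slots: $slots"
rm -f "$nodefile"
```

Asking mpirun for more ranks than this count (here, -np 65 against 32
slots) is what triggers the "not enough slots" message above; the bug is
that the job then hangs instead of exiting cleanly.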

I found this behavior when I happened to request the wrong number of
procs. With fewer than 4 nodes, or with rsh (namely an unmanaged state),
it works. I'm afraid to say I have no idea how to resolve it. I hope you
could fix the problem.

Tetsuya