
Open MPI User's Mailing List Archives


Subject: [OMPI users] another corner case hangup in openmpi-1.7.5rc3
From: tmishima_at_[hidden]
Date: 2014-03-17 20:40:10


Hi Ralph, I found another corner-case hang in openmpi-1.7.5rc3.

Conditions:
1. Allocate several nodes through a resource manager (RM) such as TORQUE.
2. When launching the job, request only the head node with the
   -host or -hostfile option.

Example:
1. Allocate node05 and node06 using TORQUE.
2. Request only node05 with the -host option.

[mishima_at_manage ~]$ qsub -I -l nodes=node05+node06
qsub: waiting for job 8661.manage.cluster to start
qsub: job 8661.manage.cluster ready

[mishima_at_node05 ~]$ cat $PBS_NODEFILE
node05
node06
[mishima_at_node05 ~]$ mpirun -np 1 -host node05 ~/mis/openmpi/demos/myprog
<< hang here >>

My proposed fix for plm_base_launch_support.c is as follows:
--- plm_base_launch_support.c 2014-03-12 05:51:45.000000000 +0900
+++ plm_base_launch_support.try.c 2014-03-18 08:38:03.000000000 +0900
@@ -1662,7 +1662,11 @@
         OPAL_OUTPUT_VERBOSE((5, orte_plm_base_framework.framework_output,
                              "%s plm:base:setup_vm only HNP left",
                              ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
+        /* cleanup */
         OBJ_DESTRUCT(&nodes);
+        /* mark that the daemons have reported so we can proceed */
+        daemons->state = ORTE_JOB_STATE_DAEMONS_REPORTED;
+        daemons->updated = false;
         return ORTE_SUCCESS;
     }
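
For illustration only, here is a toy model of the branch the patch touches. It is not the ORTE code: every `toy_` name is hypothetical, and the idea that the launcher waits for the daemon job to reach a "daemons reported" state is an assumption drawn from the comment in the patch. Under that assumption, returning early without setting the state leaves the launcher waiting forever.

```c
#include <stdbool.h>

/* Hypothetical, simplified job states; NOT the real ORTE enum. */
typedef enum {
    TOY_STATE_INIT,
    TOY_STATE_DAEMONS_REPORTED
} toy_job_state_t;

/* Hypothetical stand-in for the daemon job object. */
typedef struct {
    toy_job_state_t state;
    bool updated;
} toy_job_t;

/* Models the early return taken when no additional daemons need to be
 * launched (only the head node is requested). With apply_fix == false
 * the state is left untouched, as in the unpatched branch. */
static int toy_setup_vm(toy_job_t *daemons, int extra_nodes, bool apply_fix)
{
    if (0 == extra_nodes) {
        if (apply_fix) {
            /* mark that the daemons have reported so we can proceed */
            daemons->state = TOY_STATE_DAEMONS_REPORTED;
            daemons->updated = false;
        }
        return 0;
    }
    /* Otherwise remote daemons would be launched and would report
     * back later; that path is not modeled here. */
    return 0;
}

/* Assumed gating condition: applications can be launched only once
 * the daemon job is marked as reported. */
static bool toy_can_launch_apps(const toy_job_t *daemons)
{
    return TOY_STATE_DAEMONS_REPORTED == daemons->state;
}
```

In this model, calling toy_setup_vm() with extra_nodes == 0 and apply_fix == false leaves toy_can_launch_apps() false, which corresponds to the hang above. With apply_fix == true it becomes true.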

Tetsuya