Hmm... no, I don't think that's the correct patch. We want that function to remain "clean", as its job is simply to construct the list of nodes for the VM. It's the responsibility of the launcher to decide what to do with that list.
Please see https://svn.open-mpi.org/trac/ompi/ticket/4408 for a fix
On Mar 17, 2014, at 5:40 PM, tmishima_at_[hidden] wrote:
> Hi Ralph, I found another corner-case hang in openmpi-1.7.5rc3 under these conditions:
> 1. Allocate some nodes using a resource manager (RM) such as TORQUE.
> 2. Request only the head node when executing the job, using the
> -host or -hostfile option.
>
> For example:
> 1. Allocate node05 and node06 using TORQUE.
> 2. Request only node05 with the -host option.
> [mishima_at_manage ~]$ qsub -I -l nodes=node05+node06
> qsub: waiting for job 8661.manage.cluster to start
> qsub: job 8661.manage.cluster ready
> [mishima_at_node05 ~]$ cat $PBS_NODEFILE
> [mishima_at_node05 ~]$ mpirun -np 1 -host node05 ~/mis/openmpi/demos/myprog
> << hang here >>
> My fix for plm_base_launch_support.c is as follows:
> --- plm_base_launch_support.c 2014-03-12 05:51:45.000000000 +0900
> +++ plm_base_launch_support.try.c 2014-03-18 08:38:03.000000000 +0900
> @@ -1662,7 +1662,11 @@
> OPAL_OUTPUT_VERBOSE((5, orte_plm_base_framework.framework_output,
> "%s plm:base:setup_vm only HNP left",
> + /* cleanup */
> + /* mark that the daemons have reported so we can proceed */
> + daemons->state = ORTE_JOB_STATE_DAEMONS_REPORTED;
> + daemons->updated = false;
> return ORTE_SUCCESS;
> users mailing list