
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] openmpi-1.8 - hangup using more than 4 nodes under managed state by Torque
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-04-01 10:54:16


I tracked it down - it's not Torque-specific, but it impacts all managed environments. Will fix.

On Apr 1, 2014, at 2:23 AM, tmishima_at_[hidden] wrote:

>
> Hi Ralph,
>
> I saw another hangup with openmpi-1.8 when I used more than 4 nodes
> (with 8 cores each) in a managed state under Torque. Although I'm not
> sure whether you can reproduce it with SLURM, at least with Torque it can
> be reproduced this way:
>
> [mishima_at_manage ~]$ qsub -I -l nodes=4:ppn=8
> qsub: waiting for job 8726.manage.cluster to start
> qsub: job 8726.manage.cluster ready
>
> [mishima_at_node09 ~]$ mpirun -np 65 ~/mis/openmpi/demos/myprog
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 65 slots
> that were requested by the application:
> /home/mishima/mis/openmpi/demos/myprog
>
> Either request fewer slots for your application, or make more slots
> available for use.
> --------------------------------------------------------------------------
> <<< HANG HERE!! >>>
> Abort is in progress...hit ctrl-c again within 5 seconds to forcibly
> terminate
>
> I found this behavior when I happened to input the wrong number of procs.
> With fewer than 4 nodes, or with rsh - namely an unmanaged state - it
> works. I'm afraid to say I have no idea how to resolve it. I hope you can
> fix the problem.
>
> Tetsuya
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Searchable archives: http://www.open-mpi.org/community/lists/devel/2014/04/index.php
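
For context on the numbers in the report: nodes=4:ppn=8 allocates 4 x 8 = 32
slots, so -np 65 oversubscribes the allocation by 33 processes. The expected
behavior is the "not enough slots" error followed by a clean exit; the hang
during the subsequent abort is the bug. Below is a minimal sketch of checking
the allocation and running within it, assuming the same interactive Torque job
as above (the PBS_NODEFILE check is a standard Torque convention, not
something shown in the thread):

  [mishima_at_node09 ~]$ wc -l < $PBS_NODEFILE
  32
  # Torque writes one line per slot: 4 nodes x 8 ppn = 32 entries.

  [mishima_at_node09 ~]$ mpirun -np 32 ~/mis/openmpi/demos/myprog
  # -np matches the 32 available slots, so the "not enough slots"
  # error path (where the hang occurs) is never entered.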