Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Trouble using rankfile with gridengine
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-04-23 12:52:52


The slots are numbered 0-3 for a four-slot allocation as shown in your PE_HOSTFILE. Your rankfile contains assignments to slot=4 and slot=5, which are outside your allocation.
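
A quick way to catch this before submitting the job is to compare the rankfile against the PE_HOSTFILE. The following is only a minimal sketch in Python, not anything Open MPI ships: the function names are made up for illustration, it assumes the rule stated above (a host with N slots accepts slot indices 0 through N-1), and it only understands plain `slot=<integer>` entries, skipping any other rankfile syntax.

```python
import re

# Data copied from the original message below.
PE_HOSTFILE = """\
amos.cora.nwra.com 4 clouds.q_at_[hidden] UNDEFINED
andrew.cora.nwra.com 4 clouds.q_at_[hidden] UNDEFINED
"""

RANKFILE = """\
rank 0=amos.cora.nwra.com slot=0
rank 1=andrew.cora.nwra.com slot=0
rank 2=amos.cora.nwra.com slot=4
rank 3=andrew.cora.nwra.com slot=4
rank 4=amos.cora.nwra.com slot=1
rank 5=andrew.cora.nwra.com slot=1
rank 6=amos.cora.nwra.com slot=5
rank 7=andrew.cora.nwra.com slot=5
"""

# Matches only the simple "rank N=host slot=M" form used in this thread.
RANK_LINE = re.compile(r"rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\d+)$")


def parse_pe_hostfile(text):
    """Return {hostname: slot_count}, reading the first two columns of each line."""
    slots = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            slots[fields[0]] = int(fields[1])
    return slots


def check_rankfile(rankfile_text, slots):
    """Return (rank, host, slot, reason) for every entry outside the allocation."""
    problems = []
    for line in rankfile_text.splitlines():
        match = RANK_LINE.match(line.strip())
        if match is None:
            continue  # unsupported syntax or blank line: skipped by this sketch
        rank, host, slot = int(match.group(1)), match.group(2), int(match.group(3))
        limit = slots.get(host)
        if limit is None:
            problems.append((rank, host, slot, "host not in allocation"))
        elif slot >= limit:
            problems.append((rank, host, slot, f"slot outside 0-{limit - 1}"))
    return problems


problems = check_rankfile(RANKFILE, parse_pe_hostfile(PE_HOSTFILE))
for rank, host, slot, reason in problems:
    print(f"rank {rank}: {host} slot={slot} ({reason})")
```

Run against the files you posted, this flags ranks 2, 3, 6 and 7, which are the slot=4 and slot=5 entries.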

On Apr 23, 2010, at 10:36 AM, Orion Poplawski wrote:

> I'm using gridengine 6.2u5 and openmpi 1.3.3. I'm submitting a parallel
> job and would like to specify a rankfile to set processor binding but am
> getting errors.
>
> The $PE_HOSTFILE generated by gridengine is:
>
> amos.cora.nwra.com 4 clouds.q_at_[hidden] UNDEFINED
> andrew.cora.nwra.com 4 clouds.q_at_[hidden] UNDEFINED
>
> The rankfile I'm using is:
>
> rank 0=amos.cora.nwra.com slot=0
> rank 1=andrew.cora.nwra.com slot=0
> rank 2=amos.cora.nwra.com slot=4
> rank 3=andrew.cora.nwra.com slot=4
> rank 4=amos.cora.nwra.com slot=1
> rank 5=andrew.cora.nwra.com slot=1
> rank 6=amos.cora.nwra.com slot=5
> rank 7=andrew.cora.nwra.com slot=5
>
> The error I'm getting is:
>
> Rankfile claimed host amos.cora.nwra.com that was not allocated or
> oversubscribed it's slots:
>
> --------------------------------------------------------------------------
> [amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file
> rmaps_rank_file.c at line 108
> [amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file
> base/rmaps_base_map_job.c at line 87
> [amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file
> base/plm_base_launch_support.c at line 77
> [amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file
> plm_rsh_module.c at line 990
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
> launch so we are aborting.
>
> Any ideas?
>
> Thanks!
>
> - Orion
>
> --
> Orion Poplawski
> Technical Manager 303-415-9701 x222
> NWRA/CoRA Division FAX: 303-415-9702
> 3380 Mitchell Lane orion_at_[hidden]
> Boulder, CO 80301 http://www.cora.nwra.com