Open MPI User's Mailing List Archives

Subject: [OMPI users] Trouble using rankfile with gridengine
From: Orion Poplawski (orion_at_[hidden])
Date: 2010-04-23 12:36:03


I'm using gridengine 6.2u5 and openmpi 1.3.3. I'm submitting a parallel
job and would like to use a rankfile to set processor binding, but I'm
getting errors.

The $PE_HOSTFILE generated by gridengine is:

amos.cora.nwra.com 4 clouds.q_at_[hidden] UNDEFINED
andrew.cora.nwra.com 4 clouds.q_at_[hidden] UNDEFINED

The rankfile I'm using is:

rank 0=amos.cora.nwra.com slot=0
rank 1=andrew.cora.nwra.com slot=0
rank 2=amos.cora.nwra.com slot=4
rank 3=andrew.cora.nwra.com slot=4
rank 4=amos.cora.nwra.com slot=1
rank 5=andrew.cora.nwra.com slot=1
rank 6=amos.cora.nwra.com slot=5
rank 7=andrew.cora.nwra.com slot=5

The error I'm getting is:

Rankfile claimed host amos.cora.nwra.com that was not allocated or
oversubscribed it's slots:

--------------------------------------------------------------------------
[amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file
rmaps_rank_file.c at line 108
[amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file
base/rmaps_base_map_job.c at line 87
[amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file
base/plm_base_launch_support.c at line 77
[amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file
plm_rsh_module.c at line 990
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.
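
As a sanity check, the rankfile can be compared against the allocation in $PE_HOSTFILE. The following is a minimal Python sketch, not part of Open MPI or gridengine. The function names (parse_pe_hostfile, parse_rankfile, check) are hypothetical, and it only compares hostnames and per-host rank counts. It makes no assumption about how Open MPI interprets the slot= values.

```python
from collections import Counter

# The two files exactly as posted above.
PE_HOSTFILE = """\
amos.cora.nwra.com 4 clouds.q_at_[hidden] UNDEFINED
andrew.cora.nwra.com 4 clouds.q_at_[hidden] UNDEFINED
"""

RANKFILE = """\
rank 0=amos.cora.nwra.com slot=0
rank 1=andrew.cora.nwra.com slot=0
rank 2=amos.cora.nwra.com slot=4
rank 3=andrew.cora.nwra.com slot=4
rank 4=amos.cora.nwra.com slot=1
rank 5=andrew.cora.nwra.com slot=1
rank 6=amos.cora.nwra.com slot=5
rank 7=andrew.cora.nwra.com slot=5
"""


def parse_pe_hostfile(text):
    """Return {hostname: slot_count} from 'host count queue range' lines."""
    slots = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            slots[fields[0]] = int(fields[1])
    return slots


def parse_rankfile(text):
    """Return (rank, host, slot) tuples from 'rank N=host slot=S' lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("rank"):
            continue
        left, slot = line.split(" slot=")
        rank_part, host = left.split("=", 1)
        # The slot is kept as a string; its meaning is not interpreted here.
        entries.append((int(rank_part.split()[1]), host.strip(), slot.strip()))
    return entries


def check(alloc, entries):
    """List hosts that are missing from the allocation or given too many ranks."""
    problems = []
    per_host = Counter(host for _, host, _ in entries)
    for host, count in sorted(per_host.items()):
        if host not in alloc:
            problems.append(f"{host}: not in allocation")
        elif count > alloc[host]:
            problems.append(f"{host}: {count} ranks but only {alloc[host]} slots")
    return problems


if __name__ == "__main__":
    for problem in check(parse_pe_hostfile(PE_HOSTFILE), parse_rankfile(RANKFILE)):
        print(problem)
```

For the two files above, check() returns an empty list: both hostnames match the allocation exactly and each host is assigned 4 ranks against 4 slots. So by this simple count the rankfile does not oversubscribe either host, and the sketch cannot explain the error by itself.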

Any ideas?

Thanks!

- Orion

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion_at_[hidden]
Boulder, CO 80301              http://www.cora.nwra.com