
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts
From: Reuti (reuti_at_[hidden])
Date: 2010-11-15 11:39:41


On 15.11.2010 at 17:06, Chris Jewell wrote:

> Hi Ralph,
> Thanks for the tip. With the command
> $ qsub -pe mpi 8 -binding linear:1
> I get the output
> [exec6:29172] System has detected external process binding to cores 0008
> [exec6:29172] ras:gridengine: JOB_ID: 59282
> [exec6:29172] ras:gridengine: PE_HOSTFILE: /usr/sge/default/spool/exec6/active_jobs/59282.1/pe_hostfile
> [exec6:29172] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows slots=2
> [exec6:29172] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec6:29172] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec6:29172] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec6:29172] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec6:29172] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec6:29172] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=1
> Presumably that means OMPI is detecting the external binding correctly. If so, that confirms my problem is with how GE sets the processor affinity: the controlling sge_shepherd process on each physical exec node gets bound to the requested number of cores (in this case 1), so every child process (i.e. the OMPI parallel processes) is bound to that same core. What we really need is for GE to set the binding on each execution node according to the number of parallel processes that will run there. I'm not sure this is doable currently...
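
To make the mismatch in the quoted output concrete, here is a rough sketch that tallies the slots reported per host and decodes the binding mask. It assumes the "0008" value is a hexadecimal bitmask in which bit i stands for core i; that reading is an assumption about the log format, and the helper names `cores_from_mask` and `slots_per_host` are made up for illustration.

```python
import re

# Log lines copied from the quoted output above.
LOG = """\
[exec6:29172] System has detected external process binding to cores 0008
[exec6:29172] ras:gridengine: JOB_ID: 59282
[exec6:29172] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows slots=2
[exec6:29172] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec6:29172] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec6:29172] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec6:29172] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec6:29172] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec6:29172] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=1
"""


def cores_from_mask(mask_hex):
    """Decode a mask into core indices.

    ASSUMPTION: the mask is hexadecimal and bit i corresponds to core i.
    """
    mask = int(mask_hex, 16)
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def slots_per_host(log):
    """Map each host name to the slot count reported in the log text."""
    pattern = re.compile(r"ras:gridengine: (\S+): PE_HOSTFILE shows slots=(\d+)")
    return {host: int(count) for host, count in pattern.findall(log)}


if __name__ == "__main__":
    mask = re.search(r"binding to cores ([0-9a-fA-F]+)", LOG).group(1)
    print("cores allowed on the first node:", cores_from_mask(mask))
    slots = slots_per_host(LOG)
    print("total slots:", sum(slots.values()))
    for host, count in slots.items():
        print(f"{host}: {count} slot(s)")
```

Under that assumption the mask covers a single core, while exec6 has two slots, so both local processes there would end up sharing one core. The slot total also matches the 8 requested with `-pe mpi 8`.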

On SGE's side, the problem could be that the local MPI processes on each slave node are threads and don't invoke an additional `qrsh -inherit ...`. Does it work fine if you have only one MPI process per node?
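
One way to check this is to have every process report the cores it is allowed to run on. The following is only a minimal sketch: it relies on `os.sched_getaffinity`, which is available on Linux, and it assumes the launcher exports the rank in `OMPI_COMM_WORLD_RANK`. The function names are hypothetical.

```python
import os


def allowed_cores():
    """Return the sorted list of cores this process may run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: ask the kernel for the affinity set of the current process.
        return sorted(os.sched_getaffinity(0))
    # ASSUMPTION: on platforms without sched_getaffinity, treat the process
    # as unrestricted and report every core the OS knows about.
    return list(range(os.cpu_count() or 1))


def describe_binding(rank="?"):
    """Build a one-line report of the binding for the given rank label."""
    return f"rank {rank} pid {os.getpid()} allowed cores {allowed_cores()}"


if __name__ == "__main__":
    # ASSUMPTION: the rank is exported in this environment variable.
    print(describe_binding(os.environ.get("OMPI_COMM_WORLD_RANK", "?")))
```

If two ranks on the same node print the same single core, they are competing for it; with one process per node each rank should report its own core.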

-- Reuti

> Cheers,
> Chris
> --
> Dr Chris Jewell
> Department of Statistics
> University of Warwick
> Coventry
> CV4 7AL
> UK
> Tel: +44 (0)24 7615 0778