Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts
From: Chris Jewell (chris.jewell_at_[hidden])
Date: 2010-11-16 12:13:30


On 16 Nov 2010, at 14:26, Terry Dontje wrote:
>
> In the original case of 7 nodes and 8 processes, if we do -binding pe linear:2 and add -bind-to-core to mpirun, I'd actually expect the 6 nodes with one process each to have that process bound to one core, and the 7th node with 2 processes to have each of those processes bound to a different core on the same machine.
>
> Can we get the full output of such a run with -report-bindings turned on? I think we'll find that things are actually happening correctly, except that 6 of the nodes have 2 cores allocated but only one core is being bound to by a process.

Sure. Here's the stderr from a job submitted to my cluster with 'qsub -pe mpi 8 -binding linear:2 myScript.com', where myScript.com runs 'mpirun -mca ras_gridengine_verbose 100 --report-bindings ./unterm':

[exec4:17384] System has detected external process binding to cores 0022
[exec4:17384] ras:gridengine: JOB_ID: 59352
[exec4:17384] ras:gridengine: PE_HOSTFILE: /usr/sge/default/spool/exec4/active_jobs/59352.1/pe_hostfile
[exec4:17384] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows slots=2
[exec4:17384] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec4:17384] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec4:17384] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec4:17384] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec4:17384] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec4:17384] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=1
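
For reference, myScript.com is essentially just a wrapper around that mpirun call; a minimal sketch (assuming a bash job script submitted from the working directory, and that ./unterm is the MPI binary under test) would be:

  #!/bin/bash
  # SGE directives: run the script under bash, start in the submission directory
  #$ -S /bin/bash
  #$ -cwd
  # Launch the MPI job; Open MPI picks up the SGE allocation via the PE_HOSTFILE
  mpirun -mca ras_gridengine_verbose 100 --report-bindings ./unterm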

Chris

--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778