Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts
From: Terry Dontje (terry.dontje_at_[hidden])
Date: 2010-11-17 08:57:44

On 11/17/2010 07:41 AM, Chris Jewell wrote:
> On 17 Nov 2010, at 11:56, Terry Dontje wrote:
>>> You are absolutely correct, Terry, and the 1.4 release series does include the proper code. The point here, though, is that SGE binds the orted to a single core, even though other cores are also allocated. So the orted detects an external binding of one core, and binds all its children to that same core.
>> I do not think you are right here. Chris sent the following, which looks like OGE (fka SGE) actually did bind the HNP to multiple cores. However, I believe that message does not come from the processes themselves and is only shown by the HNP. I wonder whether, if Chris adds a "-bind-to-core" option, we'll see more output from the a.out's before they exec unterm?
> As requested using
> 'qsub -pe mpi 8 -binding linear:2'
> and
> 'mpirun -mca ras_gridengine_verbose 100 --report-bindings -by-core -bind-to-core ./unterm'
> [exec5:06671] System has detected external process binding to cores 0028
> [exec5:06671] ras:gridengine: JOB_ID: 59434
> [exec5:06671] ras:gridengine: PE_HOSTFILE: /usr/sge/default/spool/exec5/active_jobs/59434.1/pe_hostfile
> [exec5:06671] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=2
> [exec5:06671] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=2
> [exec5:06671] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec5:06671] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec5:06671] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec5:06671] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows slots=1
> No more info. I note that the external binding is slightly different to what I had before, but our cluster is busier today :-)
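As a quick sanity check on the allocation quoted above, the per-host slot counts can be totalled and compared with the 8 slots requested via `-pe mpi 8`. This is only a sketch: the regular expression and the helper name `slots_per_host` are illustrative assumptions, and the log text is copied from the quoted output.

```python
import re

# Matches the verbose ras:gridengine lines that report a host and its slot count.
SLOT_LINE = re.compile(r"ras:gridengine: (\S+): PE_HOSTFILE shows slots=(\d+)")

def slots_per_host(log_text):
    """Map each host name to the slot count reported in the verbose output."""
    return {m.group(1): int(m.group(2)) for m in SLOT_LINE.finditer(log_text)}

# Lines copied from the quoted mpirun output.
LOG = """\
[exec5:06671] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=2
[exec5:06671] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=2
[exec5:06671] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec5:06671] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec5:06671] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec5:06671] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows slots=1
"""

slots = slots_per_host(LOG)
print(sum(slots.values()))  # 8, matching the request of -pe mpi 8
```

So the allocation covers six hosts and 8 slots in total, with four of the hosts holding a single slot each.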
I would have expected more output.
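
For reading the "external process binding to cores 0028" line, here is a minimal sketch that assumes the reported value is a hexadecimal bitmask in which bit N stands for core N. That interpretation is an assumption on my part, not something the output itself confirms, and `cores_from_mask` is just an illustrative helper name.

```python
from typing import List

def cores_from_mask(mask_hex: str) -> List[int]:
    """Return the indices of the bits set in a hexadecimal CPU mask.

    Assumes bit N of the mask corresponds to core N.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

print(cores_from_mask("0028"))  # [3, 5]
```

Under that assumption the mask names two cores, which would be consistent with the `-binding linear:2` request.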

> Chris
> --
> Dr Chris Jewell
> Department of Statistics
> University of Warwick
> Coventry
> CV4 7AL
> UK
> Tel: +44 (0)24 7615 0778

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]