
Subject: Re: [OMPI users] Fwd: [GE users] Open MPI job fails when run thru SGE
From: Rolf Vandevaart (Rolf.Vandevaart_at_[hidden])
Date: 2009-02-02 14:16:11


On 02/02/09 06:12, Reuti wrote:
> On 02.02.2009 at 11:31, Sangamesh B wrote:
>
>> On Mon, Feb 2, 2009 at 12:15 PM, Reuti <reuti_at_[hidden]>
>> wrote:
>>> On 02.02.2009 at 05:44, Sangamesh B wrote:
>>>
>>>> On Sun, Feb 1, 2009 at 10:37 PM, Reuti <reuti_at_[hidden]>
>>>> wrote:
>>>>>
>>>>> On 01.02.2009 at 16:00, Sangamesh B wrote:
>>>>>
>>>>>> On Sat, Jan 31, 2009 at 6:27 PM, Reuti <reuti_at_[hidden]>
>>>>>> wrote:
>>>>>>>
>>>>>>> On 31.01.2009 at 08:49, Sangamesh B wrote:
>>>>>>>
>>>>>>>> On Fri, Jan 30, 2009 at 10:20 PM, Reuti
>>>>>>>> <reuti_at_[hidden]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On 30.01.2009 at 15:02, Sangamesh B wrote:
>>>>>>>>>
>>>>>>>>>> Dear Open MPI,
>>>>>>>>>>
>>>>>>>>>> Do you have a solution for the following problem with Open MPI
>>>>>>>>>> (1.3) when it is run through Grid Engine?
>>>>>>>>>>
>>>>>>>>>> I changed the global execd params to H_MEMORYLOCKED=infinity and
>>>>>>>>>> restarted sgeexecd on all nodes.
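>>>>>>>>>>
>>>>>>>>>> For reference, that setting is usually applied and checked
>>>>>>>>>> roughly like this (the init-script path is an assumption and
>>>>>>>>>> depends on the SGE installation):
>>>>>>>>>>
>>>>>>>>>> # edit the global configuration and add/extend the line
>>>>>>>>>> #   execd_params   H_MEMORYLOCKED=infinity
>>>>>>>>>> qconf -mconf global
>>>>>>>>>>
>>>>>>>>>> # verify, then restart the execd on every node
>>>>>>>>>> qconf -sconf global | grep execd_params
>>>>>>>>>> /etc/init.d/sgeexecd restart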
>>>>>>>>>>
>>>>>>>>>> But still the problem persists:
>>>>>>>>>>
>>>>>>>>>> $cat err.77.CPMD-OMPI
>>>>>>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>>>>>>
>>>>>>>>> I think this might already be the reason why it's not working. Is
>>>>>>>>> an mpihello program running fine through SGE?
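>>>>>>>>>
>>>>>>>>> As a quick check, something like this should spread a trivial
>>>>>>>>> job over two nodes (a sketch; the PE name "orte" is an
>>>>>>>>> assumption and has to match your parallel environment):
>>>>>>>>>
>>>>>>>>> qsub -pe orte 16 -cwd -b y \
>>>>>>>>>     /opt/mpi/openmpi/1.3/intel/bin/mpirun -np 16 hostname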
>>>>>>>>>
>>>>>>>> No.
>>>>>>>>
>>>>>>>> Any Open MPI parallel job through SGE runs only if it is running
>>>>>>>> on a single node (i.e. 8 processes on the 8 cores of a single
>>>>>>>> node). If the number of processes is more than 8, then SGE will
>>>>>>>> schedule it on 2 nodes and the job will fail with the above error.
>>>>>>>>
>>>>>>>> Now I did a loose integration of Open MPI 1.3 with SGE. The job
>>>>>>>> runs,
>>>>>>>> but all 16 processes run on a single node.
>>>>>>>
>>>>>>> What are the entries in `qconf -sconf` for:
>>>>>>>
>>>>>>> rsh_command
>>>>>>> rsh_daemon
>>>>>>>
>>>>>> $ qconf -sconf
>>>>>> global:
>>>>>> execd_spool_dir /opt/gridengine/default/spool
>>>>>> ...
>>>>>> .....
>>>>>> qrsh_command /usr/bin/ssh
>>>>>> rsh_command /usr/bin/ssh
>>>>>> rlogin_command /usr/bin/ssh
>>>>>> rsh_daemon /usr/sbin/sshd
>>>>>> qrsh_daemon /usr/sbin/sshd
>>>>>> reprioritize 0
>>>>>
>>>>> Do you have to use ssh? In a private cluster the rsh-based startup
>>>>> is often ok, or with SGE 6.2 the built-in mechanism of SGE.
>>>>> Otherwise please follow this:
>>>>>
>>>>> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
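>>>>>
>>>>> For illustration, the SGE 6.2 built-in variant would look roughly
>>>>> like this in `qconf -mconf` (treat the exact set of entries as an
>>>>> assumption and check the howto above for your version):
>>>>>
>>>>> rsh_command                  builtin
>>>>> rsh_daemon                   builtin
>>>>> rlogin_command               builtin
>>>>> rlogin_daemon                builtin
>>>>> qlogin_command               builtin
>>>>> qlogin_daemon                builtin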
>>>>>
>>>>>
>>>>>> I think it's better to check once with Open MPI 1.2.8.
>>>>>>
>>>>>>> What is your mpirun command in the job script - are you getting
>>>>>>> the mpirun from Open MPI there? According to the output below,
>>>>>>> it's not a loose integration, but you already prepare a
>>>>>>> machinefile, which is superfluous for Open MPI.
>>>>>>>
>>>>>> No, I've not prepared the machinefile for Open MPI.
>>>>>> For the tight integration job:
>>>>>>
>>>>>> /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS
>>>>>> $CPMDBIN/cpmd311-ompi-mkl.x wf1.in $PP_LIBRARY >
>>>>>> wf1.out_OMPI$NSLOTS.$JOB_ID
>>>>>>
>>>>>> For the loose integration job:
>>>>>>
>>>>>> /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS -hostfile
>>>>>> $TMPDIR/machines $CPMDBIN/cpmd311-ompi-mkl.x wf1.in $PP_LIBRARY >
>>>>>> wf1.out_OMPI_$JOB_ID.$NSLOTS
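>>>>>>
>>>>>> For context, the whole tight integration job script is essentially
>>>>>> just this sketch - the job name and the PE name "orte" are
>>>>>> assumptions and have to match a parallel environment defined in SGE:
>>>>>>
>>>>>> #!/bin/sh
>>>>>> #$ -N cpmd_test
>>>>>> #$ -cwd
>>>>>> #$ -pe orte 16
>>>>>> /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS \
>>>>>>     $CPMDBIN/cpmd311-ompi-mkl.x wf1.in $PP_LIBRARY \
>>>>>>     > wf1.out_OMPI$NSLOTS.$JOB_ID
>>>>>>
>>>>>> With $SGE_ROOT set, mpirun takes the node list from SGE itself, so
>>>>>> no -hostfile is needed.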
>>>>>
>>>>> a) Did you compile Open MPI with "--with-sge"?
>>>>>
>>>> Yes, but ompi_info shows only one SGE component:
>>>>
>>>> $ /opt/mpi/openmpi/1.3/intel/bin/ompi_info | grep gridengine
>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component
>>>> v1.3)
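>>>>
>>>> For reference, a --with-sge build would be configured roughly like
>>>> this, assuming the install prefix from the paths above and the Intel
>>>> compilers:
>>>>
>>>> ./configure --prefix=/opt/mpi/openmpi/1.3/intel --with-sge \
>>>>             CC=icc CXX=icpc F77=ifort FC=ifort
>>>> make all install
>>>>
>>>> # afterwards the gridengine component should show up:
>>>> /opt/mpi/openmpi/1.3/intel/bin/ompi_info | grep gridengine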
>>>>
>>>>> b) when the $SGE_ROOT variable is set, Open MPI will use a Tight
>>>>> Integration
>>>>> automatically.
>>>>>
>>>> In the SGE job submit script, I set SGE_ROOT= <nothing>
>>>
>>> This will set the variable to an empty string. You need to use:
>>>
>>> unset SGE_ROOT
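>>>
>>> In the loose integration script that would then look roughly like this
>>> (a sketch only, reusing the mpirun line from above):
>>>
>>> unset SGE_ROOT
>>> /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS \
>>>     -hostfile $TMPDIR/machines $CPMDBIN/cpmd311-ompi-mkl.x wf1.in \
>>>     $PP_LIBRARY > wf1.out_OMPI_$JOB_ID.$NSLOTS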
>>>
>> Right.
>> I used 'unset SGE_ROOT' in the job submission script, and it's working
>> now. Hello world jobs run fine now (on single and multiple nodes).
>>
>> Thank you for the help.
>>
>> What could be the problem with the tight integration?
>
> There are obviously two issues for now with the Tight Integration for SGE:
>
> - Some processes might throw an "err=2" for an unknown reason, and only
> from time to time, but they run fine.
>
> - Processes vanish into a daemon although SGE's qrsh is used
> automatically (successive runs of `ps -e f` show that it's called with
> "... orted --daemonize ..." for a short while; see the sketch below) -
> I overlooked this in my last post when I stated it was working, as my
> process allocation was fine. It's just that the processes weren't bound
> to any sge_shepherd.
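>
> A quick way to watch this (a sketch; the grep pattern is only
> illustrative):
>
> while true; do ps -e f | egrep 'sge_shepherd|orted'; sleep 1; done
>
> If the orted processes reparent away from sge_shepherd, they are no
> longer under SGE's control, which would also explain why their usage is
> not accounted to the job.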
>
> It seems the SGE integration is broken, and it would indeed be better
> to stay with 1.2.8 for now :-/
>
> -- Reuti

I still do not know what is going on with the errno=2 issue. However,
the use of --daemonize does seem wrong and we will fix that. I have
created a ticket to track it.

https://svn.open-mpi.org/trac/ompi/ticket/1783

Also, I would not say that the SGE integration is completely broken in
1.3. Rather, assuming you do not run into the errno=2 issue, the main
problem is that Open MPI does not properly account for the MPI job. It
does gather up the allocation and run the job.
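
One way to see the accounting side of this once a job has finished is to
look at what SGE recorded for it (a sketch; the job id is just the one
from the error file above):

qacct -j 77

With a working tight integration the usage of the remote ranks should
show up there as well; with the daemonized orted it presumably does not.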

Rolf

-- 
=========================
rolf.vandevaart_at_[hidden]
781-442-3043
=========================