
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Fwd: [GE users] Open MPI job fails when run thru SGE
From: Reuti (reuti_at_[hidden])
Date: 2009-02-01 12:07:15


On 01.02.2009 at 16:00, Sangamesh B wrote:

> On Sat, Jan 31, 2009 at 6:27 PM, Reuti <reuti_at_[hidden]>
> wrote:
>> On 31.01.2009 at 08:49, Sangamesh B wrote:
>>
>>> On Fri, Jan 30, 2009 at 10:20 PM, Reuti <reuti_at_[hidden]>
>>> wrote:
>>>>
>>>> On 30.01.2009 at 15:02, Sangamesh B wrote:
>>>>
>>>>> Dear Open MPI,
>>>>>
>>>>> Do you have a solution for the following problem of Open MPI (1.3)
>>>>> when run through Grid Engine.
>>>>>
>>>>> I changed global execd params with H_MEMORYLOCKED=infinity and
>>>>> restarted the sgeexecd in all nodes.
>>>>>
>>>>> But still the problem persists:
>>>>>
>>>>> $cat err.77.CPMD-OMPI
>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>
>>>> I think this might already be the reason why it's not working. Is an
>>>> mpihello program running fine through SGE?
>>>>
>>> No.
>>>
>>> Any Open MPI parallel job through SGE runs only if it's running on a
>>> single node (i.e. 8 processes on 8 cores of a single node). If the
>>> number of processes is more than 8, then SGE will schedule it on 2
>>> nodes - the job will fail with the above error.
>>>
>>> Now I did a loose integration of Open MPI 1.3 with SGE. The job
>>> runs,
>>> but all 16 processes run on a single node.
>>
>> What are the entries in `qconf -sconf` for:
>>
>> rsh_command
>> rsh_daemon
>>
> $ qconf -sconf
> global:
> execd_spool_dir /opt/gridengine/default/spool
> ...
> .....
> qrsh_command /usr/bin/ssh
> rsh_command /usr/bin/ssh
> rlogin_command /usr/bin/ssh
> rsh_daemon /usr/sbin/sshd
> qrsh_daemon /usr/sbin/sshd
> reprioritize 0

Do you have to use ssh? Often in a private cluster the rsh-based startup
is fine, or with SGE 6.2 the built-in mechanism of SGE. Otherwise please
follow this:

http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
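
If you can use the built-in mechanism of SGE 6.2, a minimal sketch of the
relevant entries in `qconf -mconf` could look like the following (the exact
parameter names depend on your SGE version, so treat this only as an
illustration):

   rsh_command      builtin
   rsh_daemon       builtin
   rlogin_command   builtin
   rlogin_daemon    builtin

qlogin_command and qlogin_daemon can be set to "builtin" in the same way.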

> I think it's better to check once with Open MPI 1.2.8
>
>> What is your mpirun command in the jobscript - are you getting the
>> mpirun from Open MPI there? According to the output below, it's not a
>> loose integration, but you already prepare a machinefile, which is
>> superfluous for Open MPI.
>>
> No. I've not prepared the machinefile for Open MPI.
> For the Tight Integration job:
>
> /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS
> $CPMDBIN/cpmd311-ompi-mkl.x wf1.in $PP_LIBRARY >
> wf1.out_OMPI$NSLOTS.$JOB_ID
>
> For the loose integration job:
>
> /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS -hostfile
> $TMPDIR/machines $CPMDBIN/cpmd311-ompi-mkl.x wf1.in $PP_LIBRARY >
> wf1.out_OMPI_$JOB_ID.$NSLOTS

a) Did you compile Open MPI with "--with-sge"?

b) When the $SGE_ROOT variable is set, Open MPI will use a Tight
Integration automatically.

c) The machinefile you presented looks like one for MPICH(1); the
syntax for Open MPI's machinefile is different:

ibc17 slots=8
ibc12 slots=8

So you would have to adjust the format of the generated file and unset
SGE_ROOT inside your jobscript to force Open MPI to do a loose
integration only.
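
A minimal sketch of such a jobscript (the awk line assumes the usual
pe_hostfile layout of "host slots queue processor-range"; the paths and
the PE name are taken from your mails):

   #!/bin/sh
   #$ -pe orte 16
   #$ -cwd
   # convert SGE's pe_hostfile into Open MPI's "host slots=N" format
   awk '{print $1" slots="$2}' $PE_HOSTFILE > $TMPDIR/ompi_hosts
   # without SGE_ROOT, Open MPI falls back to its ssh/rsh launcher
   unset SGE_ROOT
   /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS \
       -hostfile $TMPDIR/ompi_hosts $CPMDBIN/cpmd311-ompi-mkl.x wf1.in \
       $PP_LIBRARY > wf1.out_OMPI_$JOB_ID.$NSLOTS

With the Tight Integration (SGE_ROOT untouched and Open MPI built
--with-sge) neither the conversion nor the -hostfile option is needed;
mpirun -np $NSLOTS <program> is sufficient.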

-- Reuti

> I think I should check with Open MPI 1.2.8. That may work.
>
> Thanks,
> Sangamesh
>>> $ cat out.83.Hello-OMPI
>>> /opt/gridengine/default/spool/node-0-17/active_jobs/83.1/pe_hostfile
>>> ibc17
>>> ibc17
>>> ibc17
>>> ibc17
>>> ibc17
>>> ibc17
>>> ibc17
>>> ibc17
>>> ibc12
>>> ibc12
>>> ibc12
>>> ibc12
>>> ibc12
>>> ibc12
>>> ibc12
>>> ibc12
>>> Greetings: 1 of 16 from the node node-0-17.local
>>> Greetings: 10 of 16 from the node node-0-17.local
>>> Greetings: 15 of 16 from the node node-0-17.local
>>> Greetings: 9 of 16 from the node node-0-17.local
>>> Greetings: 14 of 16 from the node node-0-17.local
>>> Greetings: 8 of 16 from the node node-0-17.local
>>> Greetings: 11 of 16 from the node node-0-17.local
>>> Greetings: 12 of 16 from the node node-0-17.local
>>> Greetings: 6 of 16 from the node node-0-17.local
>>> Greetings: 0 of 16 from the node node-0-17.local
>>> Greetings: 5 of 16 from the node node-0-17.local
>>> Greetings: 3 of 16 from the node node-0-17.local
>>> Greetings: 13 of 16 from the node node-0-17.local
>>> Greetings: 4 of 16 from the node node-0-17.local
>>> Greetings: 7 of 16 from the node node-0-17.local
>>> Greetings: 2 of 16 from the node node-0-17.local
>>>
>>> But qhost -u <user name> shows that it is scheduled/running on
>>> two nodes.
>>>
>>> Anybody successful in running Open MPI 1.3 tightly integrated
>>> with SGE?
>>
>> For a Tight Integration there's a FAQ:
>>
>> http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
>>
>> -- Reuti
>>
>>>
>>> Thanks,
>>> Sangamesh
>>>
>>>> -- Reuti
>>>>
>>>>
>>>>>
>>>>> ------------------------------------------------------------------
>>>>> --------
>>>>> A daemon (pid 31947) died unexpectedly with status 129 while
>>>>> attempting
>>>>> to launch so we are aborting.
>>>>>
>>>>> There may be more information reported by the environment (see
>>>>> above).
>>>>>
>>>>> This may be because the daemon was unable to find all the
>>>>> needed shared
>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH
>>>>> to have
>>>>> the
>>>>> location of the shared libraries on the remote nodes and this will
>>>>> automatically be forwarded to the remote nodes.
>>>>>
>>>>> ------------------------------------------------------------------
>>>>> --------
>>>>>
>>>>> ------------------------------------------------------------------
>>>>> --------
>>>>> mpirun noticed that the job aborted, but has no info as to the
>>>>> process
>>>>> that caused that situation.
>>>>>
>>>>> ------------------------------------------------------------------
>>>>> --------
>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>>
>>>>> ------------------------------------------------------------------
>>>>> --------
>>>>> mpirun was unable to cleanly terminate the daemons on the nodes
>>>>> shown
>>>>> below. Additional manual cleanup may be required - please refer to
>>>>> the "orte-clean" tool for assistance.
>>>>>
>>>>> ------------------------------------------------------------------
>>>>> --------
>>>>> node-0-19.local - daemon did not report back when launched
>>>>> node-0-20.local - daemon did not report back when launched
>>>>> node-0-21.local - daemon did not report back when launched
>>>>> node-0-22.local - daemon did not report back when launched
>>>>>
>>>>> The hostnames for the InfiniBand interfaces are ibc0, ibc1, ibc2 ..
>>>>> ibc23. Maybe Open MPI is not able to identify the hosts, as it is
>>>>> using node-0-.. . Is this causing Open MPI to fail?
>>>>>
>>>>> Thanks,
>>>>> Sangamesh
>>>>>
>>>>>
>>>>> On Mon, Jan 26, 2009 at 5:09 PM, mihlon <vaclam1_at_[hidden]>
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>> Hello SGE users,
>>>>>>>
>>>>>>> The cluster is installed with Rocks-4.3, SGE 6.0 & Open MPI 1.3.
>>>>>>> Open MPI is configured with "--with-sge".
>>>>>>> ompi_info shows only one component:
>>>>>>> # /opt/mpi/openmpi/1.3/intel/bin/ompi_info | grep gridengine
>>>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3)
>>>>>>>
>>>>>>> Is this acceptable?
>>>>>>
>>>>>> maybe yes
>>>>>>
>>>>>> see: http://www.open-mpi.org/faq/?category=building#build-rte-sge
>>>>>>
>>>>>> shell$ ompi_info | grep gridengine
>>>>>> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.3)
>>>>>> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.3)
>>>>>>
>>>>>> (Specific frameworks and version numbers may vary, depending
>>>>>> on your
>>>>>> version of Open MPI.)
>>>>>>
>>>>>>> The Open MPI parallel jobs run successfully from the command
>>>>>>> line, but fail when run through SGE (with -pe orte <slots>).
>>>>>>>
>>>>>>> The error is:
>>>>>>>
>>>>>>> $ cat err.26.Helloworld-PRL
>>>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> ----------
>>>>>>> A daemon (pid 8462) died unexpectedly with status 129 while
>>>>>>> attempting
>>>>>>> to launch so we are aborting.
>>>>>>>
>>>>>>> There may be more information reported by the environment
>>>>>>> (see above).
>>>>>>>
>>>>>>> This may be because the daemon was unable to find all the needed
>>>>>>> shared
>>>>>>> libraries on the remote node. You may set your
>>>>>>> LD_LIBRARY_PATH to have
>>>>>>> the
>>>>>>> location of the shared libraries on the remote nodes and this
>>>>>>> will
>>>>>>> automatically be forwarded to the remote nodes.
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> ----------
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> ----------
>>>>>>> mpirun noticed that the job aborted, but has no info as to
>>>>>>> the process
>>>>>>> that caused that situation.
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> ----------
>>>>>>> mpirun: clean termination accomplished
>>>>>>>
>>>>>>> But the same job runs well if it runs on a single node, though
>>>>>>> with an error:
>>>>>>>
>>>>>>> $ cat err.23.Helloworld-PRL
>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>> This will severely limit memory registrations.
>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>> This will severely limit memory registrations.
>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>> This will severely limit memory registrations.
>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>> This will severely limit memory registrations.
>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>> This will severely limit memory registrations.
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> ----------
>>>>>>> WARNING: There was an error initializing an OpenFabrics device.
>>>>>>>
>>>>>>> Local host: node-0-4.local
>>>>>>> Local device: mthca0
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> ----------
>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>> This will severely limit memory registrations.
>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>> This will severely limit memory registrations.
>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>> This will severely limit memory registrations.
>>>>>>> [node-0-4.local:07869] 7 more processes have sent help message
>>>>>>> help-mpi-btl-openib.txt / error in device init
>>>>>>> [node-0-4.local:07869] Set MCA parameter
>>>>>>> "orte_base_help_aggregate" to
>>>>>>> 0 to see all help / error messages
>>>>>>>
>>>>>>> The following link explains the same problem:
>>>>>>>
>>>>>>>
>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?
>>>>>>> dsForumId=38&dsMessageId=72398
>>>>>>>
>>>>>>> With this reference, I put 'ulimit -l unlimited' into
>>>>>>> /etc/init.d/sgeexecd on all nodes and restarted the services.
>>>>>>
>>>>>> Do not set 'ulimit -l unlimited' in /etc/init.d/sgeexecd,
>>>>>> but set it in SGE:
>>>>>>
>>>>>> Run qconf -mconf and set execd_params
>>>>>>
>>>>>>
>>>>>> frontend$> qconf -sconf
>>>>>> ...
>>>>>> execd_params H_MEMORYLOCKED=infinity
>>>>>> ...
>>>>>>
>>>>>>
>>>>>> Then restart all your sgeexecd hosts.
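
A quick way to verify the new limit afterwards is to let SGE run a tiny
job that prints the hard limit (only a sketch; the output file name is
arbitrary):

   echo 'ulimit -H -l' | qsub -S /bin/sh -j y -o memlock_check.out
   # once the job has run, memlock_check.out should show "unlimited"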
>>>>>>
>>>>>>
>>>>>> Milan
>>>>>>
>>>>>>> But still the problem persists.
>>>>>>>
>>>>>>> What could be the way out for this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sangamesh
>>>>>>>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users