Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Fwd: [GE users] Open MPI job fails when run thru SGE
From: Reuti (reuti_at_[hidden])
Date: 2009-02-02 01:45:45


On 02.02.2009 at 05:44, Sangamesh B wrote:

> On Sun, Feb 1, 2009 at 10:37 PM, Reuti <reuti_at_[hidden]>
> wrote:
>> On 01.02.2009 at 16:00, Sangamesh B wrote:
>>
>>> On Sat, Jan 31, 2009 at 6:27 PM, Reuti <reuti_at_staff.uni-marburg.de> wrote:
>>>>
>>>> On 31.01.2009 at 08:49, Sangamesh B wrote:
>>>>
>>>>> On Fri, Jan 30, 2009 at 10:20 PM, Reuti <reuti_at_staff.uni-marburg.de> wrote:
>>>>>>
>>>>>> On 30.01.2009 at 15:02, Sangamesh B wrote:
>>>>>>
>>>>>>> Dear Open MPI,
>>>>>>>
>>>>>>> Do you have a solution for the following problem with Open MPI
>>>>>>> (1.3) when it is run through Grid Engine?
>>>>>>>
>>>>>>> I changed the global execd_params to H_MEMORYLOCKED=infinity and
>>>>>>> restarted sgeexecd on all nodes.
>>>>>>>
>>>>>>> But still the problem persists:
>>>>>>>
>>>>>>> $cat err.77.CPMD-OMPI
>>>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>>>
>>>>>> I think this might already be the reason why it's not working. Is
>>>>>> an mpihello program running fine through SGE?
>>>>>>
>>>>> No.
>>>>>
>>>>> Any Open MPI parallel job through SGE runs only if it is running on
>>>>> a single node (i.e. 8 processes on 8 cores of a single node). If the
>>>>> number of processes is more than 8, SGE will schedule it on 2 nodes
>>>>> and the job will fail with the above error.
>>>>>
>>>>> Now I did a loose integration of Open MPI 1.3 with SGE. The job
>>>>> runs,
>>>>> but all 16 processes run on a single node.
>>>>
>>>> What are the entries in `qconf -sconf` for:
>>>>
>>>> rsh_command
>>>> rsh_daemon
>>>>
>>> $ qconf -sconf
>>> global:
>>> execd_spool_dir /opt/gridengine/default/spool
>>> ...
>>> .....
>>> qrsh_command /usr/bin/ssh
>>> rsh_command /usr/bin/ssh
>>> rlogin_command /usr/bin/ssh
>>> rsh_daemon /usr/sbin/sshd
>>> qrsh_daemon /usr/sbin/sshd
>>> reprioritize 0
>>
>> Do you have to use ssh? In a private cluster the rsh-based mechanism
>> is often fine, or with SGE 6.2 you can use SGE's built-in startup
>> mechanism. Otherwise please follow this:
>>
>> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
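
If the cluster were on SGE 6.2, the built-in startup mechanism would be
selected in the global configuration roughly as follows (only a sketch,
mirroring the parameter names from the qconf -sconf output above; the
cluster in this thread runs SGE 6.0, where this is not available):

    qrsh_command                 builtin
    rsh_command                  builtin
    rlogin_command               builtin
    rsh_daemon                   builtin
    qrsh_daemon                  builtin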
>>
>>
>>> I think it's better to also try Open MPI 1.2.8.
>>>
>>>> What is your mpirun command in the job script? Are you picking up
>>>> the mpirun from Open MPI there? According to the output below, it's
>>>> not a loose integration, but you already prepare a machinefile,
>>>> which is superfluous for Open MPI.
>>>>
>>> No, I have not prepared a machinefile for Open MPI.
>>> For the tight integration job:
>>>
>>> /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS
>>> $CPMDBIN/cpmd311-ompi-mkl.x wf1.in $PP_LIBRARY >
>>> wf1.out_OMPI$NSLOTS.$JOB_ID
>>>
>>> For the loose integration job:
>>>
>>> /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS -hostfile
>>> $TMPDIR/machines $CPMDBIN/cpmd311-ompi-mkl.x wf1.in $PP_LIBRARY >
>>> wf1.out_OMPI_$JOB_ID.$NSLOTS
>>
>> a) Did you compile Open MPI with "--with-sge"?
>>
> Yes, but ompi_info shows only one SGE component:
>
> $ /opt/mpi/openmpi/1.3/intel/bin/ompi_info | grep gridengine
> MCA ras: gridengine (MCA v2.0, API v2.0, Component
> v1.3)
>
>> b) When the $SGE_ROOT variable is set, Open MPI will use a tight
>> integration automatically.
>>
> In the SGE job submission script, I set SGE_ROOT= <nothing>

This will set the variable to an empty string. You need to use:

unset SGE_ROOT
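
For illustration, a minimal job script for such a forced loose
integration might look roughly like this (the PE name, the installation
path and the $TMPDIR/machines file are taken from this thread; the
./hello_mpi binary is only a placeholder, not a fixed recipe):

    #!/bin/sh
    #$ -S /bin/sh
    #$ -pe orte 16
    #$ -cwd

    # Remove the variable entirely; SGE_ROOT= would only set it to an
    # empty string and Open MPI would still detect the SGE environment.
    unset SGE_ROOT

    # Assumes the PE's start script has already written $TMPDIR/machines
    # in a format Open MPI understands (see the discussion below).
    /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS \
        -hostfile $TMPDIR/machines ./hello_mpi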

Despite the error message mentioned on the list, I can run Open MPI
1.3 with tight integration into SGE.

-- Reuti

> And ran a loose integration job. It failed with the following error:
> $ cat err.87.Hello-OMPI
> [node-0-18.local:08252] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found
> in file ess_hnp_module.c at line 126
> ----------------------------------------------------------------------
> ----
> It looks like orte_init failed for some reason; your parallel
> process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_plm_base_select failed
> --> Returned value Not found (-13) instead of ORTE_SUCCESS
> ----------------------------------------------------------------------
> ----
> [node-0-18.local:08252] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found
> in file runtime/orte_init.c at line 132
> ----------------------------------------------------------------------
> ----
> It looks like orte_init failed for some reason; your parallel
> process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Not found (-13) instead of ORTE_SUCCESS
> ----------------------------------------------------------------------
> ----
> [node-0-18.local:08252] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found
> in file orterun.c at line 454
>
> $ cat out.87.Hello-OMPI
> /opt/gridengine/default/spool/node-0-18/active_jobs/87.1/pe_hostfile
> ibc18
> ibc18
> ibc18
> ibc18
> ibc18
> ibc18
> ibc18
> ibc18
> ibc17
> ibc17
> ibc17
> ibc17
> ibc17
> ibc17
> ibc17
> ibc17
>
>
>> c) The machine file you presented looks like one for MPICH(1); the
>> syntax of the machine file for Open MPI is different:
>>
>> ibc17 slots=8
>> ibc12 slots=8
>>
> I tested a hello-world program with Open MPI using a machinefile in
> MPICH(1) style. It works.
>
> So in a loose integration job, Open MPI may not be able to find the
> $TMPDIR/machines file, or it might be running in a tight integration
> style.
>> So you would have to adjust the format of the generated file and
>> unset SGE_ROOT inside your job script to force Open MPI to do a loose
>> integration only.
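
For illustration, one way to do that conversion might be the following
(it assumes the PE start script writes one hostname per slot into
$TMPDIR/machines, as in the output shown further above; the .ompi file
name and ./hello_mpi binary are only placeholders):

    # Collapse repeated hostnames into Open MPI's "host slots=N" format.
    sort $TMPDIR/machines | uniq -c | \
        awk '{print $2 " slots=" $1}' > $TMPDIR/machines.ompi

    unset SGE_ROOT
    mpirun -np $NSLOTS -hostfile $TMPDIR/machines.ompi ./hello_mpi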
>>
>> -- Reuti
>>
>>
>>> I think I should check with Open MPI 1.2.8. That may work.
>>>
>>> Thanks,
>>> Sangamesh
>>>>>
>>>>> $ cat out.83.Hello-OMPI
>>>>> /opt/gridengine/default/spool/node-0-17/active_jobs/83.1/
>>>>> pe_hostfile
>>>>> ibc17
>>>>> ibc17
>>>>> ibc17
>>>>> ibc17
>>>>> ibc17
>>>>> ibc17
>>>>> ibc17
>>>>> ibc17
>>>>> ibc12
>>>>> ibc12
>>>>> ibc12
>>>>> ibc12
>>>>> ibc12
>>>>> ibc12
>>>>> ibc12
>>>>> ibc12
>>>>> Greetings: 1 of 16 from the node node-0-17.local
>>>>> Greetings: 10 of 16 from the node node-0-17.local
>>>>> Greetings: 15 of 16 from the node node-0-17.local
>>>>> Greetings: 9 of 16 from the node node-0-17.local
>>>>> Greetings: 14 of 16 from the node node-0-17.local
>>>>> Greetings: 8 of 16 from the node node-0-17.local
>>>>> Greetings: 11 of 16 from the node node-0-17.local
>>>>> Greetings: 12 of 16 from the node node-0-17.local
>>>>> Greetings: 6 of 16 from the node node-0-17.local
>>>>> Greetings: 0 of 16 from the node node-0-17.local
>>>>> Greetings: 5 of 16 from the node node-0-17.local
>>>>> Greetings: 3 of 16 from the node node-0-17.local
>>>>> Greetings: 13 of 16 from the node node-0-17.local
>>>>> Greetings: 4 of 16 from the node node-0-17.local
>>>>> Greetings: 7 of 16 from the node node-0-17.local
>>>>> Greetings: 2 of 16 from the node node-0-17.local
>>>>>
>>>>> But qhost -u <user name> shows that it is scheduled/running on two
>>>>> nodes.
>>>>>
>>>>> Is anybody successfully running Open MPI 1.3 tightly integrated
>>>>> with SGE?
>>>>
>>>> For a Tight Integration there's a FAQ:
>>>>
>>>> http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
>>>>
>>>> -- Reuti
>>>>
>>>>>
>>>>> Thanks,
>>>>> Sangamesh
>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> ----------
>>>>>>> A daemon (pid 31947) died unexpectedly with status 129 while
>>>>>>> attempting
>>>>>>> to launch so we are aborting.
>>>>>>>
>>>>>>> There may be more information reported by the environment
>>>>>>> (see above).
>>>>>>>
>>>>>>> This may be because the daemon was unable to find all the needed
>>>>>>> shared
>>>>>>> libraries on the remote node. You may set your
>>>>>>> LD_LIBRARY_PATH to have
>>>>>>> the
>>>>>>> location of the shared libraries on the remote nodes and this
>>>>>>> will
>>>>>>> automatically be forwarded to the remote nodes.
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> ----------
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> ----------
>>>>>>> mpirun noticed that the job aborted, but has no info as to
>>>>>>> the process
>>>>>>> that caused that situation.
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> ----------
>>>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> ----------
>>>>>>> mpirun was unable to cleanly terminate the daemons on the
>>>>>>> nodes shown
>>>>>>> below. Additional manual cleanup may be required - please
>>>>>>> refer to
>>>>>>> the "orte-clean" tool for assistance.
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> ----------
>>>>>>> node-0-19.local - daemon did not report back when launched
>>>>>>> node-0-20.local - daemon did not report back when launched
>>>>>>> node-0-21.local - daemon did not report back when launched
>>>>>>> node-0-22.local - daemon did not report back when launched
>>>>>>>
>>>>>>> The hostnames of the InfiniBand interfaces are ibc0, ibc1,
>>>>>>> ibc2 .. ibc23. Maybe Open MPI is not able to identify the hosts,
>>>>>>> as it is using node-0-.. . Is this causing Open MPI to fail?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sangamesh
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 26, 2009 at 5:09 PM, mihlon <vaclam1_at_[hidden]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>> Hello SGE users,
>>>>>>>>>
>>>>>>>>> The cluster is installed with Rocks-4.3, SGE 6.0 & Open MPI
>>>>>>>>> 1.3.
>>>>>>>>> Open MPI is configured with "--with-sge".
>>>>>>>>> ompi_info shows only one component:
>>>>>>>>> # /opt/mpi/openmpi/1.3/intel/bin/ompi_info | grep gridengine
>>>>>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3)
>>>>>>>>>
>>>>>>>>> Is this acceptable?
>>>>>>>>
>>>>>>>> maybe yes
>>>>>>>>
>>>>>>>> see: http://www.open-mpi.org/faq/?category=building#build-rte-sge
>>>>>>>>
>>>>>>>> shell$ ompi_info | grep gridengine
>>>>>>>> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.3)
>>>>>>>> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.3)
>>>>>>>>
>>>>>>>> (Specific frameworks and version numbers may vary, depending
>>>>>>>> on your
>>>>>>>> version of Open MPI.)
>>>>>>>>
>>>>>>>>> The Open MPI parallel jobs run successfully from the command
>>>>>>>>> line, but fail when run through SGE (with -pe orte <slots>).
>>>>>>>>>
>>>>>>>>> The error is:
>>>>>>>>>
>>>>>>>>> $ cat err.26.Helloworld-PRL
>>>>>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------
>>>>>>>>> ------------
>>>>>>>>> A daemon (pid 8462) died unexpectedly with status 129 while
>>>>>>>>> attempting
>>>>>>>>> to launch so we are aborting.
>>>>>>>>>
>>>>>>>>> There may be more information reported by the environment (see
>>>>>>>>> above).
>>>>>>>>>
>>>>>>>>> This may be because the daemon was unable to find all the
>>>>>>>>> needed
>>>>>>>>> shared
>>>>>>>>> libraries on the remote node. You may set your
>>>>>>>>> LD_LIBRARY_PATH to
>>>>>>>>> have
>>>>>>>>> the
>>>>>>>>> location of the shared libraries on the remote nodes and
>>>>>>>>> this will
>>>>>>>>> automatically be forwarded to the remote nodes.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------
>>>>>>>>> ------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------
>>>>>>>>> ------------
>>>>>>>>> mpirun noticed that the job aborted, but has no info as to the
>>>>>>>>> process
>>>>>>>>> that caused that situation.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------
>>>>>>>>> ------------
>>>>>>>>> mpirun: clean termination accomplished
>>>>>>>>>
>>>>>>>>> But the same job runs well if it runs on a single node, albeit
>>>>>>>>> with an error:
>>>>>>>>>
>>>>>>>>> $ cat err.23.Helloworld-PRL
>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------
>>>>>>>>> ------------
>>>>>>>>> WARNING: There was an error initializing an OpenFabrics
>>>>>>>>> device.
>>>>>>>>>
>>>>>>>>> Local host: node-0-4.local
>>>>>>>>> Local device: mthca0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------
>>>>>>>>> ------------
>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>> [node-0-4.local:07869] 7 more processes have sent help message
>>>>>>>>> help-mpi-btl-openib.txt / error in device init
>>>>>>>>> [node-0-4.local:07869] Set MCA parameter
>>>>>>>>> "orte_base_help_aggregate"
>>>>>>>>> to
>>>>>>>>> 0 to see all help / error messages
>>>>>>>>>
>>>>>>>>> The following link explains the same problem:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=72398
>>>>>>>>>
>>>>>>>>> Following this reference, I put 'ulimit -l unlimited' into
>>>>>>>>> /etc/init.d/sgeexecd on all nodes and restarted the services.
>>>>>>>>
>>>>>>>> Do not set 'ulimit -l unlimited' in /etc/init.d/sgeexecd;
>>>>>>>> set it in SGE instead:
>>>>>>>>
>>>>>>>> Run qconf -mconf and set execd_params:
>>>>>>>>
>>>>>>>>
>>>>>>>> frontend$> qconf -sconf
>>>>>>>> ...
>>>>>>>> execd_params H_MEMORYLOCKED=infinity
>>>>>>>> ...
>>>>>>>>
>>>>>>>>
>>>>>>>> Then restart sgeexecd on all your execution hosts.
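
For illustration, restarting and then verifying the limit might look
like this (a sketch only: it assumes password-less ssh to the execution
hosts, and the exact arguments accepted by the init script may differ
per installation):

    for host in $(qconf -sel); do
        ssh $host '/etc/init.d/sgeexecd stop && /etc/init.d/sgeexecd start'
    done

    # The new limit should then be visible from inside a job:
    qrsh bash -c 'ulimit -l'    # expected to report "unlimited"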
>>>>>>>>
>>>>>>>>
>>>>>>>> Milan
>>>>>>>>
>>>>>>>>> But still the problem persists.
>>>>>>>>>
>>>>>>>>> What could be the way out for this?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sangamesh
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=99133
>>>>>>>>>
>>>>>>>>> To unsubscribe from this discussion, e-mail:
>>>>>>>>> [users-unsubscribe_at_[hidden]].
>>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=99461
>>>>>>>>
>>>>>>>> To unsubscribe from this discussion, e-mail:
>>>>>>>> [users-unsubscribe_at_[hidden]].
>>>>>>>>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users