
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Fwd: [GE users] Open MPI job fails when run thru SGE
From: Sangamesh B (forum.san_at_[hidden])
Date: 2009-02-01 23:44:11


On Sun, Feb 1, 2009 at 10:37 PM, Reuti <reuti_at_[hidden]> wrote:
> On 01.02.2009 at 16:00, Sangamesh B wrote:
>
>> On Sat, Jan 31, 2009 at 6:27 PM, Reuti <reuti_at_[hidden]> wrote:
>>>
>>> On 31.01.2009 at 08:49, Sangamesh B wrote:
>>>
>>>> On Fri, Jan 30, 2009 at 10:20 PM, Reuti <reuti_at_[hidden]> wrote:
>>>>>
>>>>> On 30.01.2009 at 15:02, Sangamesh B wrote:
>>>>>
>>>>>> Dear Open MPI,
>>>>>>
>>>>>> Is there a solution for the following problem with Open MPI 1.3 when it
>>>>>> is run through Grid Engine?
>>>>>>
>>>>>> I changed the global execd_params to H_MEMORYLOCKED=infinity and
>>>>>> restarted sgeexecd on all nodes.
>>>>>>
>>>>>> But still the problem persists:
>>>>>>
>>>>>> $cat err.77.CPMD-OMPI
>>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>>
>>>>> I think this might already be the reason why it's not working. Does an
>>>>> mpihello program run fine through SGE?
>>>>>
>>>> No.
>>>>
>>>> An Open MPI parallel job run through SGE works only if it runs on a
>>>> single node (i.e. 8 processes on the 8 cores of one node). If the number
>>>> of processes is more than 8, SGE schedules it on 2 nodes and the job
>>>> fails with the above error.
>>>>
>>>> Now I did a loose integration of Open MPI 1.3 with SGE. The job runs,
>>>> but all 16 processes run on a single node.
>>>
>>> What are the entries in `qconf -sconf` for:
>>>
>>> rsh_command
>>> rsh_daemon
>>>
>> $ qconf -sconf
>> global:
>> execd_spool_dir /opt/gridengine/default/spool
>> ...
>> .....
>> qrsh_command /usr/bin/ssh
>> rsh_command /usr/bin/ssh
>> rlogin_command /usr/bin/ssh
>> rsh_daemon /usr/sbin/sshd
>> qrsh_daemon /usr/sbin/sshd
>> reprioritize 0
>
> Do you have to use ssh? In a private cluster the rsh-based startup is often
> fine, as is SGE's built-in mechanism in SGE 6.2. Otherwise, please follow this:
>
> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
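Before changing the startup method, it may help to confirm that non-interactive ssh works from the master node to every host SGE allocated. This is a minimal sketch assuming the usual pe_hostfile layout (host name in the first column); `pe_hosts` and `check_ssh_hosts` are hypothetical helpers. With the rsh_command/rsh_daemon settings shown above, qrsh has sgeexecd start a separate sshd for each task, so this plain-ssh probe exercises the nodes' regular sshd rather than exactly that path.

```shell
#!/bin/sh
# Sketch: probe non-interactive ssh to each host in an SGE pe_hostfile.
# Assumption: the host name is the first whitespace-separated column.

# Hypothetical helper: print each distinct host once, in file order.
pe_hosts() {
    awk 'NF && !seen[$1]++ { print $1 }' "$1"
}

# Hypothetical helper: BatchMode=yes makes ssh fail instead of prompting
# for a password; ConnectTimeout bounds the wait on unreachable hosts.
check_ssh_hosts() {
    status=0
    for h in $(pe_hosts "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=10 "$h" true; then
            echo "ssh OK:     $h"
        else
            echo "ssh FAILED: $h" >&2
            status=1
        fi
    done
    return "$status"
}

# Inside a job script:
#   check_ssh_hosts "$PE_HOSTFILE"
```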
>
>
>> I think it's better to check with Open MPI 1.2.8 as well.
>>
>>> What is the mpirun command in your jobscript? Are you really getting the
>>> mpirun from Open MPI there? According to the output below, it's not a loose
>>> integration, but you already prepare a machinefile, which is superfluous for
>>> Open MPI.
>>>
>> No. I haven't prepared a machinefile for Open MPI.
>> For the tight integration job:
>>
>> /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS \
>>     $CPMDBIN/cpmd311-ompi-mkl.x wf1.in $PP_LIBRARY \
>>     > wf1.out_OMPI$NSLOTS.$JOB_ID
>>
>> For the loose integration job:
>>
>> /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS -hostfile $TMPDIR/machines \
>>     $CPMDBIN/cpmd311-ompi-mkl.x wf1.in $PP_LIBRARY \
>>     > wf1.out_OMPI_$JOB_ID.$NSLOTS
>
> a) Did you compile Open MPI with "--with-sge"?
>
Yes, but ompi_info shows only one gridengine component:

$ /opt/mpi/openmpi/1.3/intel/bin/ompi_info | grep gridengine
                 MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3)

> b) When the $SGE_ROOT variable is set, Open MPI will use a Tight Integration
> automatically.
>
In the SGE job submit script, I set SGE_ROOT to an empty value (SGE_ROOT=)
and ran a loose integration job. It failed with the following error:
$ cat err.87.Hello-OMPI
[node-0-18.local:08252] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 126
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_plm_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[node-0-18.local:08252] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[node-0-18.local:08252] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file orterun.c at line 454
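A note on `SGE_ROOT=`: in the shell, an assignment with an empty value leaves the variable set (to the empty string) and, if it was exported, still exported, which is not the same as removing it with `unset`. Code that only checks whether a variable exists still sees it. The hypothetical helper `describe_var` below is a minimal sketch for reporting which state a job actually sees; the variable list in the usage comment is illustrative, taken from names used in this thread.

```shell
#!/bin/sh
# Sketch: report whether a variable is unset, set but empty, or set.
# describe_var is a hypothetical helper, not part of SGE or Open MPI.
describe_var() {
    # ${NAME+x} expands to "x" only if NAME is set (even to "").
    eval "is_set=\${$1+x} value=\${$1-}"
    if [ -z "$is_set" ]; then
        echo "$1: unset"
    elif [ -z "$value" ]; then
        echo "$1: set but empty"
    else
        echo "$1=$value"
    fi
}

# In the job script, before calling mpirun:
#   for v in SGE_ROOT PE_HOSTFILE JOB_ID NSLOTS; do describe_var "$v"; done
#
# "SGE_ROOT=" keeps the variable (and its export flag) with an empty value;
# "unset SGE_ROOT" removes it from mpirun's environment entirely.
```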

$ cat out.87.Hello-OMPI
/opt/gridengine/default/spool/node-0-18/active_jobs/87.1/pe_hostfile
ibc18
ibc18
ibc18
ibc18
ibc18
ibc18
ibc18
ibc18
ibc17
ibc17
ibc17
ibc17
ibc17
ibc17
ibc17
ibc17

> c) The machine file you presented looks like one for MPICH(1); the syntax
> for an Open MPI machinefile is different:
>
> ibc17 slots=8
> ibc12 slots=8
>
I tested a hello-world program with Open MPI using an MPICH(1)-style
machinefile, and it works.
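For comparison with the `host slots=N` syntax quoted above, an MPICH(1)-style file with one line per slot (like the listing in out.87.Hello-OMPI) can be collapsed mechanically. This is a minimal sketch assuming lines for the same host are adjacent, as in that listing; `mpich_to_ompi_hostfile` and the output file name `machines.ompi` are hypothetical.

```shell
#!/bin/sh
# Sketch: convert a one-host-per-slot machinefile (MPICH(1) style) into
# Open MPI hostfile syntax, "<host> slots=<n>".
# Assumption: lines for the same host are adjacent, because uniq -c only
# merges adjacent duplicates. Pipe through sort first if they are not.
mpich_to_ompi_hostfile() {
    # uniq -c prints "<count> <host>"; swap into "<host> slots=<count>".
    uniq -c "$1" | awk '{ print $2 " slots=" $1 }'
}

# Example (machines.ompi is a hypothetical output name):
#   mpich_to_ompi_hostfile "$TMPDIR/machines" > "$TMPDIR/machines.ompi"
```

For the 16-line ibc18/ibc17 listing above, this prints `ibc18 slots=8` followed by `ibc17 slots=8`.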

So in a loose integration job, either Open MPI may not be able to find the
$TMPDIR/machines file, or it might be running in tight integration style.
> So you would have to adjust the format of the generated file and reset
> SGE_ROOT inside your jobscript, to force Open MPI to do a loose integration
> only.
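Following that suggestion, a loose-integration job script could build the Open MPI hostfile directly from SGE's $PE_HOSTFILE instead of the MPICH-style $TMPDIR/machines. This is a sketch under assumptions: the usual pe_hostfile layout of host, slot count, queue and processor range per line; the helper `pe_to_ompi_hostfile` and the file name `ompi_hosts` are hypothetical; and `unset SGE_ROOT` is only one reading of "reset SGE_ROOT", untested here.

```shell
#!/bin/sh
# Sketch: build an Open MPI hostfile from SGE's $PE_HOSTFILE.
# Assumed line layout: <host> <slots> <queue> <processor-range>
pe_to_ompi_hostfile() {
    # Keep only the host name and slot count.
    awk 'NF >= 2 { print $1 " slots=" $2 }' "$1"
}

# Hypothetical loose-integration job script body:
#   pe_to_ompi_hostfile "$PE_HOSTFILE" > "$TMPDIR/ompi_hosts"
#   unset SGE_ROOT   # assumption: one reading of "reset SGE_ROOT"; untested
#   /opt/mpi/openmpi/1.3/intel/bin/mpirun -np "$NSLOTS" \
#       -hostfile "$TMPDIR/ompi_hosts" ./a.out
```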
>
> -- Reuti
>
>
>> I think I should check with Open MPI 1.2.8. That may work.
>>
>> Thanks,
>> Sangamesh
>>>>
>>>> $ cat out.83.Hello-OMPI
>>>> /opt/gridengine/default/spool/node-0-17/active_jobs/83.1/pe_hostfile
>>>> ibc17
>>>> ibc17
>>>> ibc17
>>>> ibc17
>>>> ibc17
>>>> ibc17
>>>> ibc17
>>>> ibc17
>>>> ibc12
>>>> ibc12
>>>> ibc12
>>>> ibc12
>>>> ibc12
>>>> ibc12
>>>> ibc12
>>>> ibc12
>>>> Greetings: 1 of 16 from the node node-0-17.local
>>>> Greetings: 10 of 16 from the node node-0-17.local
>>>> Greetings: 15 of 16 from the node node-0-17.local
>>>> Greetings: 9 of 16 from the node node-0-17.local
>>>> Greetings: 14 of 16 from the node node-0-17.local
>>>> Greetings: 8 of 16 from the node node-0-17.local
>>>> Greetings: 11 of 16 from the node node-0-17.local
>>>> Greetings: 12 of 16 from the node node-0-17.local
>>>> Greetings: 6 of 16 from the node node-0-17.local
>>>> Greetings: 0 of 16 from the node node-0-17.local
>>>> Greetings: 5 of 16 from the node node-0-17.local
>>>> Greetings: 3 of 16 from the node node-0-17.local
>>>> Greetings: 13 of 16 from the node node-0-17.local
>>>> Greetings: 4 of 16 from the node node-0-17.local
>>>> Greetings: 7 of 16 from the node node-0-17.local
>>>> Greetings: 2 of 16 from the node node-0-17.local
>>>>
>>>> But qhost -u <user name> shows that it is scheduled/running on two
>>>> nodes.
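To make the mismatch between qhost and the actual placement explicit, the greeting lines can be tallied per host. This is a small sketch assuming the exact `Greetings: <rank> of <size> from the node <host>` format shown above (host name in the last field); `ranks_per_host` is a hypothetical helper. Applied to out.83.Hello-OMPI it would print the single line `node-0-17.local 16`, i.e. all 16 ranks on one node despite a two-node allocation.

```shell
#!/bin/sh
# Sketch: count MPI ranks per host from hello-world output lines of the form
#   Greetings: <rank> of <size> from the node <host>
ranks_per_host() {
    # The host name is the last field; tally it, then sort for stable output.
    awk '/^Greetings:/ { n[$NF]++ } END { for (h in n) print h, n[h] }' "$@" | sort
}

# Usage:
#   ranks_per_host out.83.Hello-OMPI
# A program-independent check is to launch a non-MPI command instead:
#   mpirun -np "$NSLOTS" hostname | sort | uniq -c
```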
>>>>
>>>> Has anybody succeeded in running Open MPI 1.3 tightly integrated with SGE?
>>>
>>> For a Tight Integration there's a FAQ:
>>>
>>> http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
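One quick check before following the FAQ is whether the parallel environment used for the job (orte here, from `-pe orte <slots>`) is set up for tight integration. This sketch assumes the settings that matter are `control_slaves TRUE` and `job_is_first_task FALSE`, as the FAQ suggests; verify against the FAQ for your Open MPI version. `check_pe_tight` is a hypothetical helper that parses `qconf -sp` output.

```shell
#!/bin/sh
# Sketch: check a parallel environment's settings for Open MPI tight integration.
# Assumption: control_slaves must be TRUE (so slave tasks can be started via
# qrsh -inherit) and job_is_first_task FALSE, as suggested in the Open MPI FAQ.
check_pe_tight() {
    awk '
        $1 == "control_slaves"    { cs = $2 }
        $1 == "job_is_first_task" { jf = $2 }
        END {
            ok = 1
            if (cs != "TRUE")  { print "control_slaves is " cs ", expected TRUE"; ok = 0 }
            if (jf != "FALSE") { print "job_is_first_task is " jf ", expected FALSE"; ok = 0 }
            exit (ok ? 0 : 1)
        }'
}

# Usage on a host with the SGE admin tools:
#   qconf -sp orte | check_pe_tight && echo "PE looks OK"
```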
>>>
>>> -- Reuti
>>>
>>>>
>>>> Thanks,
>>>> Sangamesh
>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> A daemon (pid 31947) died unexpectedly with status 129 while attempting
>>>>>> to launch so we are aborting.
>>>>>>
>>>>>> There may be more information reported by the environment (see above).
>>>>>>
>>>>>> This may be because the daemon was unable to find all the needed shared
>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>>>>> location of the shared libraries on the remote nodes and this will
>>>>>> automatically be forwarded to the remote nodes.
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>>> that caused that situation.
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>>>>>> below. Additional manual cleanup may be required - please refer to
>>>>>> the "orte-clean" tool for assistance.
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> node-0-19.local - daemon did not report back when launched
>>>>>> node-0-20.local - daemon did not report back when launched
>>>>>> node-0-21.local - daemon did not report back when launched
>>>>>> node-0-22.local - daemon did not report back when launched
>>>>>>
>>>>>> The hostnames for the InfiniBand interfaces are ibc0, ibc1, ibc2 .. ibc23.
>>>>>> Maybe Open MPI is not able to identify the hosts because it is using the
>>>>>> node-0-.. names. Could this be causing Open MPI to fail?
>>>>>>
>>>>>> Thanks,
>>>>>> Sangamesh
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 26, 2009 at 5:09 PM, mihlon <vaclam1_at_[hidden]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>> Hello SGE users,
>>>>>>>>
>>>>>>>> The cluster is installed with Rocks-4.3, SGE 6.0 & Open MPI 1.3.
>>>>>>>> Open MPI is configured with "--with-sge".
>>>>>>>> ompi_info shows only one component:
>>>>>>>> # /opt/mpi/openmpi/1.3/intel/bin/ompi_info | grep gridengine
>>>>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3)
>>>>>>>>
>>>>>>>> Is this acceptable?
>>>>>>>
>>>>>>> maybe yes
>>>>>>>
>>>>>>> see: http://www.open-mpi.org/faq/?category=building#build-rte-sge
>>>>>>>
>>>>>>> shell$ ompi_info | grep gridengine
>>>>>>> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.3)
>>>>>>> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.3)
>>>>>>>
>>>>>>> (Specific frameworks and version numbers may vary, depending on your
>>>>>>> version of Open MPI.)
>>>>>>>
>>>>>>>> The Open MPI parallel jobs run successfully from the command line, but
>>>>>>>> fail when run through SGE (with -pe orte <slots>).
>>>>>>>>
>>>>>>>> The error is:
>>>>>>>>
>>>>>>>> $ cat err.26.Helloworld-PRL
>>>>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> A daemon (pid 8462) died unexpectedly with status 129 while attempting
>>>>>>>> to launch so we are aborting.
>>>>>>>>
>>>>>>>> There may be more information reported by the environment (see above).
>>>>>>>>
>>>>>>>> This may be because the daemon was unable to find all the needed shared
>>>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>>>>>>> location of the shared libraries on the remote nodes and this will
>>>>>>>> automatically be forwarded to the remote nodes.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>>>>> that caused that situation.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> mpirun: clean termination accomplished
>>>>>>>>
>>>>>>>> But the same job runs fine if it runs on a single node, although with an
>>>>>>>> error:
>>>>>>>>
>>>>>>>> $ cat err.23.Helloworld-PRL
>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>> This will severely limit memory registrations.
>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>> This will severely limit memory registrations.
>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>> This will severely limit memory registrations.
>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>> This will severely limit memory registrations.
>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>> This will severely limit memory registrations.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> WARNING: There was an error initializing an OpenFabrics device.
>>>>>>>>
>>>>>>>> Local host: node-0-4.local
>>>>>>>> Local device: mthca0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>> This will severely limit memory registrations.
>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>> This will severely limit memory registrations.
>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>> This will severely limit memory registrations.
>>>>>>>> [node-0-4.local:07869] 7 more processes have sent help message
>>>>>>>> help-mpi-btl-openib.txt / error in device init
>>>>>>>> [node-0-4.local:07869] Set MCA parameter "orte_base_help_aggregate" to
>>>>>>>> 0 to see all help / error messages
>>>>>>>>
>>>>>>>> The following link explains the same problem:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=72398
>>>>>>>>
>>>>>>>> Following this reference, I put 'ulimit -l unlimited' into
>>>>>>>> /etc/init.d/sgeexecd on all nodes and restarted the services.
>>>>>>>
>>>>>>> Do not set 'ulimit -l unlimited' in /etc/init.d/sgeexecd,
>>>>>>> but set it in the SGE configuration:
>>>>>>>
>>>>>>> Run qconf -mconf and set execd_params:
>>>>>>>
>>>>>>>
>>>>>>> frontend$> qconf -sconf
>>>>>>> ...
>>>>>>> execd_params H_MEMORYLOCKED=infinity
>>>>>>> ...
>>>>>>>
>>>>>>>
>>>>>>> Then restart all your sgeexecd hosts.
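After restarting the execds, it is worth confirming the limit a job actually receives. `ulimit -l` reports the locked-memory limit in kilobytes, so the 32768-byte limit in the libibverbs warnings above would show up as 32. A minimal sketch with a hypothetical helper `check_memlock`; its optional argument exists only so the logic can be exercised outside a job.

```shell
#!/bin/sh
# Sketch: verify the locked-memory limit seen inside an SGE job.
# check_memlock is a hypothetical helper; pass a value to test the logic,
# otherwise it reads the current limit with `ulimit -l` (in KB).
check_memlock() {
    limit=${1:-$(ulimit -l)}
    if [ "$limit" = "unlimited" ]; then
        echo "memlock OK on $(uname -n): unlimited"
        return 0
    fi
    echo "memlock too low on $(uname -n): ${limit} KB" >&2
    return 1
}

# In a job script:
#   check_memlock || echo "H_MEMORYLOCKED does not seem to be in effect" >&2
```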
>>>>>>>
>>>>>>>
>>>>>>> Milan
>>>>>>>
>>>>>>>> But still the problem persists.
>>>>>>>>
>>>>>>>> What could be the way out for this?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sangamesh
>>>>>>>>
>>>>>>>> ------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=99133
>>>>>>>>
>>>>>>>> To unsubscribe from this discussion, e-mail:
>>>>>>>> [users-unsubscribe_at_[hidden]].
>>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=99461
>>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users