From: Glenn Carver (Glenn.Carver_at_[hidden])
Date: 2007-06-30 17:23:28


Pak,

Thanks. After I received your email I went back and checked my patch
install logs (I hadn't missed that the patch was needed). It turns out
the patch install had failed on that node; once I applied the patch by
hand and rebooted, everything started working.
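
In case it helps anyone else who hits this, what I ended up doing on the
affected node boils down to roughly the following (the patch location is
just where I had unpacked it, so treat this as a sketch rather than a
recipe):

   showrev -p | grep 125793       (confirm whether the patch is present)
   patchadd /var/tmp/125793-01    (apply it by hand, as root)
   reboot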

Thanks again for taking the time to reply at the weekend! Much appreciated.

    Glenn

>Glenn,
>
>Are you running Solaris 10 Update 3 (11/06) with patch 125793-01? It is
>required for running with the udapl BTL.
>
>http://www.sun.com/products-n-solutions/hardware/docs/html/819-7478-11/body.html#93180
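>
>(A quick check on each node: "cat /etc/release" should show the 11/06
>string for Update 3, and "showrev -p | grep 125793" should list the patch
>if it is already installed.)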
>
>Glenn Carver wrote:
>> Further to my email below regarding problems with uDAPL across IB, I
>> found a bug report lodged with Sun (also reported on OpenSolaris at:
>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6545187).
>> I will lodge a support call with Sun first thing Monday, though it
>> might not get me very far.
>>
>> Would ditching ClusterTools, compiling the latest Open MPI and trying
>> the IB / OpenIB interface work for me? Another option would be to
>> revert to ClusterTools 6, but ideally I need the better implementation
>> of MPI-2 that's in Open MPI.
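>>
>> What I have in mind for the first option is roughly the following (the
>> install prefix and the location of the IB libraries are guesses on my
>> part, so treat it as a sketch):
>>
>>   ./configure --prefix=/opt/openmpi --with-openib=/usr/local/ofed
>>   make all install
>>   mpirun --mca btl openib,sm,self -np 8 ./a.out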
>>
>> Any workarounds on the first issue appreciated and advice on the
>> second question appreciated too!
>>
>> Thanks
>> Glenn
>>
>>
>>> Hi,
>>>
>>> I'm trying to set up a new small cluster. It's based on Sun X4100
>>> servers running Solaris 10 x86. I have the Open MPI that comes with
>>> ClusterTools 7. In addition, I have an InfiniBand network between
>>> the nodes. I can run parallel jobs fine if the processes remain on one
>>> node (each node has 4 cores). However, as soon as I try to run across
>>> the nodes I get these errors from the job:
>>>
>>>
>>>[node3][0,1,8][/ws/hpc-ct-7/builds/7.0.build-ct7-030/ompi-ct7/ompi/mca/btl/udapl/btl_udapl_component.c:827:mca_btl_udapl_component_progress]
>>> WARNING : Connection event not handled : 16391
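>>>
>>> (I can re-run with more BTL verbosity, e.g. something like
>>>    mpirun --mca btl_base_verbose 50 ...
>>> if that extra output would help; I haven't captured it yet.)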
>>>
>>> I've had a good look through the archives but can't find a reference
>>> to this error. I realise the udapl interface is a Sun addition to
>>> Open MPI, but I'm hoping someone else will have seen this before and
>>> will know what's wrong. I have checked that my IB network is
>>> functioning correctly (that seemed the obvious thing that could be wrong).
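>>>
>>> (If it helps to narrow things down, I can also force the TCP transport
>>> over the e1000g network and compare, e.g. something like
>>>    mpirun --mca btl tcp,sm,self -np 8 ./a.out
>>> or exclude udapl altogether with --mca btl ^udapl .)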
>>>
>>> Any pointers on what could be wrong would be much appreciated.
>>>
>>> Glenn
>>>
>>> ifconfig for the IB port reports:
>>>
>>> $ ifconfig ibd1
>>> ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
>>> inet 192.168.50.200 netmask ffffff00 broadcast 192.168.50.255
>>>
>>> .. and for the other configured interface:
>>>
>>> $ ifconfig e1000g0
>>> e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
>>> inet 192.168.47.190 netmask ffffff00 broadcast 192.168.47.255
>>>
>>> Output from ompi_info is:
>>>
>>> ompi_info | more
>>> Open MPI: 1.2.1r14096-ct7b030r1838
>>> Open MPI SVN revision: 0
>>> Open RTE: 1.2.1r14096-ct7b030r1838
>>> Open RTE SVN revision: 0
>>> OPAL: 1.2.1r14096-ct7b030r1838
>>> OPAL SVN revision: 0
>>> Prefix: /opt/SUNWhpc/HPC7.0
>>> Configured architecture: i386-pc-solaris2.10
>>> Configured by: root
>>> Configured on: Fri Mar 30 13:40:12 EDT 2007
>>> Configure host: burpen-csx10-0
>>> Built by: root
>>> Built on: Fri Mar 30 13:57:25 EDT 2007
>>> Built host: burpen-csx10-0
>>> C bindings: yes
>>> C++ bindings: yes
>>> Fortran77 bindings: yes (all)
>>> Fortran90 bindings: yes
>>> Fortran90 bindings size: trivial
>>> C compiler: cc
>>> C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
>>> C++ compiler: CC
>>> C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
>>> Fortran77 compiler: f77
>>> Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
>>> Fortran90 compiler: f95
>>> Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
>>> C profiling: yes
>>> C++ profiling: yes
>>> Fortran77 profiling: yes
>>> Fortran90 profiling: yes
>>> C++ exceptions: yes
>>> Thread support: no
>>> Internal debug support: no
>>> MPI parameter check: runtime
>>> Memory profiling support: no
>>> Memory debugging support: no
>>> libltdl support: yes
>>> Heterogeneous support: yes
>>> mpirun default --prefix: yes
>>> MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>>> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
>>> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
>>> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>>> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
>>> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
>>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>>> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
>>> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
>>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
>>> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
>>> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)
>
>
>--
>
>
>- Pak Lui
>pak.lui_at_[hidden]