Open MPI User's Mailing List Archives

From: Pak Lui (Pak.Lui_at_[hidden])
Date: 2007-06-30 09:50:10


Glenn,

Are you running Solaris 10 Update 3 (11/06) with patch 125793-01
installed? It is required for running with the udapl BTL.
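
If you're not sure whether that patch is on the nodes, something like the
following should tell you (showrev is the standard Solaris patch listing
command; this is just a quick check, see the doc linked below for the
actual requirement):

$ showrev -p | grep 125793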

http://www.sun.com/products-n-solutions/hardware/docs/html/819-7478-11/body.html#93180
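
In the meantime, one possible workaround (just a sketch, untested on your
setup) is to take the udapl BTL out of the picture and let the job fall
back to tcp over one of your other interfaces, e.g.:

$ mpirun --mca btl self,sm,tcp -np 8 -host node1,node2 ./your_program

(the host names and program above are placeholders). If that runs cleanly
across the nodes, it would at least confirm the problem is specific to the
udapl transport rather than the MPI layer itself.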

Glenn Carver wrote:
> Further to my email below regarding problems with uDAPL across IB, I
> found a bug report lodged with Sun (also reported on OpenSolaris at
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6545187).
> I will lodge a support call with Sun first thing Monday, though it
> might not get me very far.
>
> Would ditching ClusterTools, compiling the latest Open MPI and
> trying the IB / openib interface work for me? Another option would
> be to revert to ClusterTools 6, but ideally I need the better
> MPI-2 implementation that's in Open MPI.
>
> Any workarounds for the first issue and advice on the second
> question would be appreciated!
>
> Thanks
> Glenn
>
>
>> Hi,
>>
>> I'm trying to set up a new small cluster. It's based on Sun's X4100
>> servers running Solaris 10 x86. I have the Open MPI that comes with
>> ClusterTools 7. In addition, I have an InfiniBand network between
>> the nodes. I can run parallel jobs fine if processes remain on one
>> node (each node has 4 cores). However, as soon as I try to run across
>> the nodes, I get these errors from the job:
>>
>> [node3][0,1,8][/ws/hpc-ct-7/builds/7.0.build-ct7-030/ompi-ct7/ompi/mca/btl/udapl/btl_udapl_component.c:827:mca_btl_udapl_component_progress]
>> WARNING : Connection event not handled : 16391
>>
>> I've had a good look through the archives but can't find a reference
>> to this error. I realise the udapl interface is a Sun addition to
>> Open MPI, but I'm hoping someone else will have seen this before and
>> know what's wrong. I have checked that my IB network is functioning
>> correctly (that seemed the obvious thing that could be wrong).
>>
>> Any pointers on what could be wrong would be much appreciated.
>>
>> Glenn
>>
>> ifconfig for the IB port reports:
>>
>> $ ifconfig ibd1
>> ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
>> inet 192.168.50.200 netmask ffffff00 broadcast 192.168.50.255
>>
>> ... and for the other configured interface:
>>
>> $ ifconfig e1000g0
>> e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
>> inet 192.168.47.190 netmask ffffff00 broadcast 192.168.47.255
>>
>> Output from ompi_info is:
>>
>> ompi_info | more
>> Open MPI: 1.2.1r14096-ct7b030r1838
>> Open MPI SVN revision: 0
>> Open RTE: 1.2.1r14096-ct7b030r1838
>> Open RTE SVN revision: 0
>> OPAL: 1.2.1r14096-ct7b030r1838
>> OPAL SVN revision: 0
>> Prefix: /opt/SUNWhpc/HPC7.0
>> Configured architecture: i386-pc-solaris2.10
>> Configured by: root
>> Configured on: Fri Mar 30 13:40:12 EDT 2007
>> Configure host: burpen-csx10-0
>> Built by: root
>> Built on: Fri Mar 30 13:57:25 EDT 2007
>> Built host: burpen-csx10-0
>> C bindings: yes
>> C++ bindings: yes
>> Fortran77 bindings: yes (all)
>> Fortran90 bindings: yes
>> Fortran90 bindings size: trivial
>> C compiler: cc
>> C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
>> C++ compiler: CC
>> C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
>> Fortran77 compiler: f77
>> Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
>> Fortran90 compiler: f95
>> Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
>> C profiling: yes
>> C++ profiling: yes
>> Fortran77 profiling: yes
>> Fortran90 profiling: yes
>> C++ exceptions: yes
>> Thread support: no
>> Internal debug support: no
>> MPI parameter check: runtime
>> Memory profiling support: no
>> Memory debugging support: no
>> libltdl support: yes
>> Heterogeneous support: yes
>> mpirun default --prefix: yes
>> MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
>> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
>> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
>> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
>> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)

-- 
- Pak Lui
pak.lui_at_[hidden]