Open MPI User's Mailing List Archives

From: Glenn Carver (Glenn.Carver_at_[hidden])
Date: 2007-06-30 05:19:57


Further to my email below regarding problems with uDAPL across IB, I
found a bug report lodged with Sun (also reported on OpenSolaris at:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6545187).
I will lodge a support call with Sun first thing Monday, though it
might not get me very far.

Would ditching ClusterTools, compiling the latest Open MPI myself, and
trying the IB/OpenIB interface work for me? Another option would be to
revert to ClusterTools 6, but ideally I need the better MPI-2
implementation that's in Open MPI.
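
For what it's worth, what I have in mind is roughly the following
(an untested sketch; the version, install prefix, hostfile and
application names are just placeholders, and I don't know whether the
openib BTL is even buildable on Solaris 10):

$ ./configure --prefix=/opt/openmpi-1.2.3 --with-openib
$ make all install
$ /opt/openmpi-1.2.3/bin/ompi_info | grep btl   # check which BTLs got built
$ mpirun --mca btl openib,sm,self -np 8 -hostfile myhosts ./my_mpi_app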

Any workarounds for the first issue would be appreciated, as would
advice on the second question!
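
In case it helps, the kind of workaround I had in mind for the first
issue is simply disabling the udapl BTL, or forcing TCP over the IPoIB
interface (ibd1), along these lines (again untested; the hostfile and
application names are placeholders):

$ mpirun --mca btl ^udapl -np 8 -hostfile myhosts ./my_mpi_app
$ mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include ibd1 \
      -np 8 -hostfile myhosts ./my_mpi_app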

   Thanks
                Glenn

>Hi,
>
>I'm trying to set up a new small cluster. It's based on Sun X4100
>servers running Solaris 10 x86. I'm using the Open MPI that comes with
>ClusterTools 7. In addition, I have an InfiniBand network between
>the nodes. I can run parallel jobs fine if the processes remain on one
>node (each node has 4 cores). However, as soon as I try to run across
>nodes I get these errors from the job:
>
>[node3][0,1,8][/ws/hpc-ct-7/builds/7.0.build-ct7-030/ompi-ct7/ompi/mca/btl/udapl/btl_udapl_component.c:827:mca_btl_udapl_component_progress]
>WARNING : Connection event not handled : 16391
>
>I've had a good look through the archives but can't find a reference
>to this error. I realise the udapl interface is a Sun addition to
>Open MPI, but I'm hoping someone else will have seen this before and
>will know what's wrong. I have checked that my IB network is functioning
>correctly (that seemed the obvious thing that could be wrong).
>
>Any pointers on what could be wrong would be much appreciated.
>
> Glenn
>
>ifconfig for the IB port reports:
>
>$ ifconfig ibd1
>ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
> inet 192.168.50.200 netmask ffffff00 broadcast 192.168.50.255
>
>.. and for the other configured interface:
>
>$ ifconfig e1000g0
>e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
> inet 192.168.47.190 netmask ffffff00 broadcast 192.168.47.255
>
>Output from ompi_info is:
>
>ompi_info | more
> Open MPI: 1.2.1r14096-ct7b030r1838
> Open MPI SVN revision: 0
> Open RTE: 1.2.1r14096-ct7b030r1838
> Open RTE SVN revision: 0
> OPAL: 1.2.1r14096-ct7b030r1838
> OPAL SVN revision: 0
> Prefix: /opt/SUNWhpc/HPC7.0
> Configured architecture: i386-pc-solaris2.10
> Configured by: root
> Configured on: Fri Mar 30 13:40:12 EDT 2007
> Configure host: burpen-csx10-0
> Built by: root
> Built on: Fri Mar 30 13:57:25 EDT 2007
> Built host: burpen-csx10-0
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: trivial
> C compiler: cc
> C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
> C++ compiler: CC
> C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
> Fortran77 compiler: f77
> Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
> Fortran90 compiler: f95
> Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: yes
> Thread support: no
> Internal debug support: no
> MPI parameter check: runtime
>Memory profiling support: no
>Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: yes
> mpirun default --prefix: yes
> MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
> MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
> MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
> MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)