Subject: Re: [OMPI users] Problem with openmpi and infiniband
From: Biagio Lucini (B.Lucini_at_[hidden])
Date: 2008-12-27 18:23:23


Tim Mattox wrote:
> For your runs with Open MPI over InfiniBand, try using openib,sm,self
> for the BTL setting, so that shared memory communications are used
> within a node. It would give us another datapoint to help diagnose
> the problem. As for other things we would need to help diagnose the
> problem, please follow the advice on this FAQ entry, and the help page:
> http://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot
> http://www.open-mpi.org/community/help/
>
Dear Tim,

thank you for this pointer.

1) OFED: 1.2.5, from the OpenFabrics website
2) Linux version: Scientific Linux (a Red Hat Enterprise Linux remaster) v. 4.2,
kernel 2.6.9-55.0.12.ELsmp
3) Subnet manager: OpenSM
4) ibv_devinfo:
hca_id: mthca0
    fw_ver: 1.0.800
    node_guid: 0002:c902:0022:b398
    sys_image_guid: 0002:c902:0022:b39b
    vendor_id: 0x02c9
    vendor_part_id: 25204
    hw_ver: 0xA0
    board_id: MT_03B0120002
    phys_port_cnt: 1
        port: 1
            state: PORT_ACTIVE (4)
            max_mtu: 2048 (4)
            active_mtu: 2048 (4)
            sm_lid: 9
            port_lid: 97
            port_lmc: 0x00

(as far as the problem is concerned, no node differs from the others)

5) ifconfig:
eth0 Link encap:Ethernet HWaddr 00:17:31:E3:89:4A
          inet addr:10.0.0.12 Bcast:10.0.0.255 Mask:255.255.255.0
          inet6 addr: fe80::217:31ff:fee3:894a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:23348585 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17247486 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:19410724189 (18.0 GiB) TX bytes:14981325997 (13.9 GiB)
          Interrupt:209

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:5088 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5088 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2468843 (2.3 MiB) TX bytes:2468843 (2.3 MiB)

6) ulimit -l
8388608
(this is more than the physical memory on the node)

7) Output of ompi_info: attached below (I have also tried earlier releases)

8) Description of the problem: the program seems to communicate correctly
over the TCP network, but not over the InfiniBand network. The program is
structured in such a way that if the communication does not happen, a
loop becomes infinite. So there is no error message; the program simply
enters an infinite loop.
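
For illustration only, here is a minimal sketch of the kind of structure I
mean (this is not the actual code; the stop flag broadcast from rank 0 is
hypothetical). If the message is never delivered by the transport, every
rank blocks inside MPI_Bcast and the run just sits there with no error:

/* Hypothetical sketch, not the actual program: a worker loop that
 * only terminates when a stop flag from rank 0 arrives.  If that
 * message never gets through, the ranks block in MPI_Bcast and the
 * job hangs silently, which is the symptom described above. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, stop = 0, step = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    while (!stop) {
        /* ... one iteration of the real computation would go here ... */
        step++;
        if (rank == 0)
            stop = (step >= 100);          /* rank 0 decides when to stop */
        MPI_Bcast(&stop, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* tell everyone */
    }
    printf("rank %d finished after %d steps\n", rank, step);
    MPI_Finalize();
    return 0;
}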

The command line I use is:

mpirun -mca btl openib,sm,self <executable>

(with openib replaced by tcp in the case of communication over ethernet).

I could include the value of PATH and LD_LIBRARY_PATH, but they would not
tell you much, since the installation directory is non-standard
(/opt/ompi128-intel/bin for the binaries and /opt/ompi128-intel/lib for
the libraries).

I hope I have provided all the required information; if you need anything
more, or some of it in greater detail, please let me know.

Many thanks,
Biagio Lucini


                Open MPI: 1.2.8
   Open MPI SVN revision: r19718
                Open RTE: 1.2.8
   Open RTE SVN revision: r19718
                    OPAL: 1.2.8
       OPAL SVN revision: r19718
                  Prefix: /opt/ompi128-intel
 Configured architecture: x86_64-unknown-linux-gnu
           Configured by: root
           Configured on: Tue Dec 23 12:33:51 GMT 2008
          Configure host: master.cluster
                Built by: root
                Built on: Tue Dec 23 12:38:34 GMT 2008
              Built host: master.cluster
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: icc
     C compiler absolute: /opt/intel/cce/9.1.045/bin/icc
            C++ compiler: icpc
   C++ compiler absolute: /opt/intel/cce/9.1.045/bin/icpc
      Fortran77 compiler: ifort
  Fortran77 compiler abs: /opt/intel/fce/9.1.040/bin/ifort
      Fortran90 compiler: ifort
  Fortran90 compiler abs: /opt/intel/fce/9.1.040/bin/ifort
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: no
           MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.8)
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.8)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.8)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.8)
           MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.8)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.8)
         MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.8)
         MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.8)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.8)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.2.8)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.8)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.8)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.8)
               MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.8)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.8)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA btl: openib (MCA v1.0, API v1.0.1, Component v1.2.8)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.8)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.8)
                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.8)
              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.8)
              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.8)
              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.8)
                  MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.8)
                  MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.8)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.8)
               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.8)
                MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.8)
                MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.8)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.8)