Subject: [OMPI users] OpenMPI 1.3 Infiniband Hang
From: Allen Barnett (allen_at_[hidden])
Date: 2009-08-12 22:12:53


Hi:
I recently tried to build my MPI application against OpenMPI 1.3.3. It
worked fine with OMPI 1.2.9, but with OMPI 1.3.3 it hangs partway
through. The application does a fair amount of communication, but it
eventually stops in a Send/Recv point-to-point exchange (sketched
below). If I turn off the openib btl, it runs to completion. Also, I
built 1.3.3 with memchecker (which is very nice; thanks to everyone who
worked on that!), and then it runs to completion even with openib
active.
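
The exchange where it stops is essentially a paired blocking
send/receive between neighboring ranks. A simplified sketch of the
pattern (not our actual code; the buffer size and rank pairing here are
made up for illustration):

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size, peer, i;
      double buf[1024];
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      for (i = 0; i < 1024; i++)   /* initialize the buffer */
          buf[i] = rank;

      peer = rank ^ 1;             /* pair ranks 0-1, 2-3, ... */
      if (peer < size) {
          if (rank % 2 == 0) {     /* even ranks send first */
              MPI_Send(buf, 1024, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
              MPI_Recv(buf, 1024, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
                       &status);
          } else {                 /* odd ranks receive first */
              MPI_Recv(buf, 1024, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
                       &status);
              MPI_Send(buf, 1024, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
          }
      }

      MPI_Finalize();
      return 0;
  }

To turn off the openib btl I'm excluding it on the mpirun command line,
along these lines ("my_app" stands in for our actual binary):

  $ mpirun -np 4 --mca btl ^openib ./my_app

and the memchecker build was configured roughly as the FAQ suggests
(the valgrind path is just an example):

  $ ./configure --enable-debug --enable-memchecker --with-valgrind=/usr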

Our cluster consists of dual-socket, dual-core Opteron boxes with
Mellanox MT25204 (InfiniHost III Lx) HCAs and a Mellanox MT47396
InfiniScale-III switch. We're running RHEL 4.8, which appears to
include OFED 1.4. I've built everything using GCC 4.3.2. Here is the
output from ibv_devinfo. "ompi_info --all" is attached.
$ ibv_devinfo
hca_id: mthca0
        fw_ver: 1.1.0
        node_guid: 0002:c902:0024:3284
        sys_image_guid: 0002:c902:0024:3287
        vendor_id: 0x02c9
        vendor_part_id: 25204
        hw_ver: 0xA0
        board_id: MT_03B0140002
        phys_port_cnt: 1
                port: 1
                        state: active (4)
                        max_mtu: 2048 (4)
                        active_mtu: 2048 (4)
                        sm_lid: 1
                        port_lid: 1
                        port_lmc: 0x00

I'd appreciate any tips for debugging this.
Thanks,
Allen

-- 
Allen Barnett
Transpire, Inc
E-Mail: allen_at_[hidden]
Skype:  allenbarnett
Ph:     518-887-2930