Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] ConnectX with InfiniHost IB HCAs
From: Shamis, Pavel (shamisp_at_[hidden])
Date: 2011-08-26 09:42:32

You could try updating your OFED version. I think 1.5.3 is the latest one.

Pavel (Pasha) Shamis

Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Aug 25, 2011, at 7:46 PM, <worldeb_at_[hidden]> wrote:
> Hi all,
> This is more of a hardware or system configuration question, but
> I hope people on this list have some experience with it.
> I have just added a new ConnectX IB card to a cluster with InfiniHost cards,
> and now no MPI programs work. Even OFED's tests do not work.
> For example, ib_send_* and ib_write_* just segfault on the host with the ConnectX card and
> keep waiting on the hosts with InfiniHost cards. The rdma_lat/bw tests segfault too, but
> with messages like this on the hosts with InfiniHost cards:
> server read: No such file or directory
> 5924:pp_server_exch_dest: 0/45 Couldn't read remote address
> pp_read_keys: No such file or directory
> Couldn't read remote address
> Other diagnostic tools (ibv_device, ibchecknet, ibstat, ibstatus...) show no errors
> and show the ConnectX card in the system. All modules (mlx4_*, rdma_*) are loaded and IPoIB is configured.
> The openibd and opensmd services started without errors.
> 08:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0)
> OFED is 1.3.1, CentOS 5.2.
> ibstat
> CA 'mlx4_0'
>        CA type: MT26428
>        Number of ports: 1
>        Firmware version: 2.7.0
>        Hardware version: a0
>        Node GUID: 0x0002c903000cad14
>        System image GUID: 0x0002c903000cad17
>        Port 1:
>                State: Active
>                Physical state: LinkUp
>                Rate: 20
>                Base lid: 60
>                LMC: 0
>                SM lid: 60
>                Capability mask: 0x0251086a
>                Port GUID: 0x0002c903000cad15
> Where is the problem?
> Thanks in advance,
> Egor.
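
When a cluster mixes HCA generations, it can help to collect the ibstat report from every host and compare CA type, firmware version, port state, and rate side by side. The following is only a minimal sketch of that idea, assuming the report has the same layout as the ibstat output quoted above. The function name parse_ibstat and the dictionary layout are hypothetical and not part of OFED; the sample text is copied from the quoted message.

```python
import re

# ibstat report copied from the quoted message above.
SAMPLE = """\
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 1
        Firmware version: 2.7.0
        Hardware version: a0
        Node GUID: 0x0002c903000cad14
        System image GUID: 0x0002c903000cad17
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 60
                LMC: 0
                SM lid: 60
                Capability mask: 0x0251086a
                Port GUID: 0x0002c903000cad15
"""


def parse_ibstat(text):
    """Hypothetical helper: turn an ibstat-style report into nested dicts.

    Returns {ca_name: {"fields": {...}, "ports": {port_number: {...}}}}.
    All values are kept as strings exactly as they appear in the report.
    """
    cas = {}
    ca = None      # CA currently being filled in
    port = None    # port currently being filled in, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = re.match(r"CA '([^']+)'", line)
        if m:
            ca = {"fields": {}, "ports": {}}
            cas[m.group(1)] = ca
            port = None
            continue
        # "Port 1:" opens a port section; "Port GUID: ..." does not match.
        m = re.match(r"Port (\d+):$", line)
        if m and ca is not None:
            port = {}
            ca["ports"][int(m.group(1))] = port
            continue
        if ca is None or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        target = port if port is not None else ca["fields"]
        target[key] = value
    return cas


info = parse_ibstat(SAMPLE)
for name, ca in info.items():
    print(name, ca["fields"]["CA type"], ca["fields"]["Firmware version"])
    for number, fields in sorted(ca["ports"].items()):
        print("  port", number, fields["State"], "rate", fields["Rate"])
```

Running the same parser over the reports from the InfiniHost hosts would give dictionaries that can be compared directly. It only summarizes what ibstat already prints and does not by itself explain the segfaults.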