Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ConnectX with InfiniHost IB HCAs
From: Yevgeny Kliteynik (kliteyn_at_[hidden])
Date: 2011-08-27 16:02:47


Egor,

If updating OFED doesn't solve the problem (and I have a
feeling that it will), you might want to try this mailing list
for IB interoperability questions:
linux-rdma_at_[hidden]

-- YK

On 26-Aug-11 4:42 PM, Shamis, Pavel wrote:
> You may try to update your OFED version. I think 1.5.3 is the latest one.
>
> Pavel (Pasha) Shamis
> ---
> Application Performance Tools Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
>
> On Aug 25, 2011, at 7:46 PM, <worldeb_at_[hidden]> wrote:
>
>>
>> Hi all,
>>
>> This is more of a hardware or system configuration question, but
>> I hope people on this list have experience with it.
>> I have just added a new ConnectX IB card to a cluster with InfiniHost cards,
>> and now no MPI programs work. Even the OFED tests do not work.
>> For example, ib_send_* and ib_write_* segfault on the host with the ConnectX card and
>> keep waiting on the hosts with InfiniHost cards. The rdma_lat/bw tests segfault too, but
>> with messages like this on the InfiniHost hosts:
>> server read: No such file or directory
>> 5924:pp_server_exch_dest: 0/45 Couldn't read remote address
>>
>> pp_read_keys: No such file or directory
>> Couldn't read remote address
>>
>> Other diagnostic tools such as ibv_device, ibchecknet, ibstat, and ibstatus show no errors
>> and show the ConnectX card in the system. All modules (mlx4_*, rdma_*) are loaded. IPoIB is configured.
>> The openibd and opensmd services started without errors.
>>
>> 08:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0)
>> OFED is 1.3.1, CentOS 5.2.
>>
>> ibstat
>> CA 'mlx4_0'
>>     CA type: MT26428
>>     Number of ports: 1
>>     Firmware version: 2.7.0
>>     Hardware version: a0
>>     Node GUID: 0x0002c903000cad14
>>     System image GUID: 0x0002c903000cad17
>>     Port 1:
>>         State: Active
>>         Physical state: LinkUp
>>         Rate: 20
>>         Base lid: 60
>>         LMC: 0
>>         SM lid: 60
>>         Capability mask: 0x0251086a
>>         Port GUID: 0x0002c903000cad15
>>
>> Where is the problem?
>>
>> Thanks in advance,
>> Egor.
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
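
When comparing hosts in a mixed ConnectX/InfiniHost fabric like the one described in this thread, it can help to collect the same ibstat fields from every node and look at them side by side. The following is only a minimal sketch under stated assumptions: it parses text shaped like the ibstat output quoted above, and the function names parse_ibstat and port_problems are hypothetical helpers, not part of OFED or Open MPI. It does not diagnose the segfaults reported here; it only flags ports that are not Active/LinkUp.

```python
import re

# Sample text copied from the ibstat output quoted in this thread.
SAMPLE = """\
CA 'mlx4_0'
    CA type: MT26428
    Number of ports: 1
    Firmware version: 2.7.0
    Hardware version: a0
    Node GUID: 0x0002c903000cad14
    System image GUID: 0x0002c903000cad17
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 60
        LMC: 0
        SM lid: 60
        Capability mask: 0x0251086a
        Port GUID: 0x0002c903000cad15
"""


def parse_ibstat(text):
    """Parse ibstat-style text into {ca: {"attrs": {...}, "ports": {n: {...}}}}.

    Hypothetical helper: assumes the "CA '<name>'" / "Port <n>:" /
    "Key: value" layout shown in the sample above.
    """
    cas = {}
    ca = None    # adapter currently being filled in
    port = None  # port currently being filled in (None = adapter level)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = re.match(r"CA '([^']+)'$", line)
        if m:
            ca = cas.setdefault(m.group(1), {"attrs": {}, "ports": {}})
            port = None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m and ca is not None:
            port = ca["ports"].setdefault(int(m.group(1)), {})
            continue
        if ":" in line and ca is not None:
            key, value = (part.strip() for part in line.split(":", 1))
            target = port if port is not None else ca["attrs"]
            target[key] = value
    return cas


def port_problems(cas):
    """Return a description of every port that is not Active and LinkUp."""
    problems = []
    for name, ca in cas.items():
        for num, fields in ca["ports"].items():
            state = fields.get("State")
            phys = fields.get("Physical state")
            if state != "Active" or phys != "LinkUp":
                problems.append(f"{name} port {num}: {state}/{phys}")
    return problems


if __name__ == "__main__":
    parsed = parse_ibstat(SAMPLE)
    print(parsed["mlx4_0"]["attrs"]["Firmware version"])
    print(port_problems(parsed))
```

Running the same parser over the output from each host gives a quick way to compare firmware versions and port states between the ConnectX and InfiniHost nodes before moving on to the OFED upgrade suggested above.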