Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mlx4 error - looking for guidance
From: Pavel Shamis (Pasha) (pashash_at_[hidden])
Date: 2009-03-05 17:33:09


The FW version 2.3.0 is too old. I recommend upgrading to the latest
version (2.6.0) from the Mellanox website:
http://www.mellanox.com/content/pages.php?pg=firmware_table_ConnectXIB

Thanks,
Pasha
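
For reference, a rough sketch of how the firmware could be checked and
re-burned with mstflint; the PCI address and image filename below are only
placeholders, and the correct image for this board (board_id DEL08C0000001)
has to come from the Mellanox page above:

  # Find the HCA's PCI address (the ConnectX HCA shows up as a Mellanox device)
  lspci | grep -i mellanox

  # Query the firmware currently burned on the card (should report 2.3.000 here)
  mstflint -d 0000:0b:00.0 query

  # Burn the newer image, then reboot (or reload the driver) for it to take effect
  mstflint -d 0000:0b:00.0 -i fw-25408-2_6_000.bin burn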

Jeff Layton wrote:
> Oops. I ran it on the head node and not the compute node. Here is the
> output from a compute node:
>
> hca_id: mlx4_0
> fw_ver: 2.3.000
> node_guid: 0018:8b90:97fe:1b6d
> sys_image_guid: 0018:8b90:97fe:1b70
> vendor_id: 0x02c9
> vendor_part_id: 25418
> hw_ver: 0xA0
> board_id: DEL08C0000001
> phys_port_cnt: 2
> max_mr_size: 0xffffffffffffffff
> page_size_cap: 0xfffff000
> max_qp: 131008
> max_qp_wr: 16351
> device_cap_flags: 0x001c1c66
> max_sge: 32
> max_sge_rd: 0
> max_cq: 65408
> max_cqe: 4194303
> max_mr: 131056
> max_pd: 32764
> max_qp_rd_atom: 16
> max_ee_rd_atom: 0
> max_res_rd_atom: 2096128
> max_qp_init_rd_atom: 128
> max_ee_init_rd_atom: 0
> atomic_cap: ATOMIC_HCA (1)
> max_ee: 0
> max_rdd: 0
> max_mw: 0
> max_raw_ipv6_qp: 0
> max_raw_ethy_qp: 0
> max_mcast_grp: 8192
> max_mcast_qp_attach: 56
> max_total_mcast_qp_attach: 458752
> max_ah: 0
> max_fmr: 0
> max_srq: 65472
> max_srq_wr: 16383
> max_srq_sge: 31
> max_pkeys: 128
> local_ca_ack_delay: 15
> port: 1
> state: PORT_ACTIVE (4)
> max_mtu: 2048 (4)
> active_mtu: 2048 (4)
> sm_lid: 41
> port_lid: 70
> port_lmc: 0x00
> max_msg_sz: 0x40000000
> port_cap_flags: 0x02510868
> max_vl_num: 8 (4)
> bad_pkey_cntr: 0x0
> qkey_viol_cntr: 0x0
> sm_sl: 0
> pkey_tbl_len: 128
> gid_tbl_len: 128
> subnet_timeout: 18
> init_type_reply: 0
> active_width: 4X (2)
> active_speed: 5.0 Gbps (2)
> phys_state: LINK_UP (5)
> GID[ 0]:
> fe80:0000:0000:0000:0018:8b90:97fe:1b6e
>
> port: 2
> state: PORT_DOWN (1)
> max_mtu: 2048 (4)
> active_mtu: 2048 (4)
> sm_lid: 0
> port_lid: 0
> port_lmc: 0x00
> max_msg_sz: 0x40000000
> port_cap_flags: 0x02510868
> max_vl_num: 8 (4)
> bad_pkey_cntr: 0x0
> qkey_viol_cntr: 0x0
> sm_sl: 0
> pkey_tbl_len: 128
> gid_tbl_len: 128
> subnet_timeout: 0
> init_type_reply: 0
> active_width: 4X (2)
> active_speed: 2.5 Gbps (1)
> phys_state: POLLING (2)
> GID[ 0]:
> fe80:0000:0000:0000:0018:8b90:97fe:1b6f
>
>
>
>>
>> Do you have the same HCA adapter type on all of your machines?
>> In the error log I see an mlx4 error message, and mlx4 is the ConnectX driver,
>> but ibv_devinfo shows some older HCA.
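
For reference, one quick way to confirm that every compute node reports the
same HCA and firmware is to query them all at once; a minimal sketch, assuming
pdsh is installed and the nodes follow the compute-2-* naming seen above
(adjust the host range to the real cluster):

  # Collect adapter model, firmware, and board ID from every node and collate
  # identical answers; any node that differs (or times out) stands out.
  pdsh -w compute-2-[0-31] "ibv_devinfo | egrep 'hca_id|fw_ver|board_id'" | dshbak -c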
>>
>>>> Jeff,
>>>> Can you please provide more information about your HCA type
>>>> (ibv_devinfo -v)?
>>>> Do you see this error immediately during startup, or do you get it
>>>> during your run?
>>>>
>>>> Thanks,
>>>> Pasha
>>>>
>>>> Jeff Layton wrote:
>>>>> Evening everyone,
>>>>>
>>>>> I'm running a CFD code on IB and I've encountered an error I'm not
>>>>> sure about, so I'm looking for some guidance on where to start
>>>>> looking. Here's the error:
>>>>>
>>>>> mlx4: local QP operation err (QPN 260092, WQE index 9a9e0000,
>>>>> vendor syndrome 6f, opcode = 5e)
>>>>> [0,1,6][btl_openib_component.c:1392:btl_openib_component_progress]
>>>>> from compute-2-0.local to: compute-2-0.local error polling HP CQ with
>>>>> status LOCAL QP OPERATION ERROR status number 2 for wr_id 37742320
>>>>> opcode 0
>>>>> mpirun noticed that job rank 0 with PID 21220 on node
>>>>> compute-2-0.local exited on signal 15 (Terminated).
>>>>> 78 additional processes aborted (not shown)
>>>>>
>>>>>
>>>>> This is openmpi-1.2.9rc2 (sorry - need to upgrade to 1.3.0). The
>>>>> code works correctly for smaller cases, but when I run larger
>>>>> cases I get this error.
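
For reference, a simple way to double-check which Open MPI build and which
openib BTL settings the compute nodes actually pick up is ompi_info, which
ships with Open MPI (run it on a compute node with the same environment as
the job):

  # Report the installed Open MPI version
  ompi_info | grep "Open MPI:"

  # List the openib BTL parameters in effect for that install
  ompi_info --param btl openib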
>>>>>
>>>>> I'm heading to bed but I'll check email tomorrow (sorry to sleep and
>>>>> run, but it's been a long day).
>>>>>
>>>>> TIA!
>>>>>
>>>>> Jeff
>>>>>