Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Issue with : btl_openib.c (OMPI 1.4.3)
From: Gilbert Grosdidier (Gilbert.Grosdidier_at_[hidden])
Date: 2010-12-17 10:47:48


Hello John,

  First, thanks for your feedback.

On 17 Dec 2010 at 16:13, John Hearns wrote:

> On 17 December 2010 14:45, Gilbert Grosdidier
> <Gilbert.Grosdidier_at_[hidden]> wrote:
>> Hello,
>> About this issue, for which I got NO feedback ;-)
>
> Gilbert, as you have an SGI cluster, have you filed a support
> request to SGI?

gg= Yes, I filed one, but with no luck so far.

> Also, which firmware do you have installed?
> I have Firmware version: 2.5.0

gg= I don't know, and firmware_revs does not seem to be available.
The only thing I got on a worker node was from lspci:
> 03:00.0 InfiniBand: Mellanox Technologies MT26418 [ConnectX IB DDR,
> PCIe 2.0 5GT/s] (rev a0)
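
When firmware_revs is not installed, the firmware version can often still be read from sysfs. The sketch below assumes the usual Linux layout, where each HCA appears as /sys/class/infiniband/<device>/fw_ver; the function name read_fw_versions is made up for illustration, and the path may differ or be absent on a given OFED install.

```python
from pathlib import Path


def read_fw_versions(sysfs_root="/sys/class/infiniband"):
    """Return {device_name: firmware_version} for each HCA under sysfs_root.

    Assumption: each device directory holds a one-line 'fw_ver' file.
    Devices without that file are skipped rather than reported as errors.
    """
    versions = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # No InfiniBand sysfs tree on this host (or driver not loaded).
        return versions
    for dev in sorted(root.iterdir()):
        fw_file = dev / "fw_ver"
        if fw_file.is_file():
            versions[dev.name] = fw_file.read_text().strip()
    return versions


if __name__ == "__main__":
    for name, fw in read_fw_versions().items():
        print(f"{name}: firmware {fw}")
```

Run on a worker node, this prints one line per HCA found; an empty result means the sysfs tree was not present.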

>
> http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-docs/mlx4_release_notes.txt

gg= Looking into this one, I noticed pointers to /etc/infiniband/connectx.conf
and /sbin/connectx_port_config, but neither of them is available.

>
> Features that are enabled with FW 2.5.0 only:
> - Send with invalidate and Local invalidate send queue work requests.
> - Resize CQ support.

gg= I also spotted some special hooks in the openib code for
HAVE_IBV_GET_DEVICE_LIST, HAVE_IBV_CREATE_XRC_RCV_QP and
HAVE_IBV_FORK_INIT.
Could any of them be problematic in combination with ConnectX HCAs?

  Thanks, Best, G.

>
>
>
>
>> I recently spotted
>> in the btl_openib.c code that this error message could come from
>> a missing ibv_resize_cq function on the ConnectX HCA. Well ...
>> I have not yet been able to figure out why/how this could occur, but I
>> now have a closely related question about the ConnectX InfiniBand HCA:
>> does anybody know which other IB functionality might be
>> unimplemented on this ConnectX HCA?
>> This would allow me to patch the OpenMPI code by hand as needed,
>> since I currently believe the configure step fails to detect
>> this functionality as missing.
>> Thanks, Best, G.
>>
>> On 15 Dec 2010 at 08:59, Gilbert Grosdidier wrote:
>>
>> Hello,
>>
>> Running OpenMPI 1.4.3 on an SGI Altix cluster with 2048 cores,
>> I got this error message on all cores, right at startup:
>>
>> btl_openib.c:211:adjust_cq] cannot resize completion queue, error: 12
>>
>> What could be the culprit, please?
>> Is there a workaround?
>> Which parameter should be tuned?
>>
>> Thanks in advance for any help, Best, G.
>>
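
As a side note on the message itself: the "error: 12" looks like a plain errno value, although that is an assumption, since the thread only shows the number. A minimal sketch for decoding such a code on the affected host (describe_errno is a made-up helper name):

```python
import errno
import os


def describe_errno(code):
    """Map a numeric errno to 'SYMBOL: message' using the local platform's tables."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 12 is the value reported by adjust_cq in the message above.
    print(describe_errno(12))
```

On Linux, 12 corresponds to ENOMEM, which would point at a resource or memory limit rather than a missing verb, but this has not been confirmed for this cluster.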