On 17 Dec 2010, at 16:13, John Hearns wrote:
On 17 December 2010 14:45, Gilbert Grosdidier wrote:
About this issue, for which I got NO feedback ;-)
Gilbert, as you have an SGI cluster, have you filed a support request with SGI?
gg= Yes, I filed one, but with no luck so far.
Also, which firmware do you have installed?
I have Firmware version: 2.5.0
gg= I don't know, and firmware_revs does not seem to be available.
The only thing I got on a worker node was from lspci:
03:00.0 InfiniBand: Mellanox Technologies MT26418 [ConnectX IB DDR, PCIe 2.0 5GT/s] (rev a0)
gg= Looking into this, I found references to /etc/infiniband/connectx.conf
and /sbin/connectx_port_config, but neither of them is available either.
Features that are enabled with FW 2.5.0 only:
- Send with invalidate and Local invalidate send queue work requests.
- Resize CQ support.
gg= I also spotted some special hooks in the openib code for
HAVE_IBV_GET_DEVICE_LIST, HAVE_IBV_CREATE_XRC_RCV_QP and HAVE_IBV_FORK_INIT.
Could any of them be a problem with ConnectX HCAs?
Thanks, Best, G.
I recently spotted in the btl_openib.c code that this error message could
come from a missing ibv_resize_cq implementation for the ConnectX HCA. Well...
I have not yet been able to figure out why or how this could happen, but I
now have a closely related question about the ConnectX InfiniBand HCA:
does anybody know which other IB functionalities could be
unimplemented for this ConnectX HCA?
This would allow me to patch the OpenMPI code by hand accordingly,
since I currently believe the configure step does not detect
these functionalities as missing.
Thanks, Best, G.
On 15 Dec 2010, at 08:59, Gilbert Grosdidier wrote:
Running OpenMPI 1.4.3 on an SGI Altix cluster with 2048 cores, I got
this error message on all cores, right at startup:
btl_openib.c:211:adjust_cq] cannot resize completion queue, error: 12
What could be the culprit?
Is there a workaround?
Which parameter should be tuned?
Thanks in advance for any help, Best, G.