On Tue, 1 Jun 2010, Jeff Squyres wrote:
> On May 31, 2010, at 5:10 AM, Sylvain Jeaugey wrote:
>
>> In my case, the error happens in:
>> mca_btl_openib_add_procs()
>> mca_btl_openib_size_queues()
>> adjust_cq()
>> ibv_create_cq_compat()
>> ibv_create_cq()
>
> Can you nail this down any further? If I modify adjust_cq() to always
> return OMPI_ERROR, I see the openib BTL fail over properly to the TCP
> BTL.

It must be because create_cq actually creates CQs. Try applying this
patch, which makes create_cq_compat() return an error instead of
creating the CQs:
========================================================================
diff -r 13df81d1d862 ompi/mca/btl/openib/btl_openib.c
--- a/ompi/mca/btl/openib/btl_openib.c Fri May 28 14:50:25 2010 +0200
+++ b/ompi/mca/btl/openib/btl_openib.c Wed Jun 02 10:56:57 2010 +0200
@@ -146,6 +146,7 @@
int cqe, void *cq_context, struct ibv_comp_channel *channel,
int comp_vector)
{
+ return OMPI_ERROR;
#if OMPI_IBV_CREATE_CQ_ARGS == 3
return ibv_create_cq(context, cqe, channel);
#else
========================================================================
You should see MPI_Init complete without error, and then your
application segfault on the next MPI operation.
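To make the fault-injection idea easier to reason about outside Open MPI, here is a toy C model of it. A create_cq_compat()-style wrapper reports failure without creating anything when a switch is set, and a caller falls back to another transport when CQ setup fails. All names here (struct fake_cq, inject_cq_failure, create_cq_compat_sketch(), adjust_cq_sketch(), select_transport_sketch()) are hypothetical stand-ins, not Open MPI or libibverbs code. The sketch models only the clean failover path Jeff describes, not the segfault. Because the wrapper returns a pointer, the sketch reports the injected failure as NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ibv_cq (not the libibverbs type). */
struct fake_cq {
    int cqe; /* requested number of completion entries */
};

/* Hypothetical transports: the fast path and its fallback. */
enum transport { TRANSPORT_OPENIB, TRANSPORT_TCP };

/* Hypothetical fault-injection switch: when nonzero, the wrapper
 * simulates a CQ creation failure without allocating anything. */
static int inject_cq_failure = 0;

/* Toy analogue of a create_cq_compat()-style wrapper. It returns a
 * pointer, so failure is reported as NULL. */
static struct fake_cq *create_cq_compat_sketch(int cqe)
{
    if (inject_cq_failure) {
        return NULL; /* injected failure: no CQ is created */
    }
    struct fake_cq *cq = malloc(sizeof(*cq));
    if (cq != NULL) {
        cq->cqe = cqe;
    }
    return cq;
}

/* Toy analogue of adjust_cq(): create the CQ into *slot and map a
 * NULL result to -1 (standing in for OMPI_ERROR). */
static int adjust_cq_sketch(struct fake_cq **slot, int cqe)
{
    assert(slot != NULL);
    *slot = create_cq_compat_sketch(cqe);
    return (*slot == NULL) ? -1 : 0;
}

/* Toy analogue of transport selection: fall back to TCP when CQ
 * setup reports an error, otherwise keep the fast path. */
static enum transport select_transport_sketch(struct fake_cq **slot, int cqe)
{
    if (adjust_cq_sketch(slot, cqe) != 0) {
        return TRANSPORT_TCP;
    }
    return TRANSPORT_OPENIB;
}
```

With inject_cq_failure set to 1, select_transport_sketch() returns TRANSPORT_TCP and leaves the slot NULL. With the switch cleared, it returns TRANSPORT_OPENIB and a CQ of the requested size.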
Sylvain