Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] BTL add procs errors
From: Sylvain Jeaugey (sylvain.jeaugey_at_[hidden])
Date: 2010-06-02 05:08:16


On Tue, 1 Jun 2010, Jeff Squyres wrote:

> On May 31, 2010, at 5:10 AM, Sylvain Jeaugey wrote:
>
>> In my case, the error happens in :
>> mca_btl_openib_add_procs()
>> mca_btl_openib_size_queues()
>> adjust_cq()
>> ibv_create_cq_compat()
>> ibv_create_cq()
>
> Can you nail this down any further? If I modify adjust_cq() to always
> return OMPI_ERROR, I see the openib BTL fail over properly to the TCP
> BTL.
It must be because create_cq actually creates the CQs. Try applying this
patch, which makes create_cq_compat() *not* create the CQs and return an
error instead:
========================================================================
diff -r 13df81d1d862 ompi/mca/btl/openib/btl_openib.c
--- a/ompi/mca/btl/openib/btl_openib.c Fri May 28 14:50:25 2010 +0200
+++ b/ompi/mca/btl/openib/btl_openib.c Wed Jun 02 10:56:57 2010 +0200
@@ -146,6 +146,7 @@
          int cqe, void *cq_context, struct ibv_comp_channel *channel,
          int comp_vector)
  {
+ return OMPI_ERROR;
  #if OMPI_IBV_CREATE_CQ_ARGS == 3
      return ibv_create_cq(context, cqe, channel);
  #else
========================================================================

You should see MPI_Init complete nicely and your application segfault on
the next MPI operation.
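
For comparison, here is a minimal stand-alone sketch of the behaviour one
would expect when CQ creation fails: the failure is turned into an error
code, propagated up through an add_procs-like entry point, and the caller
skips the transport. Every name in it (sketch_create_cq, sketch_adjust_cq,
sketch_add_procs, sketch_select_transport, SKETCH_ERROR) is hypothetical and
only illustrates the pattern; it is not the actual openib BTL code and
assumes failure is signalled by a NULL pointer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; NOT the real Open MPI or verbs definitions. */
#define SKETCH_SUCCESS 0
#define SKETCH_ERROR   (-1)

struct sketch_cq { int cqe; };

/* Simulates a CQ creation call that always fails (assumed: NULL on failure). */
static struct sketch_cq *sketch_create_cq(int cqe)
{
    (void)cqe;
    return NULL;
}

/* Checks the returned pointer and converts a failed creation into an error code. */
static int sketch_adjust_cq(struct sketch_cq **cq, int cqe)
{
    *cq = sketch_create_cq(cqe);
    if (NULL == *cq) {
        return SKETCH_ERROR;
    }
    return SKETCH_SUCCESS;
}

/* add_procs-like entry point: propagates the error instead of swallowing it. */
static int sketch_add_procs(struct sketch_cq **cq)
{
    int rc = sketch_adjust_cq(cq, 1024);
    if (SKETCH_SUCCESS != rc) {
        fprintf(stderr, "sketch: CQ creation failed, transport unusable\n");
        return rc;
    }
    return SKETCH_SUCCESS;
}

/* Caller: picks the fallback transport when the first one reports an error. */
static const char *sketch_select_transport(void)
{
    struct sketch_cq *cq = NULL;
    return (SKETCH_SUCCESS == sketch_add_procs(&cq)) ? "openib" : "tcp";
}
```

In this sketch sketch_select_transport() ends up choosing "tcp", because the
error from the simulated CQ creation reaches the caller. If any level ignored
the return code instead, initialization would appear to succeed and the first
use of the missing CQ would fail later.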

Sylvain