Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] BTL add procs errors
From: George Bosilca (bosilca_at_[hidden])
Date: 2010-06-02 05:17:40


I don't have any IB nodes, but I'm interested to see how this happens. What I would like to understand is how we get back into the openib code if add_procs failed for that BTL.

  george.

On Jun 2, 2010, at 05:08 , Sylvain Jeaugey wrote:

> On Tue, 1 Jun 2010, Jeff Squyres wrote:
>
>> On May 31, 2010, at 5:10 AM, Sylvain Jeaugey wrote:
>>
>>> In my case, the error happens in:
>>> mca_btl_openib_add_procs()
>>> mca_btl_openib_size_queues()
>>> adjust_cq()
>>> ibv_create_cq_compat()
>>> ibv_create_cq()
>>
>> Can you nail this down any further? If I modify adjust_cq() to always return OMPI_ERROR, I see the openib BTL fail over properly to the TCP BTL.
> It must be because create_cq actually creates the CQs. Try applying this patch, which makes create_cq_compat() *not* create the CQs and return an error instead:
> ========================================================================
> diff -r 13df81d1d862 ompi/mca/btl/openib/btl_openib.c
> --- a/ompi/mca/btl/openib/btl_openib.c Fri May 28 14:50:25 2010 +0200
> +++ b/ompi/mca/btl/openib/btl_openib.c Wed Jun 02 10:56:57 2010 +0200
> @@ -146,6 +146,7 @@
>          int cqe, void *cq_context, struct ibv_comp_channel *channel,
>          int comp_vector)
>  {
> +    return OMPI_ERROR;
>  #if OMPI_IBV_CREATE_CQ_ARGS == 3
>      return ibv_create_cq(context, cqe, channel);
>  #else
> ========================================================================
>
> You should see MPI_Init complete nicely and your application segfault on the next MPI operation.
>
> Sylvain
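
The failover expected in this thread (a BTL whose add_procs fails is dropped, and a lower-priority BTL such as TCP takes over) can be pictured with a toy model. This is only a sketch under stated assumptions: toy_btl_t, toy_add_procs_all, toy_select and the TOY_* codes are hypothetical names invented for illustration and are not the Open MPI BTL interface. It shows the invariant in question, namely that a module whose add_procs failed should never be selected afterwards; it does not explain where the real openib code path violates it.

```c
#include <stddef.h>

/* Hypothetical stand-ins: NOT Open MPI types or return codes. */
#define TOY_SUCCESS 0
#define TOY_ERROR   (-1)

typedef struct toy_btl {
    const char *name;
    int (*add_procs)(struct toy_btl *self);
    int usable;                 /* set only if add_procs succeeded */
} toy_btl_t;

/* Simulates a transport whose setup fails (e.g. CQ creation error). */
static int add_procs_fail(toy_btl_t *self) { (void)self; return TOY_ERROR; }

/* Simulates a transport whose setup succeeds. */
static int add_procs_ok(toy_btl_t *self) { (void)self; return TOY_SUCCESS; }

/* Call add_procs on every module and remember which ones succeeded. */
static void toy_add_procs_all(toy_btl_t *btls, int n)
{
    for (int i = 0; i < n; i++) {
        btls[i].usable = (btls[i].add_procs(&btls[i]) == TOY_SUCCESS);
    }
}

/* Return the first usable module in priority order, or NULL if none. */
static const toy_btl_t *toy_select(const toy_btl_t *btls, int n)
{
    for (int i = 0; i < n; i++) {
        if (btls[i].usable) {
            return &btls[i];
        }
    }
    return NULL;
}

/* Name of the transport chosen when the first module's add_procs fails. */
static const char *toy_failover_demo(void)
{
    toy_btl_t btls[] = {
        { "openib", add_procs_fail, 0 },   /* higher priority, fails */
        { "tcp",    add_procs_ok,   0 },   /* lower priority, works  */
    };
    toy_add_procs_all(btls, 2);
    const toy_btl_t *chosen = toy_select(btls, 2);
    return chosen ? chosen->name : "none";
}
```

In this model toy_failover_demo() selects "tcp", which corresponds to the failover reported when adjust_cq() is forced to return an error. The segfault described above would correspond to some later code path using the failed module without consulting a flag like usable, which is exactly the part that still needs to be tracked down.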