Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] BTL add procs errors
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-06-02 08:49:38


On Jun 2, 2010, at 5:08 AM, Sylvain Jeaugey wrote:

> It must be because create_cq actually creates cqs. Try to apply this
> patch, which makes create_cq_compat() *not* create the cqs and return an
> error instead:
> ========================================================================
> diff -r 13df81d1d862 ompi/mca/btl/openib/btl_openib.c
> --- a/ompi/mca/btl/openib/btl_openib.c Fri May 28 14:50:25 2010 +0200
> +++ b/ompi/mca/btl/openib/btl_openib.c Wed Jun 02 10:56:57 2010 +0200
> @@ -146,6 +146,7 @@
> int cqe, void *cq_context, struct ibv_comp_channel *channel,
> int comp_vector)
> {
> + return OMPI_ERROR;
> #if OMPI_IBV_CREATE_CQ_ARGS == 3
> return ibv_create_cq(context, cqe, channel);
> #else
> ========================================================================

Don't you mean return NULL? This function is supposed to return a (struct ibv_cq *).

> You should see MPI_Init complete nicely and your application segfault on
> the next MPI operation.

That wouldn't surprise me if you return OMPI_ERROR here, since the caller expects a pointer return value. OMPI_ERROR != NULL, so the error check on the return value of ibv_create_cq_compat() won't detect the failure.
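
To illustrate the point, here is a minimal sketch. It is not the actual openib BTL code: every sketch_ name is a hypothetical stand-in, and it assumes only that the error code is a nonzero integer (SKETCH_ERROR is set to -1 purely for illustration).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a nonzero integer error code (assumption). */
#define SKETCH_ERROR (-1)

/* Hypothetical stand-in for struct ibv_cq. */
struct sketch_cq {
    int cqe;
};

/* Mimics the proposed patch: an integer error code is returned from a
 * function whose return type is a pointer. The explicit cast only makes
 * the conversion visible; the resulting pointer is non-NULL garbage on
 * common platforms. */
static struct sketch_cq *sketch_create_cq_errcode(void)
{
    return (struct sketch_cq *)(intptr_t)SKETCH_ERROR;
}

/* Mimics the suggested alternative: signal failure with NULL. */
static struct sketch_cq *sketch_create_cq_null(void)
{
    return NULL;
}

/* Mimics a caller that checks for failure the usual way for a
 * pointer-returning function. Returns 1 if the failure is noticed,
 * 0 if the caller would carry on with the returned pointer. */
static int sketch_caller_sees_failure(struct sketch_cq *cq)
{
    return cq == NULL;
}
```

With the NULL-returning variant, sketch_caller_sees_failure() reports the failure. With the error-code variant it does not, so the caller would keep the bogus pointer and fault only when something later dereferences it, which matches the "MPI_Init completes, next MPI operation segfaults" behavior described above.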

Sidenote: why did we call it ibv_create_cq_compat()? That seems like a namespace violation, and is quite confusing. :-\

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/