Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] v1.5: sigsegv in case of extremely low settings in the SRQs
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-06-22 17:57:38


I think your fix looks right.

But I'm getting my head warped trying to understand why you'd want numbers so low (4, 2, 1) and exactly what our algorithm will re-post for numbers that low, etc. Why do you want them so low?

On Jun 18, 2010, at 11:10 AM, nadia.derbey wrote:

> Hi,
>
> This refers to the v1.5 branch.
>
> If an SRQ has the following settings: S,<size>,4,2,1
>
> 1) setup_qps() sets the following:
> mca_btl_openib_component.qp_infos[qp].u.srq_qp.rd_num=4
> mca_btl_openib_component.qp_infos[qp].u.srq_qp.rd_init=rd_num/4=1
>
> 2) create_srq() sets the following:
> openib_btl->qps[qp].u.srq_qp.rd_curr_num = 1 (rd_init value)
> openib_btl->qps[qp].u.srq_qp.rd_low_local = rd_curr_num - (rd_curr_num >> 2) = rd_curr_num = 1
>
> 3) if mca_btl_openib_post_srr() is called with rd_posted=1:
> (rd_posted > rd_low_local) is false
> num_post = rd_curr_num - rd_posted = 0
> the loop is not executed
> wr is never assigned (remains NULL)
> wr->next is then dereferenced: address not mapped
> ==> SIGSEGV
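
To make the arithmetic in steps 1 to 3 easy to check, here is a minimal standalone sketch. It is not the openib BTL code: struct srq_counters and both helper functions are hypothetical names invented for illustration, and only the formulas quoted above are taken from the report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-QP SRQ counters named in the report. */
struct srq_counters {
    int rd_num;        /* configured receive buffers (4 in the report) */
    int rd_init;       /* initial number to post: rd_num / 4 */
    int rd_curr_num;   /* starts out equal to rd_init */
    int rd_low_local;  /* rd_curr_num - (rd_curr_num >> 2) */
};

/* Steps 1 and 2: derive the counters from rd_num. */
static struct srq_counters srq_counters_from_rd_num(int rd_num)
{
    struct srq_counters c;

    assert(rd_num > 0);
    c.rd_num = rd_num;
    c.rd_init = rd_num / 4;                 /* integer division */
    c.rd_curr_num = c.rd_init;
    c.rd_low_local = c.rd_curr_num - (c.rd_curr_num >> 2);
    return c;
}

/* Step 3: number of receives the re-post loop would build.
 * Returns -1 (a sentinel chosen for this sketch) when rd_posted is
 * still above the low watermark and nothing would be re-posted. */
static int srq_num_to_post(const struct srq_counters *c, int rd_posted)
{
    if (rd_posted > c->rd_low_local) {
        return -1;
    }
    return c->rd_curr_num - rd_posted;
}
```

With rd_num = 4 this gives rd_init = rd_curr_num = rd_low_local = 1, and srq_num_to_post() with rd_posted = 1 returns 0: the watermark check passes, but there is nothing to post, which is the case where the loop body never runs.
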
>
> The attached patch solves the problem by checking that the loop will
> actually be entered, and returning early otherwise.
> Could someone have a look, please? The patch fixes the problem with my
> reproducer, but I'm not sure it covers every situation.
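
The patch itself is not reproduced in this message, so the following is only a sketch of the kind of guard described, under assumptions: struct recv_wr and chain_recv_wrs() are hypothetical, simplified stand-ins, and the real fix may be shaped differently. It shows why leaving before the loop when num_post is not positive makes the later wr->next store safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified work request: only the link field matters here. */
struct recv_wr {
    struct recv_wr *next;
};

/* Chain the first num_post entries of pool and return the head of the
 * list, or NULL when there is nothing to post. */
static struct recv_wr *chain_recv_wrs(struct recv_wr *pool, int num_post)
{
    struct recv_wr *wr_list = NULL, *wr = NULL;
    int i;

    /* Guard: leave before the loop if it would not run at all. */
    if (num_post <= 0) {
        return NULL;
    }
    assert(pool != NULL);

    for (i = 0; i < num_post; i++) {
        struct recv_wr *cur = &pool[i];

        if (wr == NULL) {
            wr_list = cur;      /* first element becomes the head */
        } else {
            wr->next = cur;     /* append to the chain built so far */
        }
        wr = cur;
    }

    /* Safe: the guard guarantees at least one iteration, so wr != NULL. */
    wr->next = NULL;
    return wr_list;
}
```

Without the guard, a call with num_post = 0 would reach the final store with wr still NULL, which matches the crash in step 3.
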
>
> Regards,
> Nadia
>
> <001_openib_low_rd_num.patch>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/