Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW
From: Jon Mason (jon_at_[hidden])
Date: 2008-03-10 15:04:46


On Mon, Mar 10, 2008 at 10:03:27AM -0500, Jeff Squyres wrote:
> On Mar 10, 2008, at 9:50 AM, Steve Wise wrote:
>
> > (just thinking out loud here): The OMPI code could be designed to _not_
> > assume recv's are posted until the CPC indicates they are ready, i.e.,
> > sort of asynchronous behavior. When the recvs are ready, the CPC could
> > up-call the btl and then the credits could be updated. This sounds
> > painful though :)
>
> That's the way it works, but only for the initial credits. The CPC is
> not involved beyond that.
>
> So it's likely that you'll still have this problem after initial
> wireup for OMPI PP QP's (except as I noted below, if we only allow
> that chelsio rnic to only have one PP QP and it has to be qp 0).
>
> > On the single-QP angle, Can I just run OMPI with only specifying 1 QP?
> > Or will that require coding changes?
>
>
> No coding changes required; just change the value of
> mca_btl_openib_receive_queues.

Specifying only 1 PP QP via the command line seems to be working. It now
passes a test that failed 100% of the time with the credit issue on my
two-node cluster. Further tests on a larger setup are still pending, but
this looks like a good workaround.
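
For anyone who wants to try the same workaround, here is a rough sketch of
how such a command line could be put together. It assumes the usual
per-peer queue format (P,<buffer size>,<number of buffers>,<low
watermark>,<window size>) and that the parameter is passed to mpirun as
btl_openib_receive_queues. The numeric values, the process count, and the
program name ./my_mpi_test are illustrative assumptions, not the values
used in the test above.

```python
# Sketch: build an mpirun command line that limits the openib BTL to a
# single per-peer (PP) receive queue. All numbers and the program name are
# illustrative assumptions, not values taken from this thread.
import shlex


def pp_qp_spec(buf_size, num_bufs, low_watermark, window_size):
    """Format one per-peer QP entry: P,<size>,<num>,<low watermark>,<window>."""
    return "P,{},{},{},{}".format(buf_size, num_bufs, low_watermark, window_size)


def mpirun_args(num_procs, program, receive_queues):
    """Return the argv list for mpirun with a custom receive_queues value."""
    return [
        "mpirun", "-np", str(num_procs),
        "--mca", "btl", "openib,self",
        # Assumed command-line spelling of the parameter discussed above.
        "--mca", "btl_openib_receive_queues", receive_queues,
        program,
    ]


spec = pp_qp_spec(65536, 256, 192, 128)   # one entry, so only one QP
cmd = mpirun_args(2, "./my_mpi_test", spec)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Because the value holds a single P entry with no ":" separators, no shared
receive queues are requested alongside the per-peer queue.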

I think adding an additional field to the mca-btl-openib-hca-params.ini
file to have the 1 PP QP by default would be a good long(er) term
solution to this. This way those adapters that have this deficiency can
specify it and should work "out of the box".
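
To make the idea concrete, here is a small sketch of what a per-adapter
entry and its lookup might look like. The field name receive_queues, the
section name "Example RNIC", and the ID values are all hypothetical
placeholders; no such field exists in the file today, and the real
lookup would live in the openib BTL's C code rather than in Python.

```python
# Sketch: a hypothetical per-adapter "receive_queues" field in an
# hca-params-style INI file, and a lookup that falls back to a default.
# Section name, IDs, and the field itself are placeholders for illustration.
import configparser

HYPOTHETICAL_INI = """
[Example RNIC]
vendor_id = 0x0000
vendor_part_id = 0x0000
receive_queues = P,65536,256,192,128
"""

BUILT_IN_DEFAULT = "<built-in default>"  # stands in for the normal default

parser = configparser.ConfigParser()
parser.read_string(HYPOTHETICAL_INI)


def receive_queues_for(section, default=BUILT_IN_DEFAULT):
    """Return the adapter-specific queue spec, or the default if none is set."""
    return parser.get(section, "receive_queues", fallback=default)


print(receive_queues_for("Example RNIC"))  # adapter overrides the default
print(receive_queues_for("Some Other HCA"))  # no entry, default is used
```

Adapters without the field would keep the current behavior, so only
devices that need the single PP QP would be affected.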

Thoughts?

Thanks,
Jon

>
> --
> Jeff Squyres
> Cisco Systems