Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-11-14 13:03:28

On Nov 14, 2013, at 9:33 AM, Barrett, Brian W <bwbarre_at_[hidden]> wrote:

> On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]> wrote:
>> Does XRC work with the UDCM CPC?
>> On Nov 14, 2013, at 9:35 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>> I think the problems in udcm were fixed by Nathan quite some time ago,
>>> but never moved to 1.7 as everyone was told that the connect code in
>>> openib was already deprecated pending merge with the new ofacm common
>>> code. Looking over at that area, I see only oob and xoob - so if the
>>> users of the common ofacm code are finding that it works, the simple
>>> answer may just be to finally complete the switchover.
>>> Meantime, perhaps someone can CMR and review a copying of the udcm cpc
>>> to the 1.7 branch?
>>> On Nov 14, 2013, at 5:14 AM, Joshua Ladd <joshual_at_[hidden]> wrote:
>>>> Um, no. It's supposed to work with UDCM which doesn't appear to be
>>>> enabled in 1.7.
>>>> Per Ralph's comment to me last night:
>>>> "... you cannot use the oob connection manager. It doesn't work and
>>>> was deprecated. You must use udcm, which is why things are supposed to
>>>> be set to do so by default. Please check the openib connect priorities
>>>> and correct them if necessary."
>>>> However, it's never been enabled in 1.7 - don't know what "borked"
>>>> means, and from what Devendar tells me, several UDCM commits that are
>>>> in the trunk have not been pushed over to 1.7:
>>>> So, as of this moment, OpenIB BTL is essentially dead-in-the-water in
>>>> 1.7.
> I'm going to start by admitting that I haven't been paying attention to IB
> the last couple of months, so I'm out of my league a little bit here. I
> remember discussions of UDCM replacing OOB both because the OOB CPC had
> some issues and because it would make it easier to move the BTLs to the
> OPAL layer (i.e., below the OOB). But I also thought that was more future
> work than it clearly was. So can someone let me know:
> 1) What the status of UDCM is (does it work reliably, does it support
> XRC, etc.)

It seems to be working okay on the IB systems at LANL and IU. I don't know about XRC; I seem to recall the answer is "no".
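
[Editor's note: for readers who want to check UDCM on their own IB system, the sketch below builds an mpirun command line that restricts the openib BTL to a single CPC. It assumes the `btl_openib_cpc_include` MCA parameter behaves as described in the Open MPI FAQ of that era; the helper name, process count, and program path are hypothetical and not taken from this thread.]

```python
import shlex


def mpirun_with_cpc(cpc, nprocs, program):
    """Return an mpirun argv that asks the openib BTL to use only `cpc`.

    Assumption: `btl_openib_cpc_include` is the MCA parameter that limits
    which connection pseudo-components (CPCs) openib may select.
    """
    return [
        "mpirun",
        "--mca", "btl", "openib,self",          # use openib plus loopback
        "--mca", "btl_openib_cpc_include", cpc,  # e.g. "udcm" or "oob"
        "-np", str(nprocs),
        program,                                 # hypothetical test binary
    ]


if __name__ == "__main__":
    # Print a shell-safe command line; nothing is executed here.
    print(shlex.join(mpirun_with_cpc("udcm", 4, "./a.out")))
```

[Running `ompi_info --param btl openib` on the installed build is the safer way to confirm which CPC-related parameters actually exist before relying on this.]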

> 2) What's the difference between CPCs and OFACM, and what are our plans
> w.r.t. 1.7 there?

Pasha created ofacm because some of the collective components now need to forge connections. He created the common/ofacm code to meet those needs, with the intention of someday replacing the openib CPCs with the new common code. However, this was stalled by the iWarp issue, and so it fell off the table.

We now have two duplicate ways of doing the same thing, but with code in two different places. :-(

> 3) Someone mentioned that ofacm oob worked, but cpc oob didn't. Can
> someone explain why?

I'm not sure that is actually true, as there is no indication that anyone is using or testing the collective components that use the ofacm code.

> Again, sorry for being dense; I've been spending too much time in Portals
> land lately.
> Brian
> --
> Brian W. Barrett
> Scalable System Software Group
> Sandia National Laboratories