On Nov 14, 2013, at 7:48 PM, Joshua Ladd <joshual_at_[hidden]> wrote:
> The proof of the pudding is that all of the MPI layer has been adapted to the new async behavior -except- for the openib cpc's. The issue of what to do with these has been raised several times, especially once the ofacm code was committed. Unfortunately, lack of time and priorities left this code to bitrot.
> [Josh] Not completely true, UDCM is supposed to be the alternative, at least for RC. It's easy to say - well, everything works now except OpenIB. If we're working under the assumption that these were community decisions wholeheartedly agreed upon and fully endorsed by all members, well then we have to also believe that we agreed as a community to the following list of tasks and nobody's done anything. The only ones explicitly committed to technical work - Mellanox. Per Jeff's words, "the next dominos to fall" implies at least a partial ordering. We need a functioning UDCM before we can study it and figure out how to adapt it to XRC - maybe it is functioning perfectly, who knows??! Nobody, apparently - seems like it should've been released into the wild in 1.7.3. Are there some ppt slides that we can look at from the RFC? If so, I've been unable to locate them. Unfortunately, this is just one piece of what's missing and we are relying on the rest of the community that agreed to these changes to make good on their promises. My biggest issue this morning is that UDCM is not in 1.7 but the OOB change is - that's a problem. You skipped steps 1, 2, 3, and 4 and went right to 5 - that's a problem. That's not what we as a community agreed upon.
As you may recall, I deleted those from 1.7.4 because they don't work - as you folks repeatedly noted. The problem is that this change has been in the trunk, including the deprecation of the OOB cpc's per your noting that they weren't working on the trunk, for quite some time.
So this has actually been a good thing as it is finally forcing the corrective action to be taken. Nobody is blaming Mellanox, but this has to be resolved, and it clearly wasn't going to happen until a forcing event occurred.
Hopefully, Pasha and Nathan will be able to help you guys figure out how to get udcm working and validated.
> Subject: [OMPI devel] Openib + common/verbs CPC consolidation
> From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
> Date: 2013-05-14 15:29:15
> FYI: On the teleconf today, we talked about the next dominos to fall in the quest to move the BTLs down to OPAL:
> 1. Nathan will make the openib "udcm" CPC the default in the immediate future
> --> This paves the way to ditch the problematic "oob" openib CPC
> --> This also will give udcm more widespread testing
> 2. Mellanox will add XRC support to udcm
> --> This paves the way to ditch the problematic "xoob" openib CPC
> --> Josh thought they could do this within a month, but that's a SWAG and subject to change
> 3. I will ping Chelsio about getting them to add proper iWARP support into common/ofacm
> --> This paves the way to eliminate btl/openib/cpc
> --> No idea on timeframe yet
> 4. Once #3 is done, make openib use common/ofacm
> 5. Once #2, #3, and #4 are done, delete btl/openib/cpc
> #1-3 have people assigned to them. #4 does not (#5 is pretty trivial -- an svn rm plus some Makefile.am mods).