On Nov 14, 2013, at 12:42 PM, Joshua Ladd <joshual@mellanox.com> wrote:

We are happy to provide access to our set of small test clusters and engineering resources, but, honestly, Nathan/LANL guys probably have better access to a big IB system.
 
I'm sure your boss could care less, but this is not Intel's code base. Sorry to be so blunt about it, Ralph.

I agree - nobody said it was. However, this community works by committee. In this case, the OOB update was discussed for more than a year, the RFC was out for nearly 6 months, the branch was made available for testing and review for nearly 3 months, and it sat in the trunk for another 3+ months before moving to the 1.7 branch.

At some point, the IB users in this community have to take responsibility for testing and helping debug their code areas, not just letting them bitrot for months and then saying "hey, something broke - somebody fix it".

As I said, I'm happy to help - but ultimately, IB support is the responsibility of the IB members of this community...and I'm not one of them.


We've expended an enormous amount of effort *trying* to make OSHMEM something that works for the community and not just Mellanox customers. Believe me, we would rather focus our efforts elsewhere too.
 
Josh
 
From: devel [mailto:devel-bounces@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, November 14, 2013 3:32 PM
To: Open MPI Developers
Cc: Yiftah Shahar; Gilad Shainer
Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect
 
 
On Nov 14, 2013, at 12:21 PM, Barrett, Brian W <bwbarre@sandia.gov> wrote:


On 11/14/13 1:13 PM, "Joshua Ladd" <joshual@mellanox.com> wrote:


Let me try to summarize my understanding of the situation:

1. Ralph made the OOB asynchronous.

2. OOB cpcs don't work as a result of 1, and are thereby "deprecated",
meaning: won't fix.

3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm
in that move.  Never changed openib to use ofacm/common.

4. UDCM is "functional" in the trunk, still sitting in openib/connect.
But no one is entirely sure if it really works which is why it was
disabled in 1.7. Nathan - is there a design doc you can share on this
beyond the comments in the code?

5. In order to satisfy the "grand plan":
    a. UDCM still needs to be moved to common/ofacm.
    b. OpenIB still needs to be changed to use common/ofacm.
    c. RDMACM still needs to migrate to common/ofacm.
    d. XRC support needs to be added to UDCM and put into
       common/ofacm.

6. The "grand plan" being:  move the BTLs into Opal - hence the need to
scuttle the OOB cpcs thereby justifying the deprecation and not fixing
cpcs after #1.

So, that's a quick roundup of how we ended up here (as I understand it.)
What needs to be done is:

That's my understanding as well.


1. Somebody needs to certify/review that what Nathan has done is sound.
From my perspective, this is a BIG change and needs a comprehensive
architecture review. We've been using it in the trunk, and we've been
testing it under MTT for some time - but have not deployed or tested at
large-scale out in the field. Would be nice to see something on paper in
terms of a design doc.

2. Somebody then needs to move UDCM into common/ofacm.

3. Somebody needs to change openib to use common/ofacm cpcs instead of
openib/connect cpcs.

4. Somebody needs to move RDMACM into common/ofacm and make sure RoCEE
works.

5. Somebody needs to add XRC support to UDCM - whatever that might mean.
Given Nathan added UDCM back in 2011 and nobody is really sure it's ready
for prime-time, and given Pasha's comments regarding the difference in
state machine requirements  between the two connection schemes, this
doesn't seem like a trivial task.

Given Nathan's comments a second ago about ORNL not supporting the IB
Offload component, it barely makes sense to keep common/ofacm. And it
sounds like the two cpcs presently contained therein are now unusable.

All of this work is a result of the Grand Plan to move the BTLs into the
Opal layer - a plan whose motive I don't know (I was not involved
with OMPI when this was decided or debated.)

Basically, without these five changes OpenIB is dead in 1.7.4 and beyond
for RC, XRC, and RoCEE. These are blockers to 1.7.4 and I don't believe
that the onus falls squarely on Mellanox to fix these. These were
community decisions and, as such, it must be a community effort to
resolve. We are happy to lend a hand, but we are not fixing all of this
mess.

I think that the 5 steps above sound correct and I agree that 1) this
means 1.7.4 is on hold until we fix this and 2) that Mellanox shouldn't be
the only one to fix this for 1.7.4, given the amount of work involved.

Ralph, what, specifically, broke about the oob/xoob cpc mechanisms by
making the oob asynchronous?
 
Hard for me to say as I don't really have access to an IB machine any more. Odin is my sole reference point, and someone has had that fully locked up for more than a week (and I can't complain as I am totally a guest there). Even then, I can only test on a few nodes.
 
I have no objection to helping, but we need someone who cares about IB and has access to such a machine to take the lead. Otherwise, we're just spinning our wheels.
 
As for the work issue: note that this has been "under development" now for more than a year. We've talked at length about how "somebody" needs to fix the openib/ofacm issue, but everyone keeps pushing it down the road as "not mine". Like I said, I can help - but (a) my boss couldn't care less about this issue, and (b) I have no way to test the results.
 
 


That is, 1-5 are a huge amount of work; have
we done the analysis to say that updating the oob / xoob cpc to work with
the new oob is actually more work than doing 1-5?  Obviously, there are
long-term plans that make oob/xoob problematic.  But those aren't 1.7 / 1.8
plans.  Unfortunately, the cpcs were always out of my area of interest, so
I'm flying a bit more blind than I'd like here.

Brian

--
 Brian W. Barrett
 Scalable System Software Group
 Sandia National Laboratories




_______________________________________________
devel mailing list
devel@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
 