On 11/14/13 11:16 AM, "Jeff Squyres (jsquyres)" <firstname.lastname@example.org> wrote:
On Nov 14, 2013, at 1:03 PM, Ralph Castain <email@example.com> wrote:
1) What the status of UDCM is (does it work reliably, does it support
Seems to be working okay on the IB systems at LANL and IU. Don't know
about XRC - I seem to recall the answer is "no"
FWIW, I recall that when Cisco was testing UDCM (a long time ago --
before we threw away our IB gear...), we found bugs in UDCM that only
showed up with really large numbers of MTT tests running UDCM (i.e., 10K+
tests a night, especially with lots of UDCM-based jobs running
concurrently on the same cluster). These types of bugs didn't show up in
Has that happened with the new/fixed UDCM? Cisco is no longer in a
position to test this.
Neither are we at Sandia, unfortunately. I only have 16 nodes for nightly
testing, and only 8 of those are always running Linux, so that doesn't
help much on the stress test.
2) What's the difference between CPCs and OFACM, and what are our plans
w.r.t. 1.7 there?
Pasha created ofacm because some of the collective components now need
to forge connections. He wrote the common/ofacm code to meet those
needs, with the intention of someday replacing the openib CPCs with the
new common code. However, this was stalled by the iWARP issue, and so it
fell off the table.
We now have two duplicate ways of doing the same thing, but with code
in two different places. :-(
FWIW, the iWARP vendors have repeatedly been warned that ofacm is going
to take over, and unless they supply patches, iWARP will stop working in
Open MPI. I know for a fact that they are very aware of this.
So my $0.02 is that ofacm should take over -- let's get rid of the CPCs
and have openib use ofacm. The iWARP folks can play catch-up if/when
they want to.
Of course, I'm not in this part of the code base any more, so it's not
really my call -- just my $0.02...
Of course, that doesn't help with the core issue; we can't have a
regression w.r.t. XRC support between 1.7.3 and 1.7.4. But I agree, I'm
fine with only fixing this in one place.
Brian W. Barrett
Scalable System Software Group
Sandia National Laboratories