Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] 1.7.5 end-of-week status report
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-03-16 11:19:32


On Mar 15, 2014, at 10:19 PM, Hjelm, Nathan T <hjelmn_at_[hidden]> wrote:

> On Friday, March 14, 2014 8:48 PM, devel [devel-bounces_at_[hidden]] on behalf of Ralph Castain [rhc_at_[hidden]] wrote:
>> To: Open MPI Developers
>> Subject: [OMPI devel] 1.7.5 end-of-week status report
>>
>> Hi folks
>>
>> I have both good and bad news to report - first the good.
>>
>> OSHMEM now passes nearly all its tests on my Linux cluster (tcp). My hat is off to the Mellanox guys for getting this done, including getting our MTT repo tests complete.
>>
>> The MPI layer passes nearly all the IBM, Intel, and one-sided tests. Only a few failures.
>>
>> Now the bad. The coll/ml component continues to have problems, including segfaults, and I have discovered that the bcol and coll/ml code remains entangled (I thought it had been separated, but sadly not). I have therefore ompi_ignored coll/ml and bcol/ptpcoll.
>
> No need. I discovered a bug in my last coll/ml fix. It incorrectly handled one of the possible hierarchies. The bug is fixed in trunk and a CMR is open for 1.7.5. In the future I will clean up this path, but the fix should have us working again.

I'm glad you were able to patch it, but this still raises the question of what to do with coll/ml. It's disturbing that its existence alone was enough to break the Java bindings (and yes, I concede those aren't built by default or part of the MPI standard) without even traversing its code path, and we've had a lot of problems with errors when we do go through it. More disturbing, you can't even cleanly no-build that component due to the unfortunate cross-linkage with bcol/ptpcoll, so we definitely need a note in NEWS to warn people they need to no-build both.

It's unclear to me how to handle this situation, so we'll need to discuss it at the telecon. At the very least, I think we need to ensure coll/ml is not the default for 1.7.5 as it doesn't appear to be ready for that role.

>
> -Nathan
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/03/14352.php