Open MPI Development Mailing List Archives

This web mail archive is frozen.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] 1.7.5 end-of-week status report
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-03-16 11:19:32


On Mar 15, 2014, at 10:19 PM, Hjelm, Nathan T <hjelmn_at_[hidden]> wrote:

> On Friday, March 14, 2014 8:48 PM, devel [devel-bounces_at_[hidden]] on behalf of Ralph Castain [rhc_at_[hidden]] wrote:
>> To: Open MPI Developers
>> Subject: [OMPI devel] 1.7.5 end-of-week status report
>>
>> Hi folks
>>
>> I have both good and bad news to report - first the good.
>>
>> OSHMEM now passes nearly all its tests on my Linux cluster (tcp). My hat is off to the Mellanox guys for getting this done, including getting our MTT repo tests complete.
>>
>> The MPI layer passes nearly all the IBM, Intel, and one-sided tests. Only a few failures.
>>
>> Now the bad. The coll/ml component continues to have problems, including segfaults, and I have discovered that the bcol and coll/ml code remains entangled (I thought it had been separated, but sadly not). I have therefore ompi_ignored coll/ml and bcol/ptpcoll.
>
> No need. I discovered a bug in my last coll/ml fix. It incorrectly handled one of the possible hierarchies. The bug is fixed in trunk and a CMR is open for 1.7.5. In the future I will clean up this path, but the fix should have us working again.

I'm glad you were able to patch it, but this still raises the question of what to do with coll/ml. It's disturbing that its existence alone was enough to break the Java bindings (and yes, I concede those aren't built by default or part of the MPI standard) without even traversing its code path, and we've had a lot of problems with errors when we do go through it. More disturbing, you can't even cleanly no-build that component due to the unfortunate cross-linkage with bcol/ptpcoll, so we definitely need a note in NEWS to warn people they need to no-build both.

It's unclear to me how to handle this situation, so we'll need to discuss it at the telecon. At the very least, I think we need to ensure coll/ml is not the default for 1.7.5 as it doesn't appear to be ready for that role.

>
> -Nathan
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/03/14352.php