coll/hcoll is a Mellanox-driven collectives package.
coll/ml is managed, supported, and developed by the ORNL folks.
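In the meantime, the runtime exclusion mentioned in the report quoted below does not have to be typed on every command line. A minimal sketch, assuming a standard Open MPI installation that reads the per-user MCA parameter file (the file location shown is the usual default and may differ on a given install):

```
# $HOME/.openmpi/mca-params.conf
# Exclude the coll/ml component from selection for every run by this user.
# Same effect as passing "-mca coll ^ml" on the command line.
coll = ^ml
```

The same setting can also be given through the environment as OMPI_MCA_coll='^ml'. To leave the component out of the build entirely, Open MPI's configure accepts --enable-mca-no-build=coll-ml.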
On Tue, Mar 4, 2014 at 1:06 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> Ummm...the "ml" stands for Mellanox. This is a component you folks
> contributed at some time. IIRC, the hcoll and/or bcol are meant to replace
> it, but you folks would know best what to do with it.
> On Tue, Mar 4, 2014 at 12:12 AM, Elena Elkina <elena.elkina_at_[hidden]> wrote:
>> Recently I have often been hitting hangs and seg faults with different
>> command lines, and there are "ml" functions in the stack trace.
>> When I just turn "ml" off with -mca coll ^ml, the problems disappear.
>> For example,
>> oshrun -np 4 --map-by node --display-map ./ring_oshmem
>> fails with a seg fault, while
>> oshrun -np 4 --map-by node --display-map -mca coll ^ml ./ring_oshmem
>> runs without problems.
>> The "ml" priority is low (27), but it could still have issues during
>> comm_query (it does all of its initialization there).
>> "ml" is an unreliable component, so it may be reasonable not to build
>> this component by default to avoid such problems.
>> What do you think?
>> Best regards,
>> devel mailing list
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Searchable archives: