Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different command lines.
From: Mike Dubman (miked_at_[hidden])
Date: 2014-03-04 09:04:53


Hi,

coll/hcoll is the Mellanox-driven collectives package.
coll/ml is managed, supported, and developed by the ORNL folks.
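
Until the coll/ml issues are sorted out, a minimal sketch of excluding the component without editing every command line, assuming the usual OMPI_MCA_<param> environment form of "-mca <param> <value>" is honored by oshrun the same way it is by mpirun:

```shell
# Exclude the coll/ml component for every launch from this shell.
# OMPI_MCA_coll is assumed equivalent to passing "-mca coll ^ml".
export OMPI_MCA_coll='^ml'   # quoted so no shell treats ^ specially

# Run the reproducer only if the launcher and the test binary exist
# (./ring_oshmem is the program from the report quoted below).
if command -v oshrun >/dev/null 2>&1 && [ -x ./ring_oshmem ]; then
    oshrun -np 4 --map-by node --display-map ./ring_oshmem
fi
```

The same setting can be made persistent with a "coll = ^ml" line in $HOME/.openmpi/mca-params.conf. To leave the component out of a build entirely, configure's --enable-mca-no-build=coll-ml option should do it.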

On Tue, Mar 4, 2014 at 1:06 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> Ummm...the "ml" stands for Mellanox. This is a component you folks
> contributed at some time. IIRC, the hcoll and/or bcol are meant to replace
> it, but you folks would know best what to do with it.
>
>
>
> On Tue, Mar 4, 2014 at 12:12 AM, Elena Elkina <elena.elkina_at_[hidden]> wrote:
>
>> Hi,
>>
>> Recently I have often seen hangs and seg faults with different command
>> lines, and there are "ml" functions in the stack trace.
>> When I turn "ml" off with -mca coll ^ml, the problems disappear.
>> For example,
>> oshrun -np 4 --map-by node --display-map ./ring_oshmem
>> fails with seg fault while
>> oshrun -np 4 --map-by node --display-map -mca coll ^ml ./ring_oshmem
>> passes.
>>
>> The "ml" priority is low (27), but it could still have issues during
>> comm_query (it does all of its initialization work there).
>>
>> "ml" is an unreliable component, so it may be reasonable not to build
>> it by default to avoid such problems.
>>
>> What do you think?
>>
>> Best regards,
>> Elena
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Searchable archives:
>> http://www.open-mpi.org/community/lists/devel/2014/03/date.php
>>