Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different command lines.
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-03-04 06:06:54


Ummm...the "ml" stands for Mellanox. This is a component you folks
contributed at some time. IIRC, the hcoll and/or bcol are meant to replace
it, but you folks would know best what to do with it.

On Tue, Mar 4, 2014 at 12:12 AM, Elena Elkina <elena.elkina_at_[hidden]> wrote:

> Hi,
>
> Recently I have often hit hangs and seg faults with different command lines,
> and there are "ml" functions in the stack trace.
> When I turn "ml" off with -mca coll ^ml, the problems disappear.
> For example,
> oshrun -np 4 --map-by node --display-map ./ring_oshmem
> fails with a seg fault, while
> oshrun -np 4 --map-by node --display-map -mca coll ^ml ./ring_oshmem
> passes.
>
> The "ml" priority is low (27), but it could have issues during comm_query
> (it does all of its initialization stuff there).
>
> "ml" is an unreliable component, so it may be reasonable not to build this
> component by default to avoid such problems.
>
> What do you think?
>
> Best regards,
> Elena