Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different command lines.
From: Rolf vandeVaart (rvandevaart_at_[hidden])
Date: 2014-03-04 20:15:09


I am still seeing the same issue where I get some type of segv unless I disable the coll ml component. This may be an issue at my end, but just thought I would double check that we are sure this is fixed.
Thanks,
Rolf

>-----Original Message-----
>From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Hjelm,
>Nathan T
>Sent: Tuesday, March 04, 2014 2:34 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different
>command lines.
>
>There was a rounding issue in basesmuma. If the control data happened to be
>less than a page, then we were trying to allocate 0 bytes. It should be fixed on
>the trunk and has been CMR'ed to 1.7.5.
>
>-Nathan
>
>Please excuse the horrible Outlook-style quoting. OWA sucks.
>
>________________________________________
>From: devel [devel-bounces_at_[hidden]] on behalf of Mike Dubman
>[miked_at_[hidden]]
>Sent: Tuesday, March 04, 2014 7:04 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different
>command lines.
>
>Hi,
>
>coll/hcoll is a Mellanox-driven collective package.
>coll/ml is managed/supported/developed by ORNL folks.
>
>
>On Tue, Mar 4, 2014 at 1:06 PM, Ralph Castain <rhc_at_open-mpi.org> wrote:
>Ummm...the "ml" stands for Mellanox. This is a component you folks
>contributed at some time. IIRC, the hcoll and/or bcol are meant to replace it,
>but you folks would know best what to do with it.
>
>
>
>On Tue, Mar 4, 2014 at 12:12 AM, Elena Elkina <elena.elkina_at_[hidden]> wrote:
>Hi,
>
>Recently I have often hit hangs and seg faults with different command lines,
>and there are "ml" functions in the stack trace.
>When I turn "ml" off with -mca coll ^ml, the problems disappear.
>For example,
>oshrun -np 4 --map-by node --display-map ./ring_oshmem fails with a seg fault,
>while oshrun -np 4 --map-by node --display-map -mca coll ^ml ./ring_oshmem
>passes.
>
>The "ml" priority is low (27), but it could have issues during comm_query (it
>does all of its initialization there).
>
>"ml" is an unreliable component, so it may be reasonable not to build this
>component by default to avoid such problems.
>
>What do you think?
>
>Best regards,
>Elena
>
>_______________________________________________
>devel mailing list
>devel_at_[hidden]<mailto:devel_at_[hidden]>
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Searchable archives: http://www.open-mpi.org/community/lists/devel/2014/03/date.php