Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different command lines.
From: Hjelm, Nathan T (hjelmn_at_[hidden])
Date: 2014-03-04 14:34:17


There was a rounding issue in basesmuma: if the control data happened to be smaller than a page, we were trying to allocate 0 bytes. It should be fixed on the trunk and has been CMR'ed to 1.7.5.
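
To illustrate the class of bug described above, here is a minimal sketch of how rounding a size to a page multiple can yield 0 bytes. This is not the actual basesmuma code: the function names are hypothetical, and it assumes the faulty path effectively rounded down where it should have rounded up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper showing the suspected failure mode: rounding DOWN
 * to a multiple of the page size. Any size smaller than one page
 * becomes 0, so the caller ends up requesting a 0-byte allocation. */
size_t round_down_to_page(size_t size, size_t page_size)
{
    assert(page_size > 0);
    return (size / page_size) * page_size;
}

/* Hypothetical helper showing the usual remedy: rounding UP to the next
 * multiple of the page size. Any non-zero size gets at least one page. */
size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size > 0);
    return ((size + page_size - 1) / page_size) * page_size;
}
```

With a 4096-byte page and 100 bytes of control data, the first helper returns 0 while the second returns 4096.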

-Nathan

Please excuse the horrible Outlook-style quoting. OWA sucks.

________________________________________
From: devel [devel-bounces_at_[hidden]] on behalf of Mike Dubman [miked_at_[hidden]]
Sent: Tuesday, March 04, 2014 7:04 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] -mca coll "ml" cause segv or hangs with different command lines.

Hi,

coll/hcoll is the Mellanox-driven collective package.
coll/ml is managed, supported, and developed by the ORNL folks.

On Tue, Mar 4, 2014 at 1:06 PM, Ralph Castain <rhc_at_[hidden]> wrote:
Ummm...the "ml" stands for Mellanox. This is a component you folks contributed at some time. IIRC, the hcoll and/or bcol are meant to replace it, but you folks would know best what to do with it.

On Tue, Mar 4, 2014 at 12:12 AM, Elena Elkina <elena.elkina_at_[hidden]> wrote:
Hi,

Recently I have often been hitting hangs and seg faults with different command lines, and there are "ml" functions in the stack trace.
When I turn "ml" off with -mca coll ^ml, the problems disappear.
For example,
oshrun -np 4 --map-by node --display-map ./ring_oshmem
fails with seg fault while
oshrun -np 4 --map-by node --display-map -mca coll ^ml ./ring_oshmem
passes.

The "ml" priority is low (27), but the component can still run into problems during comm_query, because it does all of its initialization there.

"ml" is an unreliable component, so it may be reasonable not to build it by default in order to avoid such problems.

What do you think?

Best regards,
Elena

_______________________________________________
devel mailing list
devel_at_[hidden]
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Searchable archives: http://www.open-mpi.org/community/lists/devel/2014/03/date.php
