Thanks for digging into this. Can you file a bug? Let's mark it for 1.3.1.
I say 1.3.1 instead of 1.3.0 because this *only* affects hierarch, and
since hierarch isn't currently selected by default (you must
specifically elevate hierarch's priority to get it to run), there's no
danger that users will run into this problem in default runs.
But clearly the problem needs to be fixed, and therefore we need a bug
to track it.
On Jan 13, 2009, at 2:09 PM, Edgar Gabriel wrote:
> I just debugged the Reduce_scatter bug mentioned previously. The bug
> is unfortunately not in hierarch, but in tuned.
> Here is the code snippet causing the problem:
> int reduce_scatter (...., mca_coll_base_module_t *module)
> err = comm->c_coll.coll_reduce (...., module)
> but should be
> err = comm->c_coll.coll_reduce (..., comm-
> The problem as it stands is that when using hierarch, only a
> subset of the functions is set, e.g. reduce, allreduce, bcast and
> barrier. Thus, reduce_scatter comes from tuned in most scenarios, and
> calls the subsequent functions with the wrong module. Hierarch of
> course does not like that :-)
> Anyway, a quick glance through the tuned code reveals a significant
> number of instances where this appears (reduce_scatter, allreduce,
> allgather, allgatherv). Basic, hierarch and inter seem to handle this
> mostly correctly.
> Edgar Gabriel
> Assistant Professor
> Parallel Software Technologies Lab http://pstl.cs.uh.edu
> Department of Computer Science University of Houston
> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
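
For anyone following along, the module mismatch described in the quoted message can be sketched in isolation roughly as follows. This is only a minimal illustration under stated assumptions: the types and names here (module_t, comm_t, reduce_module, hierarch_reduce, run_demo, and the two tuned_reduce_scatter_* variants) are hypothetical, simplified stand-ins and not the actual Open MPI structures or signatures.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the real Open MPI types. */
typedef struct module { const char *owner; } module_t;

struct comm;
typedef int (*reduce_fn)(struct comm *comm, module_t *module);

typedef struct comm {
    reduce_fn  reduce;                 /* selected reduce implementation   */
    module_t  *reduce_module;          /* module owned by that reduce      */
    module_t  *reduce_scatter_module;  /* module owned by reduce_scatter   */
} comm_t;

/* Stand-in for a "hierarch" reduce: it only accepts its own module. */
static int hierarch_reduce(comm_t *comm, module_t *module)
{
    (void)comm;
    assert(module != NULL);
    return strcmp(module->owner, "hierarch") == 0 ? 0 : -1;
}

/* Buggy pattern: forwards the caller's own (tuned) module to whatever
 * reduce happens to be selected on the communicator. */
static int tuned_reduce_scatter_buggy(comm_t *comm, module_t *module)
{
    return comm->reduce(comm, module);
}

/* Fixed pattern: passes the module that belongs to the selected reduce. */
static int tuned_reduce_scatter_fixed(comm_t *comm, module_t *module)
{
    (void)module;
    return comm->reduce(comm, comm->reduce_module);
}

/* Builds a communicator where reduce comes from "hierarch" and
 * reduce_scatter comes from "tuned", then runs one of the variants. */
int run_demo(int use_fixed)
{
    module_t hierarch = { "hierarch" };
    module_t tuned    = { "tuned" };
    comm_t   comm     = { hierarch_reduce, &hierarch, &tuned };

    return use_fixed
        ? tuned_reduce_scatter_fixed(&comm, comm.reduce_scatter_module)
        : tuned_reduce_scatter_buggy(&comm, comm.reduce_scatter_module);
}
```

In this sketch, run_demo(0) returns -1 because the stand-in hierarch reduce is handed the tuned module, while run_demo(1) returns 0 because the reduce is handed its own module.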