Terry,
Was this with the trunk or v1.3? If it was the trunk, was it before
r19929 was applied? I ask because r19929 should remove all error
messages related to 'non-existing communicators'. Hierarch, by the way,
was not the cause of the error messages even before that; it just
exposes the problem more frequently...
Thanks
Edgar
Terry Dontje wrote:
> I am seeing the message "Dropped message for the non-existing
> communicator" when running hpcc with np=124 against r19845. This seems
> to be pretty reproducible at np=124. When the job prints out the
> message above, some of the processes are in MPI_Bcast and the 15
> processes reporting the message are stuck in MPI_Barrier.
> I am not sure how related this is to #1408 since I am not invoking the
> hierarchical collectives. I just wanted to see if anyone else has tried
> to run hpcc at such an np size with any success.
>
> My next steps are to try to run this with the latest trunk and to narrow
> down the failing case.
>
> --td
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab http://pstl.cs.uh.edu
Department of Computer Science University of Houston
Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335