This was on v1.3 r19845.
Edgar Gabriel wrote:
> was this with the trunk or v1.3? If it was the trunk, was it before
> r19929 was applied? The reason I ask is because r19929 should remove
> all error messages related to 'non-existing communicators'. Hierarch,
> by the way, was not the cause of the error messages even before that;
> it just exposes the problem more frequently...
> Terry Dontje wrote:
>> I am seeing the message "Dropped message for the non-existing
>> communicator" when running hpcc with np=124 against r19845. This
>> seems to be pretty reproducible at np=124. When the job prints the
>> message above, some of the processes are in MPI_Bcast and the
>> 15 processes reporting the message are stuck in MPI_Barrier.
>> I am not sure how related this is to #1408 since I am not invoking
>> the hierarchical collectives. I just wanted to see if anyone else
>> has tried to run hpcc at such an np size with any success.
>> My next steps are to try to run this with the latest trunk and to
>> narrow down the failing case.