Ashley Pittman wrote:
> Do you have a stack trace of your hung application to hand? In
> particular, when you say "All processes have made the same call to
> MPI_Allreduce. The processes are all in opal_progress, called (with
> intervening calls) by MPI_Allreduce." do the intervening calls include
> mca_coll_sync_bcast, ompi_coll_tuned_barrier_intra_dec_fixed and
> ompi_coll_tuned_barrier_intra_recursivedoubling?
I don't have a stack trace handy, and today is pretty full. I'll try
and make some time to document what I've got in the next few days. I
was able to hang a C translation of Ralph's reproducer as well.
- Bryan
--
Bryan Lally, lally_at_[hidden]
505.667.9954
CCS-2
Los Alamos National Laboratory
Los Alamos, New Mexico