Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] freezing in mpi_allreduce operation
From: Greg Fischer (greg.a.fischer_at_[hidden])
Date: 2011-09-08 17:59:56


Note also that coding the mpi_allreduce as:

   call mpi_allreduce(MPI_IN_PLACE, phim(0,1,1,1,grp), &
                      phim_size*im*jm*kmloc(coords(2)+1), mpi_real, mpi_sum, ang_com, ierr)

results in the same freezing behavior in the 60th iteration. (I don't
recall why the array sections were being passed; possibly it was just a mistake.)
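
For anyone trying to reproduce this outside the full code, here is a minimal
standalone sketch of that MPI_IN_PLACE form; the array dimensions, the group
index, and MPI_COMM_WORLD (standing in for ang_com) are placeholders rather
than the actual values from the code:

   program inplace_sketch
      implicit none
      include 'mpif.h'

      ! Placeholder dimensions; the real code's phim_size, im, jm, kmloc are unknown.
      integer, parameter :: n1 = 8, n2 = 4, n3 = 4, n4 = 2, ngrp = 3
      real :: phim(0:n1-1, n2, n3, n4, ngrp)
      integer :: ierr, grp, cnt

      call mpi_init(ierr)
      grp = 1
      phim = 1.0

      ! Sum one contiguous group slab in place; MPI_COMM_WORLD stands in for ang_com.
      cnt = n1 * n2 * n3 * n4
      call mpi_allreduce(MPI_IN_PLACE, phim(0,1,1,1,grp), cnt, &
                         mpi_real, mpi_sum, MPI_COMM_WORLD, ierr)

      call mpi_finalize(ierr)
   end program inplace_sketch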

On Thu, Sep 8, 2011 at 4:17 PM, Greg Fischer <greg.a.fischer_at_[hidden]> wrote:

> I am seeing mpi_allreduce operations freeze execution of my code on some
> moderately-sized problems. The freeze does not manifest itself in every
> problem. In addition, it is in a portion of the code that is repeated many
> times. In the case discussed below, the freeze appears in the 60th
> iteration.
>
> The current test case that I'm looking at is a 64-processor job. This
> particular mpi_allreduce call applies to all 64 processors, with each
> communicator in the call containing a total of 4 processors. When I add
> print statements before and after the offending line, I see that all 64
> processors successfully make it to the mpi_allreduce call, but only 32
> successfully exit. Stack traces on the other 32 yield something along the
> lines of the trace listed at the bottom of this message. The call itself
> looks like:
>
> call mpi_allreduce(MPI_IN_PLACE, &
>      phim(0:(phim_size-1),1:im,1:jm,1:kmloc(coords(2)+1),grp), &
>      phim_size*im*jm*kmloc(coords(2)+1), mpi_real, mpi_sum, ang_com, ierr)
>
> These messages are sized to remain under the 32-bit integer size limitation
> for the "count" parameter. The intent is to perform the allreduce operation
> on a contiguous block of the array. Previously, I had been passing an
> assumed-shape array (i.e., phim(:,:,:,:,grp)), but found some documentation
> indicating that was potentially dangerous. Making the change from assumed-
> to explicit-shape arrays doesn't solve the problem. However, if I declare
> an additional array and use separate send and receive buffers:
>
> call mpi_allreduce(phim_local, phim_global, &
>      phim_size*im*jm*kmloc(coords(2)+1), mpi_real, mpi_sum, ang_com, ierr)
> phim(:,:,:,:,grp) = phim_global
>
> Then the problem goes away, and everything works normally. Does anyone
> have any insight as to what may be happening here? I'm using "include
> 'mpif.h'" rather than the f90 module; could that potentially explain this?
>
> Thanks,
> Greg
>
> Stack trace(s) for thread: 1
> -----------------
> [0] (1 processes)
> -----------------
> main() at ?:?
> solver() at solver.f90:31
> solver_q_down() at solver_q_down.f90:52
> iter() at iter.f90:56
> mcalc() at mcalc.f90:38
> pmpi_allreduce__() at ?:?
> PMPI_Allreduce() at ?:?
> ompi_coll_tuned_allreduce_intra_dec_fixed() at ?:?
> ompi_coll_tuned_allreduce_intra_ring_segmented() at ?:?
> ompi_coll_tuned_sendrecv_actual() at ?:?
> ompi_request_default_wait_all() at ?:?
> opal_progress() at ?:?
> Stack trace(s) for thread: 2
> -----------------
> [0] (1 processes)
> -----------------
> start_thread() at ?:?
> btl_openib_async_thread() at ?:?
> poll() at ?:?
> Stack trace(s) for thread: 3
> -----------------
> [0] (1 processes)
> -----------------
> start_thread() at ?:?
> service_thread_start() at ?:?
> select() at ?:?
>
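
For completeness, here is a similarly minimal standalone sketch of the
separate send/receive-buffer workaround described in the quoted message;
again, the buffer shapes and MPI_COMM_WORLD (instead of ang_com) are
placeholders, not the actual code:

   program two_buffer_sketch
      implicit none
      include 'mpif.h'

      ! Placeholder dimensions; phim_size, im, jm, kmloc and ang_com from the
      ! real code are unknown, so fixed sizes and MPI_COMM_WORLD are used here.
      integer, parameter :: n1 = 8, n2 = 4, n3 = 4, n4 = 2, ngrp = 3
      real :: phim(0:n1-1, n2, n3, n4, ngrp)
      real :: phim_local(0:n1-1, n2, n3, n4)
      real :: phim_global(0:n1-1, n2, n3, n4)
      integer :: ierr, grp, cnt

      call mpi_init(ierr)
      grp = 1
      phim = 1.0

      ! Copy the group slab to a contiguous send buffer, reduce into a separate
      ! receive buffer, then copy the result back into phim.
      phim_local = phim(:, :, :, :, grp)
      cnt = n1 * n2 * n3 * n4
      call mpi_allreduce(phim_local, phim_global, cnt, &
                         mpi_real, mpi_sum, MPI_COMM_WORLD, ierr)
      phim(:, :, :, :, grp) = phim_global

      call mpi_finalize(ierr)
   end program two_buffer_sketch

Whether this explains the hang is unclear, but the two-buffer form avoids any
reliance on the compiler's copy-in/copy-out handling of array sections passed
through the implicit mpif.h interface, since the buffers are explicitly
declared contiguous arrays.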