Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Random hangs using btl sm with OpenMPI 1.3.2/1.3.3 + gcc4.4?
From: Jonathan Dursi (ljdursi_at_[hidden])
Date: 2009-09-22 21:46:11

Hi, Jeff:

I wish I had your problems reproducing this. This problem apparently
rears its head when OpenMPI is compiled with the Intel compilers as
well, but only ~1% of the time. Unfortunately, we have users who
launch ~1400 single-node jobs at a go, so they see on the order of a
dozen or two jobs hang per suite of simulations when using the
defaults. Their problem goes away when they use -mca btl self,tcp, or
when they use sm but set the number of fifos to np-1.
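
A rate this low is easier to pin down by scripting repeated runs than by watching for hangs by hand. The following is only a minimal sketch: count_hangs is a hypothetical helper name, and it assumes GNU coreutils timeout is available, which exits with status 124 when the time limit is hit. A run that exceeds the limit is counted as a hang, so the limit has to be comfortably longer than a normal run.

```shell
#!/bin/sh
# count_hangs RUNS LIMIT CMD [ARGS...]
# Hypothetical helper: run CMD RUNS times, each under `timeout LIMIT`,
# and print how many of the runs were killed for exceeding LIMIT seconds.
count_hangs() {
    runs=$1
    limit=$2
    shift 2
    hangs=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        timeout "$limit" "$@" >/dev/null 2>&1
        # GNU timeout exits with status 124 when the limit was reached
        if [ $? -eq 124 ]; then
            hangs=$((hangs + 1))
        fi
        i=$((i + 1))
    done
    echo "$hangs"
}
```

For example, `count_hangs 100 600 mpirun -np 6 -mca btl self,sm ./diffusion-mpi` would print how many of 100 runs failed to finish within ten minutes; the run count and time limit here are illustrative, not values from our tests.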

At first I had assumed it was a new-ish-architecture thing, as we
first saw the problem on the Nehalem Xeon E5540 nodes, but the sample
program hangs in exactly the same way on a Harpertown (E5430) machine
as well. So I've been assuming that this is a real problem that for
whatever reason is just exposed more with this particular version of
this particular compiler. I'd love to be wrong and for it to be
something strange but easily changed in our environment that is
causing this.

Running with your suggested test change, e.g.
        leftneighbour = rank-1
        if (leftneighbour .eq. -1) then
!          leftneighbour = nprocs-1
           leftneighbour = MPI_PROC_NULL
        endif
        rightneighbour = rank+1
        if (rightneighbour .eq. nprocs) then
!          rightneighbour = 0
           rightneighbour = MPI_PROC_NULL
        endif

like so:
mpirun -np 6 -mca btl self,sm,tcp ./diffusion-mpi

I do seem to get different behaviour. With OpenMPI 1.3.2, the program
frequently runs to completion, but when it does so it hangs at the
end, which hadn't happened before -- attaching gdb to a process tells
me that it's hanging in mpi_finalize:
(gdb) where
#0 0x00002b3635ecb51f in poll () from /lib64/
#1 0x00002b3634bd87c1 in poll_dispatch () from /scinet/gpc/mpi/
#2 0x00002b3634bd7659 in opal_event_base_loop () from /scinet/gpc/mpi/
#3 0x00002b3634bcc189 in opal_progress () from /scinet/gpc/mpi/
#4 0x00002b3636d7cf15 in barrier () from /scinet/gpc/mpi/openmpi/
#5 0x00002b363470158b in ompi_mpi_finalize () from /scinet/gpc/mpi/
#6 0x00002b36344bb529 in pmpi_finalize__ () from /scinet/gpc/mpi/
#7 0x0000000000400f99 in MAIN__ ()
#8 0x0000000000400fda in main (argc=1, argv=0x7fff3e3908c8)
at ../../../gcc-4.4.0/libgfortran/fmain.c:21

The rest of the time (maybe 1/4 of the time?) it hangs mid-run, in
the sendrecv:
(gdb) where
#0 0x00002b2bb44b4230 in mca_pml_ob1_send () from /scinet/gpc/mpi/
#1 0x00002b2baf47d296 in PMPI_Sendrecv () from /scinet/gpc/mpi/
#2 0x00002b2baf215540 in pmpi_sendrecv__ () from /scinet/gpc/mpi/
#3 0x0000000000400ea6 in MAIN__ ()
#4 0x0000000000400fda in main (argc=1, argv=0x7fff62d9b9c8)
at ../../../gcc-4.4.0/libgfortran/fmain.c:21

When running with OpenMPI 1.3.3, I get hangs in the program
significantly _more_ often with this change than before, typically in
the sendrecv again:

#0 0x00002aeb89d6cf2b in mca_btl_sm_component_progress () from /
#1 0x00002aeb849bd14a in opal_progress () from /scinet/gpc/mpi/
#2 0x00002aeb8954f235 in mca_pml_ob1_send () from /scinet/gpc/mpi/
#3 0x00002aeb84516586 in PMPI_Sendrecv () from /scinet/gpc/mpi/
#4 0x00002aeb842ae5b0 in pmpi_sendrecv__ () from /scinet/gpc/mpi/
#5 0x0000000000400ea6 in MAIN__ ()
#6 0x0000000000400fda in main (argc=1, argv=0x7fff12a13068)
at ../../../gcc-4.4.0/libgfortran/fmain.c:21

but again occasionally in the finalize, and (unlike with 1.3.2) there
are occasional successful runs through to completion.

Again, running the program with both versions of OpenMPI without sm:
mpirun -np 6 -mca btl self,tcp ./diffusion-mpi

or with num_fifos=(np-1):
mpirun -np 6 -mca btl self,sm -mca btl_sm_num_fifos 5 ./diffusion-mpi

seems to work fine.
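
For jobs launched at other process counts, the num_fifos workaround can be generated rather than typed by hand. This is a rough sketch only: sm_workaround_cmd is a hypothetical helper name, the flags are exactly the ones in the command above, and clamping the value to at least 1 for np=1 is my assumption rather than anything we have tested.

```shell
#!/bin/sh
# sm_workaround_cmd NP PROGRAM
# Hypothetical helper: print an mpirun command line that keeps the sm BTL
# but sets btl_sm_num_fifos to NP-1, as in the workaround above.
sm_workaround_cmd() {
    np=$1
    prog=$2
    num_fifos=$((np - 1))
    # Assumption: never ask for fewer than one FIFO (only matters for np=1)
    if [ "$num_fifos" -lt 1 ]; then
        num_fifos=1
    fi
    echo "mpirun -np $np -mca btl self,sm -mca btl_sm_num_fifos $num_fifos $prog"
}
```

Calling `sm_workaround_cmd 6 ./diffusion-mpi` prints the same command line as the np=6 case above; the output can be inspected first and then run with `eval` or pasted into a job script.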

        - Jonathan

On 2009-09-22, at 8:52PM, Jeff Squyres wrote:

> Jonathan --
> Sorry for the delay in replying; thanks for posting again.
> I'm actually unable to replicate your problem. :-( I have a new
> Intel 8-core X5570 box; I'm running at np=6 and np=8 on both Open MPI
> 1.3.2 and 1.3.3 and am not seeing the problem you're seeing. I even
> made your sample program worse -- I made a and b be 100,000 element
> real arrays (increasing the count args in MPI_SENDRECV to 100,000 as
> well), and increased nsteps to 150,000,000. No hangs. :-\
> The version of the compiler *usually* isn't significant, so gcc 4.x
> should be fine.
> Yes, the sm flow control issue was a significant fix, but the
> blocking nature of MPI_SENDRECV means that you shouldn't have run
> into the problems that were fixed (the main issues had to do with
> fast senders exhausting resources of slow receivers -- but
> MPI_SENDRECV is synchronous so the senders should always be matching
> the speed of the receivers).
> Just for giggles, what happens if you change
>     if (leftneighbour .eq. -1) then
>        leftneighbour = nprocs-1
>     endif
>     if (rightneighbour .eq. nprocs) then
>        rightneighbour = 0
>     endif
> to
>     if (leftneighbour .eq. -1) then
>        leftneighbour = MPI_PROC_NULL
>     endif
>     if (rightneighbour .eq. nprocs) then
>        rightneighbour = MPI_PROC_NULL
>     endif
> On Sep 21, 2009, at 5:09 PM, Jonathan Dursi wrote:
>> Continuing the conversation with myself:
>> Google pointed me to Trac ticket #1944, which spoke of deadlocks in
>> looped collective operations; there is no collective operation
>> anywhere in this sample code, but trying one of the suggested
>> workarounds/clues: that is, setting btl_sm_num_fifos to at least
>> (np-1) seems to make things work quite reliably, for both OpenMPI
>> 1.3.2 and 1.3.3; that is, while this
>> mpirun -np 6 -mca btl sm,self ./diffusion-mpi
>> invariably hangs (at random-seeming numbers of iterations) with
>> OpenMPI 1.3.2 and sometimes hangs (maybe 10% of the time, again
>> seemingly randomly) with 1.3.3,
>> mpirun -np 6 -mca btl tcp,self ./diffusion-mpi
>> or
>> mpirun -np 6 -mca btl_sm_num_fifos 5 -mca btl sm,self ./diffusion-mpi
>> always succeeds, with (as one might guess) the second being much
>> faster...
>> Jonathan
>> --
>> Jonathan Dursi <ljdursi_at_[hidden]>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
> --
> Jeff Squyres
> jsquyres_at_[hidden]

Jonathan Dursi <ljdursi_at_[hidden]>