
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Random-ish hangs using btl sm with OpenMPI 1.3.2 + gcc4.4?
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-10-02 16:17:46


Jonathan Dursi wrote:

> We have installed a couple of builds of OpenMPI 1.3.2 here, and we
> are having real problems with single-node jobs hanging at random
> when using the shared-memory BTL, particularly (but perhaps not only)
> with the build compiled with gcc 4.4.0.
>
> The trivial attached program, which just performs a series of
> MPI_SENDRECV calls rightwards through MPI_COMM_WORLD, hangs very
> reliably when run like this on an 8-core box:
>
> mpirun -np 6 -mca btl self,sm ./diffusion-mpi
>
> (The test was based on a simple Fortran example that parallelizes the
> 1-D diffusion equation with MPI.) The hang always seems to occur within
> the first 500 or so iterations: sometimes between the 10th and 20th,
> and sometimes not until the late 400s. It occurs both on a new
> dual-socket quad-core Nehalem box and on an older Harpertown machine.
>
> Running without sm, however, seems to work fine:
>
> mpirun -np 6 -mca btl self,tcp ./diffusion-mpi
>
> never gives any problems.
>
> Any suggestions? I notice a mention of 'improved flow control in sm'
> in the 1.3.3 ChangeLog; is that a significant bug fix?

I filed a Trac ticket on this:

https://svn.open-mpi.org/trac/ompi/ticket/2043