
Open MPI Development Mailing List Archives


From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-10-11 11:26:32


On Fri, Oct 05, 2007 at 09:43:44AM +0200, Jeff Squyres wrote:
> David --
>
> Gleb and I just actively re-looked at this problem yesterday; we
> think it's related to https://svn.open-mpi.org/trac/ompi/ticket/
> 1015. We previously thought this ticket was a different problem, but
> our analysis yesterday shows that it could be a real problem in the
> openib BTL or ob1 PML (kinda think it's the openib btl because it
> doesn't seem to happen on other networks, but who knows...).
>
> Gleb is investigating.
Here is the result of the investigation. The problem is different from the
one in ticket #1015. What we have here is one rank calling isend() on a small
message followed by wait_all() in a loop, while another rank calls irecv(). The
problem is that isend() usually doesn't call opal_progress() anywhere, and
wait_all() doesn't call progress if all requests are already completed,
so messages are never progressed. We can force opal_progress() to be called
by setting btl_openib_free_list_max to 1000; then wait_all() will call
progress because not every request will be immediately completed by OB1.
Alternatively, we can limit the number of uncompleted requests that OB1 can
allocate by setting pml_ob1_free_list_max to 1000; then opal_progress() will
be called from free_list_wait() when the max is reached. The second option
works much faster for me.
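
For reference, here is a hypothetical minimal sketch of the pattern I
described (my own illustration, not David's attached bcast-hang.c; the
iteration count and message size are arbitrary). The sender loops over
isend()/wait_all() on a small message, so every request completes
immediately and neither call ends up driving opal_progress():

#include <mpi.h>

#define ITERS 100000   /* arbitrary iteration count, for illustration only */

int main(int argc, char **argv)
{
    int rank, i, buf = 0;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* sender: small eager message, so the request is typically
         * already complete when isend() returns... */
        for (i = 0; i < ITERS; i++) {
            MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            /* ...and wait_all() sees only completed requests, so it
             * returns without ever calling opal_progress() */
            MPI_Waitall(1, &req, MPI_STATUSES_IGNORE);
        }
    } else if (rank == 1) {
        /* receiver: posts matching irecv()s and waits on each */
        for (i = 0; i < ITERS; i++) {
            MPI_Irecv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }
    }

    MPI_Finalize();
    return 0;
}

The second workaround can be applied without rebuilding, by setting the MCA
parameter on the command line, e.g. for David's reproducer:

mpirun --mca pml_ob1_free_list_max 1000 --mca btl self,openib --npernode 1 --np 4 bcast-hang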

>
>
>
> On Oct 5, 2007, at 12:59 AM, David Daniel wrote:
>
> > Hi Folks,
> >
> > I have been seeing some nasty behaviour in collectives,
> > particularly bcast and reduce. Attached is a reproducer (for bcast).
> >
> > The code will rapidly slow to a crawl (usually interpreted as a
> > hang in real applications) and sometimes gets killed with sigbus or
> > sigterm.
> >
> > I see this with
> >
> > openmpi-1.2.3 or openmpi-1.2.4
> > ofed 1.2
> > linux 2.6.19 + patches
> > gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2)
> > 4 socket, dual core opterons
> >
> > run as
> >
> > mpirun --mca btl self,openib --npernode 1 --np 4 bcast-hang
> >
> > To my now uneducated eye it looks as if the root process is rushing
> > ahead and not progressing earlier bcasts.
> >
> > Anyone else seeing similar? Any ideas for workarounds?
> >
> > As a point of reference, mvapich2 0.9.8 works fine.
> >
> > Thanks, David
> >
> >
> > <bcast-hang.c>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
			Gleb.