Open MPI Development Mailing List Archives


From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-10-11 11:26:32


On Fri, Oct 05, 2007 at 09:43:44AM +0200, Jeff Squyres wrote:
> David --
>
> Gleb and I just actively re-looked at this problem yesterday; we
> think it's related to https://svn.open-mpi.org/trac/ompi/ticket/
> 1015. We previously thought this ticket was a different problem, but
> our analysis yesterday shows that it could be a real problem in the
> openib BTL or ob1 PML (kinda think it's the openib btl because it
> doesn't seem to happen on other networks, but who knows...).
>
> Gleb is investigating.
Here is the result of the investigation. The problem is different from
ticket #1015. What we have here is one rank calling isend() of a small
message followed by wait_all() in a loop, while the other rank calls
irecv(). The problem is that isend() usually doesn't call opal_progress()
anywhere, and wait_all() doesn't call progress if all requests are already
completed, so messages are never progressed. We can force opal_progress()
to be called by setting btl_openib_free_list_max to 1000; then wait_all()
will call progress because not every request will be immediately completed
by OB1. Alternatively, we can limit the number of uncompleted requests that
OB1 can allocate by setting pml_ob1_free_list_max to 1000; then
opal_progress() will be called from free_list_wait() when the max is
reached. The second option works much faster for me.
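
For reference, a minimal sketch of the isend()/wait_all() pattern described
above (my own reconstruction for illustration, not David's attached
bcast-hang.c; buffer and iteration names are made up):

    /* Rank 0 loops over MPI_Isend + MPI_Waitall of a small (eager) message;
     * rank 1 posts the matching MPI_Irecv.  Because the small isend is
     * completed immediately by OB1, MPI_Waitall on rank 0 returns without
     * ever entering opal_progress(). */
    #include <mpi.h>

    #define ITERS 100000

    int main(int argc, char **argv)
    {
        int rank, buf = 0, i;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (i = 0; i < ITERS; i++) {
            if (rank == 0)
                MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            else if (rank == 1)
                MPI_Irecv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
            if (rank < 2)
                MPI_Waitall(1, &req, MPI_STATUSES_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }

The workaround can be applied from the command line, e.g. adapting David's
run below:

    mpirun --mca btl self,openib --mca pml_ob1_free_list_max 1000 \
        --npernode 1 --np 4 bcast-hang

(btl_openib_free_list_max can be set the same way.)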

>
>
>
> On Oct 5, 2007, at 12:59 AM, David Daniel wrote:
>
> > Hi Folks,
> >
> > I have been seeing some nasty behaviour in collectives,
> > particularly bcast and reduce. Attached is a reproducer (for bcast).
> >
> > The code will rapidly slow to a crawl (usually interpreted as a
> > hang in real applications) and sometimes gets killed with sigbus or
> > sigterm.
> >
> > I see this with
> >
> > openmpi-1.2.3 or openmpi-1.2.4
> > ofed 1.2
> > linux 2.6.19 + patches
> > gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2)
> > 4 socket, dual core opterons
> >
> > run as
> >
> > mpirun --mca btl self,openib --npernode 1 --np 4 bcast-hang
> >
> > To my now uneducated eye it looks as if the root process is rushing
> > ahead and not progressing earlier bcasts.
> >
> > Anyone else seeing similar? Any ideas for workarounds?
> >
> > As a point of reference, mvapich2 0.9.8 works fine.
> >
> > Thanks, David
> >
> >
> > <bcast-hang.c>
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
			Gleb.