
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_Allreduce hangs
From: Martin Siegert (siegert_at_[hidden])
Date: 2012-05-04 02:01:30


On Tue, Apr 24, 2012 at 04:19:31PM -0400, Brock Palen wrote:
> To throw in my $0.02, though it is worth less.
>
> Were you running this on verbs-based InfiniBand?

Correct.

> We see a problem that we have a workaround for, even with the newest 1.4.5.
> It occurs only on IB, and we can reproduce it with IMB.

I can now confirm that the program hangs with 1.4.5 as well, at exactly the
same point.
Any chance that this has to do with the default settings of the
btl_openib_max_eager_rdma and mpi_leave_pinned MCA parameters? I.e.,
should I try to run the program with
--mca btl_openib_max_eager_rdma 0 --mca mpi_leave_pinned 0 ?
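
For example (just a sketch; the process count and executable name are
placeholders, only the two MCA parameters matter here):

  mpirun -np 64 \
      --mca btl_openib_max_eager_rdma 0 \
      --mca mpi_leave_pinned 0 \
      ./a.out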

> You can find an old thread from me about it. Your problem might not be the same.
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> brockp_at_[hidden]
> (734)936-1985

This one?
http://www.open-mpi.org/community/lists/users/2011/07/16996.php

- Martin

> On Apr 24, 2012, at 3:09 PM, Jeffrey Squyres wrote:
>
> > Could you repeat your tests with 1.4.5 and/or 1.5.5?
> >
> >
> > On Apr 23, 2012, at 1:32 PM, Martin Siegert wrote:
> >
> >> Hi,
> >>
> >> I am debugging a program that hangs in MPI_Allreduce (openmpi-1.4.3).
> >> An strace of one of the processes shows:
> >>
> >> Process 10925 attached with 3 threads - interrupt to quit
> >> [pid 10927] poll([{fd=17, events=POLLIN}, {fd=16, events=POLLIN}], 2, -1 <unfinished ...>
> >> [pid 10926] select(15, [8 14], [], NULL, NULL <unfinished ...>
> >> [pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
> >> [pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
> >> [pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
> >> [pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
> >> ...
> >>
> >> The program is a Fortran program that uses 64-bit integers (compiled with
> >> -i8), and I correspondingly compiled Open MPI (version 1.4.3) with -i8 for
> >> the Fortran compiler as well.
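
(A typical way to pass that flag to the Open MPI build, assuming an
Intel-style compiler that accepts -i8; the exact flag depends on the
compiler:

  ./configure FFLAGS="-i8" FCFLAGS="-i8" ...

followed by the usual make / make install.)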
> >>
> >> The program is somewhat difficult to debug since it takes 3 days to reach
> >> the point where it hangs. This is what I found so far:
> >>
> >> MPI_Allreduce is called as
> >>
> >> call MPI_Allreduce(MPI_IN_PLACE, recvbuf, count, MPI_DOUBLE_PRECISION, &
> >>                    MPI_SUM, MPI_COMM_WORLD, mpierr)
> >>
> >> with count = 455295488. Since the Fortran interface just calls the
> >> C routines in Open MPI, and count variables are 32-bit integers in C, I
> >> started to wonder what the largest "count" is for which an MPI_Allreduce
> >> succeeds. E.g., in MPICH (it has been a while since I looked into this,
> >> i.e., this may or may not be correct anymore) all send/recv operations
> >> were converted into send/recv of MPI_BYTE, thus the largest count for
> >> doubles was (2^31-1)/8 = 268435455. Thus, I started to wrap the
> >> MPI_Allreduce call in a myMPI_Allreduce routine that repeatedly calls
> >> MPI_Allreduce when the count is larger than some value maxallreduce
> >> (myMPI_Allreduce.f90 is attached). I have tested the routine with a
> >> trivial program that just fills an array with numbers and calls
> >> myMPI_Allreduce, and this test succeeds.
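
(For the count above this matters: 455295488 doubles are 455295488 * 8 =
3642363904 bytes, which exceeds 2^31 - 1 = 2147483647, even though the count
itself still fits into a 32-bit integer.)
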
> >> However, with the real program the situation is very strange:
> >> When I set maxallreduce = 268435456, the program hangs at the first call
> >> (iallreduce = 1) to MPI_Allreduce in the do loop
> >>
> >> do iallreduce = 1, nallreduce - 1
> >>    idx = (iallreduce - 1)*length + 1
> >>    call MPI_Allreduce(MPI_IN_PLACE, recvbuf(idx), length, &
> >>                       datatype, op, comm, mpierr)
> >>    if (mpierr /= MPI_SUCCESS) return
> >> end do
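
(For readers without the attachment, a minimal self-contained sketch of the
chunking idea -- not the attached myMPI_Allreduce.f90 itself; the chunk size,
names and the double precision buffer are illustrative, and the remainder
chunk is folded into the loop here:

  subroutine chunked_allreduce(recvbuf, count, datatype, op, comm, mpierr)
     implicit none
     include 'mpif.h'
     ! with -i8 the default INTEGER (and hence count) is 64 bits wide
     integer, intent(in)             :: count, datatype, op, comm
     integer, intent(out)            :: mpierr
     double precision, intent(inout) :: recvbuf(count)
     integer, parameter :: maxallreduce = 268435456  ! illustrative chunk size
     integer            :: idx, length

     idx = 1
     do while (idx <= count)
        length = min(maxallreduce, count - idx + 1)  ! last chunk may be shorter
        call MPI_Allreduce(MPI_IN_PLACE, recvbuf(idx), length, &
                           datatype, op, comm, mpierr)
        if (mpierr /= MPI_SUCCESS) return
        idx = idx + length
     end do
  end subroutine chunked_allreduce

The attached routine is what I actually ran; this is only meant to make the
structure explicit.)
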
> >>
> >> With maxallreduce = 134217728 the first call succeeds, the second hangs.
> >> For maxallreduce = 67108864, the first two calls to MPI_Allreduce complete,
> >> but the third (iallreduce = 3) hangs. For maxallreduce = 8388608 the
> >> 17th call hangs, for 1048576 the 138th call hangs; here is a table
> >> (values from gdb attached to process 0 when the program hangs):
> >>
> >> maxallreduce  iallreduce         idx      length
> >>    268435456           1           1   227647744
> >>    134217728           2   113823873   113823872
> >>     67108864           3   130084427    65042213
> >>      8388608          17   137447697     8590481
> >>      1048576         138   143392010     1046657
> >>
> >> As if there is (are) some element(s) in the middle of the array with
> >> idx >= 143392010 that cannot be sent or recv'd.
> >>
> >> Has anybody seen this kind of behaviour?
> >> Does anybody have an idea what could be causing this?
> >> Any ideas how to get around this?
> >> Anything that could help would be appreciated ... I have already spent a
> >> huge amount of time on this and I am running out of ideas.
> >>
> >> Cheers,
> >> Martin
> >>
> >> --
> >> Martin Siegert
> >> Simon Fraser University
> >> Burnaby, British Columbia
> >> Canada
> >> <myMPI_Allreduce.f90>
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >

-- 
Martin Siegert
Head, Research Computing
WestGrid/ComputeCanada Site Lead
IT Services                                phone: 778 782-4691
Simon Fraser University                    fax:   778 782-4242
Burnaby, British Columbia                  email: siegert_at_[hidden]
Canada  V5A 1S6