Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Deadlock with comm_create since cid allocator change
From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2009-09-21 08:01:39


Which version of Open MPI did you use? Patch #21970 should have fixed this
issue on the trunk...

Thanks
Edgar

Sylvain Jeaugey wrote:
> Hi list,
>
> We are currently experiencing deadlocks when using communicators other
> than MPI_COMM_WORLD. So we made a very simple reproducer (Comm_create
> then MPI_Barrier on the communicator - see end of e-mail).
>
> We can reproduce the deadlock only with openib and with at least 8 cores
> (we could not reproduce it with sm), and only after ~20 runs on average.
> Using a larger number of cores greatly increases the occurrence of the
> deadlock. When the deadlock occurs, every even process is stuck in
> MPI_Finalize and every odd process is in MPI_Barrier.
>
> So we tracked the bug through the changesets and found that this patch
> seems to have introduced it:
>
> user: brbarret
> date: Tue Aug 25 15:13:31 2009 +0000
> summary: Per discussion in ticket #2009, temporarily disable the
>          block CID allocation algorithms until they properly reuse CIDs.
>
> Reverting to the non-multi-threaded CID allocator makes the deadlock
> disappear.
>
> I tried to dig further and understand why this makes a difference, with
> no luck.
>
> If anyone can figure out what's happening, that would be great ...
>
> Thanks,
> Sylvain
>
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv) {
>     int rank, numTasks;
>     int range[3];
>     MPI_Comm testComm, dupComm;
>     MPI_Group orig_group, new_group;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &numTasks);
>     MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
>     range[0] = 0;            /* first rank */
>     range[1] = numTasks - 1; /* last rank */
>     range[2] = 1;            /* stride */
>     MPI_Group_range_incl(orig_group, 1, &range, &new_group);
>     MPI_Comm_create(MPI_COMM_WORLD, new_group, &testComm);
>     MPI_Barrier(testComm);
>     MPI_Finalize();
>     return 0;
> }
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
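
For anyone reading the reproducer: the (first, last, stride) triplet it passes to MPI_Group_range_incl selects ranks first, first+stride, ... up to last, so with (0, numTasks-1, 1) the new group should contain every rank of MPI_COMM_WORLD. A minimal sketch of that expansion in plain C follows. It needs no MPI, and expand_range is a hypothetical helper written only for illustration, not an Open MPI or MPI function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of MPI: expands one (first, last, stride)
 * triplet the way MPI_Group_range_incl interprets it. The selected ranks
 * are written to out[]; the return value is how many were written
 * (never more than cap). */
static size_t expand_range(const int triplet[3], int *out, size_t cap)
{
    int first = triplet[0], last = triplet[1], stride = triplet[2];
    size_t n = 0;

    assert(stride != 0); /* MPI requires a non-zero stride */

    if (stride > 0) {
        /* ascending range: first, first+stride, ... while <= last */
        for (int r = first; r <= last && n < cap; r += stride)
            out[n++] = r;
    } else {
        /* descending range: first, first+stride, ... while >= last */
        for (int r = first; r >= last && n < cap; r += stride)
            out[n++] = r;
    }
    return n;
}
```

Called with {0, 7, 1} this yields the 8 ranks 0 through 7, which is the 8-core case of the reproducer; a stride of 2 would keep only the even ranks.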

-- 
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335