Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Deadlock with comm_create since cid allocator change
From: Sylvain Jeaugey (sylvain.jeaugey_at_[hidden])
Date: 2009-09-21 08:07:42


You were faster to fix the bug than I was to send my bug report :-)

So I confirm: this fixes the problem.

Thanks!
Sylvain

On Mon, 21 Sep 2009, Edgar Gabriel wrote:

> What version of Open MPI did you use? Patch #21970 should have fixed this
> issue on the trunk...
>
> Thanks
> Edgar
>
> Sylvain Jeaugey wrote:
>> Hi list,
>>
>> We are currently experiencing deadlocks when using communicators other than
>> MPI_COMM_WORLD, so we wrote a very simple reproducer (MPI_Comm_create, then
>> MPI_Barrier on the new communicator; see the end of this e-mail).
>>
>> We can reproduce the deadlock only with openib and at least 8 cores (never
>> with sm), after about 20 runs on average. Using a larger number of cores
>> greatly increases the occurrence of the deadlock. When the deadlock occurs,
>> every even process is stuck in MPI_Finalize and every odd process is in
>> MPI_Barrier.
>>
>> We tracked the bug through the changesets and found that this patch seems
>> to have introduced it:
>>
>> user: brbarret
>> date: Tue Aug 25 15:13:31 2009 +0000
>> summary: Per discussion in ticket #2009, temporarily disable the block
>> CID allocation algorithms until they properly reuse CIDs.
>>
>> Reverting to the non-multi-threaded CID allocator makes the deadlock
>> disappear.
>>
>> I tried to dig further and understand why this makes a difference, with no
>> luck.
>>
>> If anyone can figure out what's happening, that would be great ...
>>
>> Thanks,
>> Sylvain
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char **argv) {
>>     int rank, numTasks;
>>     int range[3];
>>     MPI_Comm testComm, dupComm;
>>     MPI_Group orig_group, new_group;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &numTasks);
>>     MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
>>     range[0] = 0; /* first rank */
>>     range[1] = numTasks - 1; /* last rank */
>>     range[2] = 1; /* stride */
>>     MPI_Group_range_incl(orig_group, 1, &range, &new_group);
>>     MPI_Comm_create(MPI_COMM_WORLD, new_group, &testComm);
>>     MPI_Barrier(testComm);
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> --
> Edgar Gabriel
> Assistant Professor
> Parallel Software Technologies Lab http://pstl.cs.uh.edu
> Department of Computer Science University of Houston
> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
>
>