Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Inherent limit on #communicators?
From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2009-05-01 08:08:09


David,

is this code representative of what your app is doing? I.e., you have a
base communicator (e.g. MPI_COMM_WORLD) which is split, freed, split
again, freed again, and so on? The important aspect would be that the
same 'base' communicator is used to derive new communicators over and
over.

The reason I ask is two-fold. One, in that case you would be one of the
ideal beneficiaries of the block cid algorithm :-) (even if it fails you
right now). Two, a fix for this scenario, which basically reuses the
last block used (and which would fix your case if the condition above
holds), is roughly five lines of code. That would let us get a fix into
the trunk and v1.3 quickly (keep in mind that the block-cid code has
been in the trunk for two years and this is the first problem we have
hit) and give us more time to develop a thorough solution for the worst
case: a chain of communicators being created, e.g. communicator 1 is
the basis for deriving a new comm 2, comm 2 is used to derive comm 3,
and so on.
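
For illustration, here is a minimal toy sketch of that idea (made-up
names and block sizes, not the actual Open MPI cid code): each base
communicator hands out cids from its current block, and the sketched
fix rolls the block cursor back when the most recently issued cid is
freed, so a split/free loop against the same base keeps reusing one
block instead of marching toward the cid limit.

/* Toy sketch of block-based cid allocation with "reuse the last block".
 * Illustration only; names, block size, and layout are invented and do
 * not mirror the real Open MPI implementation. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define CID_BLOCK_SIZE 8          /* cids handed out per block (invented)    */
#define CID_MAX        32768      /* cap mentioned in this thread for OB1    */

typedef struct {
    uint32_t block_start;   /* first cid of the block owned by this base comm */
    uint32_t next_in_block; /* next unused cid inside that block              */
    bool     have_block;
} cid_block_t;

static uint32_t next_free_block = 4; /* pretend a few low cids are reserved
                                        for the predefined communicators     */

/* Allocate a cid for a communicator derived from 'base'.  If the base's
 * current block is exhausted, grab a fresh block from the global space. */
static int32_t cid_alloc(cid_block_t *base)
{
    if (!base->have_block ||
        base->next_in_block >= base->block_start + CID_BLOCK_SIZE) {
        if (next_free_block + CID_BLOCK_SIZE > CID_MAX) {
            return -1;                      /* out of cids: the reported hang */
        }
        base->block_start   = next_free_block;
        base->next_in_block = next_free_block;
        base->have_block    = true;
        next_free_block    += CID_BLOCK_SIZE;
    }
    return (int32_t) base->next_in_block++;
}

/* The sketched fix: if the cid being released is the last one handed out
 * from the base's current block, roll the cursor back so the next split
 * from the same base reuses it instead of consuming new cid space. */
static void cid_release(cid_block_t *base, uint32_t cid)
{
    if (base->have_block && cid + 1 == base->next_in_block) {
        base->next_in_block = cid;
    }
}

int main(void)
{
    cid_block_t world = { 0, 0, false };
    /* split/free in a loop, as in the reproducer: without cid_release()
     * rolling the cursor back, this exhausts CID_MAX after enough rounds. */
    for (int n = 0; n < 100000; ++n) {
        int32_t cid = cid_alloc(&world);
        if (cid < 0) { printf("ran out of cids at iteration %d\n", n); return 1; }
        cid_release(&world, cid);
    }
    printf("all iterations completed\n");
    return 0;
}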

Thanks
Edgar

David Gunter wrote:
> Here is the test code reproducer:
>
>       program test2
>       implicit none
>       include 'mpif.h'
>       integer ierr, myid, numprocs, i1, i2, n, local_comm,
>      $        icolor, ikey, rank, root
>
> c
> c... MPI set-up
>       ierr = 0
>       call MPI_INIT(ierr)
>       ierr = 1
>       call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
>       print *, ierr
>
>       ierr = -1
>       call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
>
>       ierr = -5
>       i1 = ierr
>       if (myid.eq.0) i1 = 1
>       call MPI_ALLREDUCE(i1, i2, 1, MPI_INTEGER, MPI_MIN,
>      $     MPI_COMM_WORLD, ierr)
>
>       ikey = myid
>       if (mod(myid,2).eq.0) then
>          icolor = 0
>       else
>          icolor = MPI_UNDEFINED
>       end if
>
>       root = 0
>       do n = 1, 100000
>
>          call MPI_COMM_SPLIT(MPI_COMM_WORLD, icolor,
>      $        ikey, local_comm, ierr)
>
>          if (mod(myid,2).eq.0) then
>             call MPI_COMM_RANK(local_comm, rank, ierr)
>             i2 = i1
>             call MPI_REDUCE(i1, i2, 1, MPI_INTEGER, MPI_MIN,
>      $           root, local_comm, ierr)
>
>             if (myid.eq.0 .and. mod(n,10).eq.0)
>      $           print *, n, i1, i2, icolor, ikey
>
>             call MPI_COMM_FREE(local_comm, ierr)
>          end if
>
>       end do
> c     if (icolor.eq.0) call MPI_COMM_FREE(local_comm, ierr)
>
>       call MPI_BARRIER(MPI_COMM_WORLD, ierr)
>
>       call MPI_FINALIZE(ierr)
>
>       print *, myid, ierr
>
>       end
>
>
>
> -david
> --
> David Gunter
> HPC-3: Parallel Tools Team
> Los Alamos National Laboratory
>
>
>
> On Apr 30, 2009, at 12:43 PM, David Gunter wrote:
>
>> Just to throw out more info on this, the test code runs fine on
>> previous versions of OMPI. It only hangs on the 1.3 line when the cid
>> reaches 65536.
>>
>> -david
>> --
>> David Gunter
>> HPC-3: Parallel Tools Team
>> Los Alamos National Laboratory
>>
>>
>>
>> On Apr 30, 2009, at 12:28 PM, Edgar Gabriel wrote:
>>
>>> cids are in fact not recycled in the block algorithm. The problem is
>>> that comm_free is not collective, so you cannot make any assumptions
>>> about whether other procs have also released that communicator.
>>>
>>> Nevertheless, a cid in the communicator structure is a uint32_t, so it
>>> should not hit the 16k limit there yet. This is not new, so if there
>>> is a discrepancy between what the comm structure assumes a cid is and
>>> what the pml assumes, then it has been in the code since the very
>>> first days of Open MPI...
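
As a purely hypothetical illustration of that kind of discrepancy (the
structs below are invented for the example, not the real Open MPI or
OB1 definitions): the communicator structure can hold a 32-bit cid
while a narrower field in a wire-level match header silently truncates
it.

/* Hypothetical cid-width mismatch; these structs are made up for the
 * example and are not the actual Open MPI data structures. */
#include <stdint.h>
#include <stdio.h>

struct toy_communicator {
    uint32_t c_contextid;      /* comm layer: cid is a full 32-bit value   */
};

struct toy_match_hdr {
    uint16_t hdr_ctx;          /* wire header: only 16 bits for the cid    */
};

int main(void)
{
    struct toy_communicator comm = { .c_contextid = 65536 };   /* 2^16 */
    struct toy_match_hdr hdr = { .hdr_ctx = (uint16_t) comm.c_contextid };

    /* The comm layer happily stores 65536, but the header truncates it to
     * 0, so messages could no longer be matched to the right communicator. */
    printf("comm cid = %u, cid on the wire = %u\n",
           (unsigned) comm.c_contextid, (unsigned) hdr.hdr_ctx);
    return 0;
}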
>>>
>>> Thanks
>>> Edgar
>>>
>>> Brian W. Barrett wrote:
>>>> On Thu, 30 Apr 2009, Ralph Castain wrote:
>>>>> We seem to have hit a problem here - it looks like we are seeing a
>>>>> built-in limit on the number of communicators one can create in a
>>>>> program. The program basically does a loop, calling MPI_Comm_split
>>>>> each
>>>>> time through the loop to create a sub-communicator, does a reduce
>>>>> operation on the members of the sub-communicator, and then calls
>>>>> MPI_Comm_free to release it (this is a minimized reproducer for the
>>>>> real
>>>>> code). After 64k times through the loop, the program fails.
>>>>>
>>>>> This looks remarkably like a 16-bit index that hits a max value and
>>>>> then
>>>>> blocks.
>>>>>
>>>>> I have looked at the communicator code, but I don't immediately see
>>>>> such
>>>>> a field. Is anyone aware of some other place where we would have a
>>>>> limit
>>>>> that would cause this problem?
>>>> There's a maximum of 32768 communicator ids when using OB1 (each PML
>>>> can set the max contextid, although the communicator code is the
>>>> part that actually assigns a cid). Assuming that comm_free is
>>>> actually properly called, there should be plenty of cids available
>>>> for that pattern. However, I'm not sure I understand the block
>>>> algorithm someone added to cid allocation - I'd have to guess that
>>>> there's something funny with that routine and cids aren't being
>>>> recycled properly.
>>>> Brian
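
For a rough sense of what "recycled properly" would look like, here is
a toy sketch (again with invented names, not the actual Open MPI
cid-allocation routine): as long as freed cids go back into a pool that
is consulted before a fresh cid is handed out, the split/free pattern
from the reproducer never approaches the 32768 cap.

/* Toy sketch of cid recycling via a free pool; illustration only. */
#include <stdint.h>
#include <stdio.h>

#define CID_MAX 32768                /* cap quoted in this thread for OB1    */

static uint16_t free_pool[CID_MAX];  /* cids handed back by comm_free        */
static int      free_top   = 0;
static uint32_t next_fresh = 4;      /* pretend the first few cids belong to
                                        the predefined communicators         */

static int32_t cid_next(void)
{
    if (free_top > 0) {
        return free_pool[--free_top];        /* recycle a released cid first  */
    }
    if (next_fresh >= CID_MAX) {
        return -1;                           /* reached only if nothing is
                                                ever recycled                 */
    }
    return (int32_t) next_fresh++;
}

static void cid_free(uint16_t cid)
{
    free_pool[free_top++] = cid;             /* hand the cid back to the pool */
}

int main(void)
{
    /* the reproducer's pattern: split (allocate) and free, 100000 times */
    for (int n = 0; n < 100000; ++n) {
        int32_t cid = cid_next();
        if (cid < 0) { printf("cid space exhausted at iteration %d\n", n); return 1; }
        cid_free((uint16_t) cid);
    }
    printf("cid space never exhausted: freed cids were reused\n");
    return 0;
}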

-- 
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335