
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Inherent limit on #communicators?
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-05-01 09:42:58


BTW: when compiling Brian's change, I got a warning about comparing signed
and unsigned values. Sure enough, I found that the communicator id is defined
as an unsigned int, while the PML treats it as a *signed* int.

We need to get this corrected - which way do you want it to be?

I will add this requirement to the ticket...

Thanks
Ralph

On Fri, May 1, 2009 at 6:38 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> I'm not entirely sure if David is going to be in today, so I will answer
> for him (and let him correct me later!).
>
> This code is indeed representative of what the app is doing. Basically, the
> user repeatedly splits the communicator so he can run mini test cases before
> going on to the larger computation. So it is always the base communicator
> being repeatedly split and freed.
>
> I would suspect, therefore, that the quick fix would serve us just fine
> while the worst case is later resolved.
>
> Thanks
> Ralph
>
>
> On Fri, May 1, 2009 at 6:08 AM, Edgar Gabriel <gabriel_at_[hidden]> wrote:
>
>> David,
>>
>> is this code representative for what your app is doing? E.g. you have a
>> base communicator (e.g. MPI_COMM_WORLD) which is being 'split', freed again,
>> split, freed again etc. ? i.e. the important aspect is that the same 'base'
>> communicator is being used for deriving new communicators again and again?
>>
>> The reason I ask is two-fold: one, you would in that case be one of the
>> ideal beneficiaries of the block cid algorithm :-) (even if it fails you
>> right now); two, a fix for this scenario which basically tries to reuse the
>> last block used (and which would fix your case if the condition is true) is
>> roughly five lines of code. This would give us the possibility to have a fix
>> quickly in the trunk and v1.3 (keep in mind that the block-cid code is in
>> the trunk since two years and this is the first problem that we have) and
>> give us more time to develop a profound solution for the worst case - a
>> chain of communicators being created, e.g. communicator 1 is basis to derive
>> a new comm 2, comm 2 is being used to derive comm 3 etc.
>>
>> Thanks
>> Edgar
>>
>> David Gunter wrote:
>>
>>> Here is the test code reproducer:
>>>
>>>       program test2
>>>       implicit none
>>>       include 'mpif.h'
>>>       integer ierr, myid, numprocs, i1, i2, n, local_comm,
>>>      $        icolor, ikey, rank, root
>>> c
>>> c... MPI set-up
>>>       ierr = 0
>>>       call MPI_INIT(ierr)
>>>       ierr = 1
>>>       call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
>>>       print *, ierr
>>>
>>>       ierr = -1
>>>       call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
>>>
>>>       ierr = -5
>>>       i1 = ierr
>>>       if (myid.eq.0) i1 = 1
>>>       call MPI_ALLREDUCE(i1, i2, 1, MPI_INTEGER, MPI_MIN,
>>>      $     MPI_COMM_WORLD, ierr)
>>>
>>>       ikey = myid
>>>       if (mod(myid,2).eq.0) then
>>>          icolor = 0
>>>       else
>>>          icolor = MPI_UNDEFINED
>>>       end if
>>>
>>>       root = 0
>>>       do n = 1, 100000
>>>
>>>          call MPI_COMM_SPLIT(MPI_COMM_WORLD, icolor,
>>>      $        ikey, local_comm, ierr)
>>>
>>>          if (mod(myid,2).eq.0) then
>>>             call MPI_COMM_RANK(local_comm, rank, ierr)
>>>             i2 = i1
>>>             call MPI_REDUCE(i1, i2, 1, MPI_INTEGER, MPI_MIN,
>>>      $           root, local_comm, ierr)
>>>
>>>             if (myid.eq.0.and.mod(n,10).eq.0)
>>>      $           print *, n, i1, i2, icolor, ikey
>>>
>>>             call MPI_COMM_FREE(local_comm, ierr)
>>>          end if
>>>
>>>       end do
>>> c     if (icolor.eq.0) call mpi_comm_free(local_comm, ierr)
>>>
>>>       call MPI_BARRIER(MPI_COMM_WORLD, ierr)
>>>
>>>       call MPI_FINALIZE(ierr)
>>>
>>>       print *, myid, ierr
>>>
>>>       end
>>>
>>>
>>>
>>> -david
>>> --
>>> David Gunter
>>> HPC-3: Parallel Tools Team
>>> Los Alamos National Laboratory
>>>
>>>
>>>
>>> On Apr 30, 2009, at 12:43 PM, David Gunter wrote:
>>>
>>>> Just to throw out more info on this, the test code runs fine on previous
>>>> versions of OMPI. It only hangs on the 1.3 line when the cid reaches
>>>> 65536.
>>>>
>>>> -david
>>>> --
>>>> David Gunter
>>>> HPC-3: Parallel Tools Team
>>>> Los Alamos National Laboratory
>>>>
>>>>
>>>>
>>>> On Apr 30, 2009, at 12:28 PM, Edgar Gabriel wrote:
>>>>
>>>>> cids are in fact not recycled in the block algorithm. The problem is
>>>>> that comm_free is not collective, so you cannot make any assumptions
>>>>> about whether other procs have also released that communicator.
>>>>>
>>>>>
>>>>> Nevertheless, the cid in the communicator structure is a uint32_t, so
>>>>> it should not hit the 16k limit there yet. This is not new, so if there
>>>>> is a discrepancy between what the comm structure assumes a cid is and
>>>>> what the pml assumes, then it has been in the code since the very first
>>>>> days of Open MPI...
>>>>>
>>>>> Thanks
>>>>> Edgar
>>>>>
>>>>> Brian W. Barrett wrote:
>>>>>
>>>>>> On Thu, 30 Apr 2009, Ralph Castain wrote:
>>>>>>
>>>>>>> We seem to have hit a problem here - it looks like we are seeing a
>>>>>>> built-in limit on the number of communicators one can create in a
>>>>>>> program. The program basically does a loop, calling MPI_Comm_split
>>>>>>> each time through the loop to create a sub-communicator, does a
>>>>>>> reduce operation on the members of the sub-communicator, and then
>>>>>>> calls MPI_Comm_free to release it (this is a minimized reproducer
>>>>>>> for the real code). After 64k times through the loop, the program
>>>>>>> fails.
>>>>>>>
>>>>>>> This looks remarkably like a 16-bit index that hits a max value and
>>>>>>> then blocks.
>>>>>>>
>>>>>>> I have looked at the communicator code, but I don't immediately see
>>>>>>> such a field. Is anyone aware of some other place where we would
>>>>>>> have a limit that would cause this problem?
>>>>>>>
>>>>>> There's a maximum of 32768 communicator ids when using OB1 (each PML
>>>>>> can set the max contextid, although the communicator code is the part that
>>>>>> actually assigns a cid). Assuming that comm_free is actually properly
>>>>>> called, there should be plenty of cids available for that pattern. However,
>>>>>> I'm not sure I understand the block algorithm someone added to cid
>>>>>> allocation - I'd have to guess that there's something funny with that
>>>>>> routine and cids aren't being recycled properly.
>>>>>> Brian
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>
>>>>>
>>>>> --
>>>>> Edgar Gabriel
>>>>> Assistant Professor
>>>>> Parallel Software Technologies Lab http://pstl.cs.uh.edu
>>>>> Department of Computer Science University of Houston
>>>>> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
>>>>> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
>>>>>
>>>>
>>>>
>>>
>>>
>>
>> --
>> Edgar Gabriel
>> Assistant Professor
>> Parallel Software Technologies Lab http://pstl.cs.uh.edu
>> Department of Computer Science University of Houston
>> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
>> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
>>
>
>