Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Inherent limit on #communicators?
From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2009-04-30 14:28:39


cids are in fact not recycled in the block algorithm. The problem is
that comm_free is not collective, so you cannot make any assumptions
about whether other procs have also released that communicator.
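
To illustrate why missing recycling matters for a split/free loop, here
is a minimal toy model in C. It is not Open MPI's allocator: the names
toy_cid_pool, toy_cid_alloc, toy_cid_free and toy_split_free_loop are
hypothetical, and the only figure taken from this thread is the OB1
maximum of 32768 quoted below. The sketch assumes one cid is allocated
and released per loop iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Toy model only; all names here are hypothetical, not Open MPI code. */
typedef struct {
    uint32_t max_cid;     /* number of cid values available */
    uint32_t next_unused; /* lowest cid that was never handed out */
    bool     recycle;     /* do freed cids go back into the pool? */
    bool     has_freed;   /* one-slot free list: enough for this loop */
    uint32_t freed_cid;
} toy_cid_pool;

/* Hand out a cid; returns false once the pool is exhausted. */
bool toy_cid_alloc(toy_cid_pool *p, uint32_t *cid)
{
    assert(p != NULL && cid != NULL);
    if (p->recycle && p->has_freed) {
        *cid = p->freed_cid;      /* reuse the released cid */
        p->has_freed = false;
        return true;
    }
    if (p->next_unused >= p->max_cid) {
        return false;             /* no fresh cid left */
    }
    *cid = p->next_unused++;
    return true;
}

/* Release a cid; without recycling the value is simply lost. */
void toy_cid_free(toy_cid_pool *p, uint32_t cid)
{
    assert(p != NULL);
    if (p->recycle) {
        p->freed_cid = cid;
        p->has_freed = true;
    }
}

/* Model the split/free loop: count iterations that succeed, up to limit. */
uint32_t toy_split_free_loop(uint32_t max_cid, bool recycle, uint32_t limit)
{
    toy_cid_pool pool = { max_cid, 0, recycle, false, 0 };
    uint32_t done = 0;
    while (done < limit) {
        uint32_t cid;
        if (!toy_cid_alloc(&pool, &cid)) {
            break;
        }
        toy_cid_free(&pool, cid);
        done++;
    }
    return done;
}
```

In this model, toy_split_free_loop(32768, false, 100000) stops after
32768 iterations, while the same call with recycle set to true runs all
100000. A real allocator could only recycle safely once every process
is known to have released the communicator, which is the difficulty
described above.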

But nevertheless, a cid in the communicator structure is a uint32_t, so
it should not hit the 16-bit limit there yet. This is not new, so if
there is a discrepancy between what the comm structure assumes that a
cid is and what the pml assumes, then this has been in the code since
the very first days of Open MPI...
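
As a sketch of what such a discrepancy would look like, assume
(hypothetically) that some layer stores the 32-bit cid in a 16-bit
field. The helper names toy_header_cid and toy_cid_fits_header are made
up for illustration and are not taken from the Open MPI sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: copy a 32-bit cid into a 16-bit field.
 * The conversion keeps only the low 16 bits. */
uint16_t toy_header_cid(uint32_t comm_cid)
{
    return (uint16_t)comm_cid;
}

/* True if the cid survives the narrowing unchanged. */
bool toy_cid_fits_header(uint32_t comm_cid)
{
    return comm_cid <= UINT16_MAX;
}
```

Under that assumption, cid 65535 still fits, but cid 65536 would be
seen as 0 by the narrower layer, so a check like toy_cid_fits_header
would have to reject it before the value is used.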

Thanks
Edgar

Brian W. Barrett wrote:
> On Thu, 30 Apr 2009, Ralph Castain wrote:
>
>> We seem to have hit a problem here - it looks like we are seeing a
>> built-in limit on the number of communicators one can create in a
>> program. The program basically does a loop, calling MPI_Comm_split each
>> time through the loop to create a sub-communicator, does a reduce
>> operation on the members of the sub-communicator, and then calls
>> MPI_Comm_free to release it (this is a minimized reproducer for the real
>> code). After 64k times through the loop, the program fails.
>>
>> This looks remarkably like a 16-bit index that hits a max value and then
>> blocks.
>>
>> I have looked at the communicator code, but I don't immediately see such
>> a field. Is anyone aware of some other place where we would have a limit
>> that would cause this problem?
>
> There's a maximum of 32768 communicator ids when using OB1 (each PML can
> set the max contextid, although the communicator code is the part that
> actually assigns a cid). Assuming that comm_free is actually properly
> called, there should be plenty of cids available for that pattern.
> However, I'm not sure I understand the block algorithm someone added to
> cid allocation - I'd have to guess that there's something funny with
> that routine and cids aren't being recycled properly.
>
> Brian

-- 
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335