Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Inherent limit on #communicators?
From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2009-04-30 14:20:18

On Thu, 30 Apr 2009, Ralph Castain wrote:

> We seem to have hit a problem here - it looks like we are seeing a
> built-in limit on the number of communicators one can create in a
> program. The program basically does a loop, calling MPI_Comm_split each
> time through the loop to create a sub-communicator, does a reduce
> operation on the members of the sub-communicator, and then calls
> MPI_Comm_free to release it (this is a minimized reproducer for the real
> code). After 64k times through the loop, the program fails.
> This looks remarkably like a 16-bit index that hits a max value and then
> blocks.
> I have looked at the communicator code, but I don't immediately see such
> a field. Is anyone aware of some other place where we would have a limit
> that would cause this problem?

There's a maximum of 32768 communicator IDs (cids) when using the OB1 PML.
Each PML can set the maximum context ID, although the communicator code is
the part that actually assigns a cid. Assuming that MPI_Comm_free is
actually being called properly, there should be plenty of cids available
for that pattern. However, I'm not sure I understand the block algorithm
someone added to cid allocation. My guess is that there's something funny
with that routine and cids aren't being recycled properly.
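
To illustrate why a create/free loop should never run out of ids when
recycling works, here is a minimal sketch of a lowest-free-id allocator.
It is not Open MPI's cid allocation code and does not model the block
algorithm mentioned above; the names cid_table, cid_alloc, cid_free, and
run_loop are hypothetical, and only the 32768 limit comes from the message.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CID 32768  /* limit quoted in the message for OB1 */

/* Hypothetical allocator state: one "in use" flag per context id. */
typedef struct {
    bool in_use[MAX_CID];
} cid_table;

/* Mark every id as free. */
void cid_table_init(cid_table *t) {
    for (size_t i = 0; i < MAX_CID; i++) {
        t->in_use[i] = false;
    }
}

/* Return the lowest free id, or -1 when every id is taken. */
int cid_alloc(cid_table *t) {
    for (int i = 0; i < MAX_CID; i++) {
        if (!t->in_use[i]) {
            t->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Release an id so that a later cid_alloc can hand it out again. */
void cid_free(cid_table *t, int cid) {
    if (cid >= 0 && cid < MAX_CID) {
        t->in_use[cid] = false;
    }
}

/*
 * Simulate `iters` iterations of "create a communicator, use it, free it".
 * When `recycle` is false the free step is skipped, which mimics ids that
 * are never returned to the pool. Returns the number of iterations that
 * completed before allocation failed (equal to `iters` on success).
 */
long run_loop(long iters, bool recycle) {
    static cid_table t;
    cid_table_init(&t);
    for (long i = 0; i < iters; i++) {
        int cid = cid_alloc(&t);
        if (cid < 0) {
            return i;  /* pool exhausted */
        }
        if (recycle) {
            cid_free(&t, cid);
        }
    }
    return iters;
}
```

Under these assumptions, run_loop(100000, true) completes every iteration
because the same id is reused each time, while run_loop(100000, false)
stops once all 32768 ids are taken. The sketch only shows the effect of ids
not being returned to the pool; it does not reproduce the failure point
near 64k reported in the original question.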