Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Inherent limit on #communicators?
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-05-02 14:28:47


Thanks Edgar!! Much appreciated...

On May 2, 2009, at 12:08 PM, Edgar Gabriel wrote:

> OK, r21142 should fix the problem for the app. I tested it with a
> number of scenarios (e.g. all intra-comm cases, inter-comm cases,
> intercomm_merge, etc.), but I would suggest letting at least one
> night of MTT runs go over it before we file a CMR for 1.3 ...
>
> Thanks
> Edgar
>
>
>>> On Fri, May 1, 2009 at 6:38 AM, Ralph Castain <rhc_at_[hidden]>
>>> wrote:
>>> I'm not entirely sure if David is going to be in today, so I
>>> will answer for him (and let him correct me later!).
>>>
>>> This code is indeed representative of what the app is doing.
>>> Basically, the user repeatedly splits the communicator so he
>>> can run mini test cases before going on to the larger
>>> computation. So it is always the base communicator being
>>> repeatedly split and freed.
>>>
>>> I would suspect, therefore, that the quick fix would serve us
>>> just fine while the worst case is later resolved.
>>>
>>> Thanks
>>> Ralph
>>>
>>>
>>> On Fri, May 1, 2009 at 6:08 AM, Edgar Gabriel <gabriel_at_[hidden]>
>>> wrote:
>>> David,
>>>
>>> Is this code representative of what your app is doing?
>>> E.g. you have a base communicator (e.g. MPI_COMM_WORLD)
>>> which is being 'split', freed again, split, freed again,
>>> etc.? I.e. the important aspect is that the same
>>> 'base' communicator is being used for deriving new
>>> communicators again and again?
>>>
>>> The reason I ask is two-fold: one, you would in that
>>> case be one of the ideal beneficiaries of the block cid
>>> algorithm :-) (even if it fails you right now); two, a
>>> fix for this scenario, which basically tries to reuse
>>> the last block used (and which would fix your case if
>>> the condition is true), is roughly five lines of code.
>>> This would let us get a fix into the trunk and v1.3
>>> quickly (keep in mind that the block-cid code has been
>>> in the trunk for two years and this is the first problem
>>> we have had) and give us more time to develop a thorough
>>> solution for the worst case: a chain of communicators
>>> being created, e.g. communicator 1 is the basis for
>>> deriving a new comm 2, comm 2 is used to derive comm 3,
>>> etc.
>>>
>>> Thanks
>>> Edgar
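
The split/free pattern described above can be modelled with a toy allocator. This is only a sketch under stated assumptions, not the block-cid code or the r21142 change: it hands out one cid per split instead of a block, the function name and the `reuse_last` flag are hypothetical, and 65536 is simply the cid value at which the hang was reported in this thread.

```python
LIMIT = 65536  # cid value at which the hang was reported in this thread


def first_cid_at_or_above_limit(iterations, reuse_last):
    """Return the iteration at which a cid >= LIMIT is handed out, or None.

    Toy model (assumption): every split of the base communicator takes one
    cid, and the derived communicator is freed before the next split.
    """
    next_fresh = 1      # cid 0 stands in for the base communicator
    last_freed = None   # most recently released cid, if any
    for n in range(1, iterations + 1):
        if reuse_last and last_freed is not None:
            cid = last_freed        # reuse what the previous iteration freed
            last_freed = None
        else:
            cid = next_fresh        # never recycle: always take a fresh cid
            next_fresh += 1
        if cid >= LIMIT:
            return n
        last_freed = cid            # comm is freed before the next split
    return None


print(first_cid_at_or_above_limit(100000, reuse_last=False))  # -> 65536
print(first_cid_at_or_above_limit(100000, reuse_last=True))   # -> None
```

Without any reuse, the 100000-iteration loop of the reproducer crosses the limit; with reuse of the last released value, the same loop stays at a single cid. The chained case (comm 1 derives comm 2, comm 2 derives comm 3, and so on) is not covered by this model.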
>>>
>>> David Gunter wrote:
>>> Here is the test code reproducer:
>>>
>>>       program test2
>>>       implicit none
>>>       include 'mpif.h'
>>>       integer ierr, myid, numprocs, i1, i2, n, local_comm,
>>>      $        icolor, ikey, rank, root
>>>
>>> c
>>> c...  MPI set-up
>>>       ierr = 0
>>>       call MPI_INIT(IERR)
>>>       ierr = 1
>>>       CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
>>>       print *, ierr
>>>
>>>       ierr = -1
>>>
>>>       CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
>>>
>>>       ierr = -5
>>>       i1 = ierr
>>>       if (myid.eq.0) i1 = 1
>>>       call mpi_allreduce(i1, i2, 1, MPI_integer, MPI_MIN,
>>>      $     MPI_COMM_WORLD, ierr)
>>>
>>>       ikey = myid
>>>       if (mod(myid,2).eq.0) then
>>>          icolor = 0
>>>       else
>>>          icolor = MPI_UNDEFINED
>>>       end if
>>>
>>>       root = 0
>>>       do n = 1, 100000
>>>
>>>          call MPI_COMM_SPLIT(MPI_COMM_WORLD, icolor,
>>>      $        ikey, local_comm, ierr)
>>>
>>>          if (mod(myid,2).eq.0) then
>>>             CALL MPI_COMM_RANK(local_comm, rank, ierr)
>>>             i2 = i1
>>>             call mpi_reduce(i1, i2, 1, MPI_integer, MPI_MIN,
>>>      $           root, local_comm, ierr)
>>>
>>>             if (myid.eq.0.and.mod(n,10).eq.0)
>>>      $           print *, n, i1, i2, icolor, ikey
>>>
>>>             call mpi_comm_free(local_comm, ierr)
>>>          end if
>>>
>>>       end do
>>> c     if (icolor.eq.0) call mpi_comm_free(local_comm, ierr)
>>>
>>>       call MPI_barrier(MPI_COMM_WORLD, ierr)
>>>
>>>       call MPI_FINALIZE(IERR)
>>>
>>>       print *, myid, ierr
>>>
>>>       end
>>>
>>>
>>>
>>> -david
>>> --
>>> David Gunter
>>> HPC-3: Parallel Tools Team
>>> Los Alamos National Laboratory
>>>
>>>
>>>
>>> On Apr 30, 2009, at 12:43 PM, David Gunter wrote:
>>>
>>> Just to throw out more info on this, the test code runs fine
>>> on previous versions of OMPI. It only hangs on the 1.3 line
>>> when the cid reaches 65536.
>>>
>>> -david
>>> --
>>> David Gunter
>>> HPC-3: Parallel Tools Team
>>> Los Alamos National Laboratory
>>>
>>>
>>>
>>> On Apr 30, 2009, at 12:28 PM, Edgar Gabriel wrote:
>>>
>>> cid's are in fact not recycled in the block algorithm. The
>>> problem is that comm_free is not collective, so you cannot
>>> make any assumptions about whether other procs have also
>>> released that communicator.
>>>
>>> But nevertheless, a cid in the communicator structure is a
>>> uint32_t, so it should not hit the 16k limit there yet. This
>>> is not new, so if there is a discrepancy between what the
>>> comm structure assumes a cid is and what the pml assumes,
>>> then this has been in the code since the very first days of
>>> Open MPI...
>>>
>>> Thanks
>>> Edgar
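
The kind of discrepancy raised above can be illustrated with a small sketch. It assumes, purely for illustration, that some lower layer stores the cid in a 16-bit field while the communicator structure holds it as a uint32_t; the thread does not establish which field (if any) is involved, and the helper name is hypothetical.

```python
import ctypes


def store_in_16_bit_field(cid: int) -> int:
    """Model copying a 32-bit cid into a hypothetical 16-bit field.

    ctypes integer types do no overflow checking, so only the low
    16 bits of the value survive, as with a C uint16_t assignment.
    """
    return ctypes.c_uint16(cid).value


for cid in (65534, 65535, 65536, 65537):
    print(cid, "->", store_in_16_bit_field(cid))
# 65534 -> 65534
# 65535 -> 65535
# 65536 -> 0
# 65537 -> 1
```

Under that assumption, cid 65536 would become indistinguishable from cid 0 in the narrower field, which would be consistent with trouble appearing exactly when the cid reaches 65536.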
>>>
>>> Brian W. Barrett wrote:
>>> On Thu, 30 Apr 2009, Ralph Castain wrote:
>>> We seem to have hit a problem here - it looks like we are seeing a
>>> built-in limit on the number of communicators one can create in a
>>> program. The program basically does a loop, calling MPI_Comm_split
>>> each time through the loop to create a sub-communicator, does a
>>> reduce operation on the members of the sub-communicator, and then
>>> calls MPI_Comm_free to release it (this is a minimized reproducer
>>> for the real code). After 64k times through the loop, the program
>>> fails.
>>>
>>> This looks remarkably like a 16-bit index that hits a max value and
>>> then blocks.
>>>
>>> I have looked at the communicator code, but I don't immediately see
>>> such a field. Is anyone aware of some other place where we would
>>> have a limit that would cause this problem?
>>>
>>> There's a maximum of 32768 communicator ids when using OB1 (each PML
>>> can set the max contextid, although the communicator code is the
>>> part that actually assigns a cid). Assuming that comm_free is
>>> actually properly called, there should be plenty of cids available
>>> for that pattern. However, I'm not sure I understand the block
>>> algorithm someone added to cid allocation - I'd have to guess that
>>> there's something funny with that routine and cids aren't being
>>> recycled properly.
>>>
>>> Brian
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>>
>>> --
>>> Edgar Gabriel
>>> Assistant Professor
>>> Parallel Software Technologies Lab http://pstl.cs.uh.edu
>>> Department of Computer Science University of Houston
>>> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
>>> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
>>>
>>> --
>>> Edgar Gabriel
>>> Assistant Professor
>>> Parallel Software Technologies Lab http://pstl.cs.uh.edu
>>> Department of Computer Science University of Houston
>>> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
>>> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
>
> --
> Edgar Gabriel
> Assistant Professor
> Parallel Software Technologies Lab http://pstl.cs.uh.edu
> Department of Computer Science University of Houston
> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335