
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Inherent limit on #communicators?
From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2009-05-02 14:08:56


OK, r21142 should fix the problem for the app. I tested it with a
number of scenarios (e.g. all intra-comm cases, inter-comm cases,
intercomm_merge, etc.), but I would suggest letting at least one night
of MTT runs go over it before we file a CMR for 1.3 ...

Thanks
Edgar

>> On Fri, May 1, 2009 at 6:38 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>> I'm not entirely sure if David is going to be in today, so I
>> will answer for him (and let him correct me later!).
>>
>> This code is indeed representative of what the app is doing.
>> Basically, the user repeatedly splits the communicator so he
>> can run mini test cases before going on to the larger
>> computation. So it is always the base communicator being
>> repeatedly split and freed.
>>
>> I would suspect, therefore, that the quick fix would serve us
>> just fine while the worst case is later resolved.
>>
>> Thanks
>> Ralph
>>
>>
>> On Fri, May 1, 2009 at 6:08 AM, Edgar Gabriel <gabriel_at_[hidden]>
>> wrote:
>> David,
>>
>> is this code representative of what your app is doing?
>> I.e., you have a base communicator (e.g. MPI_COMM_WORLD)
>> which is being 'split', freed again, split, freed again,
>> etc.? In other words, the important aspect is that the
>> same 'base' communicator is being used for deriving new
>> communicators again and again?
>>
>> The reason I ask is two-fold: one, you would in that
>> case be one of the ideal beneficiaries of the block cid
>> algorithm :-) (even if it fails you right now); two, a
>> fix for this scenario, which basically tries to reuse
>> the last block used (and which would fix your case if
>> the condition is true), is roughly five lines of code.
>> This would let us get a fix into the trunk and v1.3
>> quickly (keep in mind that the block-cid code has been
>> in the trunk for two years and this is the first
>> problem we have had) and give us more time to develop a
>> thorough solution for the worst case: a chain of
>> communicators being created, e.g. communicator 1 is the
>> basis for deriving a new comm 2, comm 2 is used to
>> derive comm 3, etc.
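
To make the difference between the two behaviours concrete, here is a
minimal Python sketch of block-wise cid allocation with and without
reuse of the last block. It is a single-process toy model and not the
code in the trunk: the class name, the function name and the block size
of 8 are all assumptions made for this example, and the model ignores
that the processes have to agree on a cid.

```python
class BlockCidModel:
    """Hypothetical single-process model of block-wise cid allocation."""

    def __init__(self, block_size=8, reuse_last_block=False):
        self.block_size = block_size        # assumed value, not taken from Open MPI
        self.reuse_last_block = reuse_last_block
        self.next_free_cid = 0              # start of the next never-used block
        self.last_block = {}                # base name -> start of its last block
        self.last_block_released = {}       # base name -> True once that block was freed

    def derive(self, base):
        """Return the first cid of the block given to a communicator derived from 'base'."""
        if self.reuse_last_block and self.last_block_released.get(base, False):
            # Quick-fix path: the previous block from this base is free again.
            start = self.last_block[base]
        else:
            # Otherwise take a fresh block that is never handed back.
            start = self.next_free_cid
            self.next_free_cid += self.block_size
        self.last_block[base] = start
        self.last_block_released[base] = False
        return start

    def free_last(self, base):
        """Mark the block most recently derived from 'base' as released."""
        self.last_block_released[base] = True


def cids_consumed(model, iterations, base="WORLD"):
    """Run a split/free loop on one base and report how many cids were used up."""
    for _ in range(iterations):
        model.derive(base)
        model.free_last(base)
    return model.next_free_cid
```

In this model, 1000 iterations on the same base use up 8000 cids without
reuse and only 8 with reuse. The chain case mentioned above is not
modelled here.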
>>
>> Thanks
>> Edgar
>>
>> David Gunter wrote:
>> Here is the test code reproducer:
>>
>>       program test2
>>       implicit none
>>       include 'mpif.h'
>>       integer ierr, myid, numprocs,i1,i2,n,local_comm,
>>      $     icolor,ikey,rank,root
>>
>> c
>> c...  MPI set-up
>>       ierr = 0
>>       call MPI_INIT(IERR)
>>       ierr = 1
>>       CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
>>       print *, ierr
>>
>>       ierr = -1
>>
>>       CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
>>
>>       ierr = -5
>>       i1 = ierr
>>       if (myid.eq.0) i1 = 1
>>       call mpi_allreduce(i1, i2, 1,MPI_integer,MPI_MIN,
>>      $     MPI_COMM_WORLD,ierr)
>>
>>       ikey = myid
>>       if (mod(myid,2).eq.0) then
>>          icolor = 0
>>       else
>>          icolor = MPI_UNDEFINED
>>       end if
>>
>>       root = 0
>>       do n = 1, 100000
>>
>>          call MPI_COMM_SPLIT(MPI_COMM_WORLD, icolor,
>>      $        ikey, local_comm, ierr)
>>
>>          if (mod(myid,2).eq.0) then
>>             CALL MPI_COMM_RANK(local_comm, rank, ierr)
>>             i2 = i1
>>             call mpi_reduce(i1, i2, 1,MPI_integer,MPI_MIN,
>>      $           root, local_comm,ierr)
>>
>>             if (myid.eq.0.and.mod(n,10).eq.0)
>>      $           print *, n, i1, i2,icolor,ikey
>>
>>             call mpi_comm_free(local_comm, ierr)
>>          end if
>>
>>       end do
>> c      if (icolor.eq.0) call mpi_comm_free(local_comm, ierr)
>>
>>       call MPI_barrier(MPi_COMM_WORLD,ierr)
>>
>>       call MPI_FINALIZE(IERR)
>>
>>       print *, myid, ierr
>>
>>       end
>>
>>
>>
>> -david
>> --
>> David Gunter
>> HPC-3: Parallel Tools Team
>> Los Alamos National Laboratory
>>
>>
>>
>> On Apr 30, 2009, at 12:43 PM, David Gunter wrote:
>>
>> Just to throw out more info on this, the test code runs fine
>> on previous versions of OMPI. It only hangs on the 1.3 line
>> when the cid reaches 65536.
>>
>> -david
>> --
>> David Gunter
>> HPC-3: Parallel Tools Team
>> Los Alamos National Laboratory
>>
>>
>>
>> On Apr 30, 2009, at 12:28 PM, Edgar Gabriel wrote:
>>
>> cid's are in fact not recycled in the block algorithm. The
>> problem is that comm_free is not collective, so you cannot
>> make any assumptions about whether other procs have also
>> released that communicator.
>>
>> But nevertheless, a cid in the communicator structure is a
>> uint32_t, so it should not hit the 16-bit limit there yet.
>> This is not new, so if there is a discrepancy between what
>> the comm structure assumes a cid is and what the pml assumes,
>> then this has been in the code since the very first days of
>> Open MPI...
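
To show what such a width discrepancy would look like, here is a small
Python sketch that stores the same cid in a 32-bit and in a 16-bit
unsigned field. That the narrower field is 16 bits wide is only the
guess made in this thread, and `truncate_to_width` is a helper invented
for this example, not an Open MPI function.

```python
def truncate_to_width(cid, bits):
    """Keep only the low 'bits' bits, as storing into a narrower unsigned field would."""
    return cid & ((1 << bits) - 1)


# Compare the value seen through a 32-bit field and through a 16-bit field.
for cid in (65535, 65536, 65537):
    print(cid, truncate_to_width(cid, 32), truncate_to_width(cid, 16))
```

Under that assumption, cid 65536 reads back as 0 through the 16-bit
field while the 32-bit field still holds 65536, so the two sides would
no longer agree on which communicator is meant.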
>>
>> Thanks
>> Edgar
>>
>> Brian W. Barrett wrote:
>> On Thu, 30 Apr 2009, Ralph Castain wrote:
>> We seem to have hit a problem here - it looks like we are
>> seeing a built-in limit on the number of communicators one
>> can create in a program. The program basically does a loop,
>> calling MPI_Comm_split each time through the loop to create
>> a sub-communicator, does a reduce operation on the members
>> of the sub-communicator, and then calls MPI_Comm_free to
>> release it (this is a minimized reproducer for the real
>> code). After 64k times through the loop, the program fails.
>>
>> This looks remarkably like a 16-bit index that hits a max
>> value and then blocks.
>>
>> I have looked at the communicator code, but I don't
>> immediately see such a field. Is anyone aware of some other
>> place where we would have a limit that would cause this
>> problem?
>>
>> There's a maximum of 32768 communicator ids when using OB1
>> (each PML can set the max contextid, although the
>> communicator code is the part that actually assigns a cid).
>> Assuming that comm_free is actually properly called, there
>> should be plenty of cids available for that pattern.
>> However, I'm not sure I understand the block algorithm
>> someone added to cid allocation - I'd have to guess that
>> there's something funny with that routine and cids aren't
>> being recycled properly.
>>
>> Brian
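
As a rough illustration of the recycling behaviour described above,
here is a minimal Python sketch of an allocator that always hands out
the lowest free cid. It is a single-process toy model, not Open MPI's
allocation code: the class and function names are made up for this
example, the three reserved ids are an assumption, and only the 32768
limit comes from the thread. It also ignores that all processes have to
agree on a cid.

```python
import heapq

MAX_CID = 32768  # limit quoted in the thread for OB1


class RecyclingCidPool:
    """Hypothetical single-process model: hand out the lowest free cid."""

    def __init__(self, max_cid=MAX_CID, reserved=3):
        # 'reserved' is an assumption standing in for predefined communicators.
        self.max_cid = max_cid
        self.next_unused = reserved
        self.freed = []  # min-heap of released cids

    def allocate(self):
        """Return a released cid if there is one, else the next unused one."""
        if self.freed:
            return heapq.heappop(self.freed)
        if self.next_unused >= self.max_cid:
            raise RuntimeError("out of communicator ids")
        cid = self.next_unused
        self.next_unused += 1
        return cid

    def free(self, cid):
        """Return a cid to the pool so that it can be handed out again."""
        heapq.heappush(self.freed, cid)


def run_split_free_loop(pool, iterations):
    """Allocate and immediately free one cid per iteration; return the highest cid seen."""
    highest = -1
    for _ in range(iterations):
        cid = pool.allocate()
        highest = max(highest, cid)
        pool.free(cid)
    return highest
```

In this model `run_split_free_loop(RecyclingCidPool(), 100000)` never
goes past the first non-reserved id, which is the sense in which a
split/free pattern should leave plenty of cids available.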
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>> --
>> Edgar Gabriel
>> Assistant Professor
>> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
>> Department of Computer Science          University of Houston
>> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
>> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
>>

-- 
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335