
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Inherent limit on #communicators?
From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2009-05-01 11:44:26


Ugh - I'll fix today.

Brian

On Fri, 1 May 2009, Ralph Castain wrote:

> BTW: when compiling Brian's change, I got a warning about comparing
> signed and unsigned. Sure enough, I found that the communicator id is
> defined as an unsigned int, while the PML is treating it as a *signed*
> int.
>
> We need to get this corrected - which way do you want it to be?
>
> I will add this requirement to the ticket...
>
> Thanks
> Ralph
>
>
> On Fri, May 1, 2009 at 6:38 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> I'm not entirely sure if David is going to be in today, so I
> will answer for him (and let him correct me later!).
>
> This code is indeed representative of what the app is doing.
> Basically, the user repeatedly splits the communicator so he
> can run mini test cases before going on to the larger
> computation. So it is always the base communicator being
> repeatedly split and freed.
>
> I would suspect, therefore, that the quick fix would serve us
> just fine while the worst case is later resolved.
>
> Thanks
> Ralph
>
>
> On Fri, May 1, 2009 at 6:08 AM, Edgar Gabriel <gabriel_at_[hidden]>
> wrote:
> David,
>
> is this code representative of what your app is doing? E.g. you have
> a base communicator (e.g. MPI_COMM_WORLD) which is being split, freed
> again, split, freed again, etc.? I.e. the important aspect is that
> the same 'base' communicator is being used for deriving new
> communicators again and again?
>
> The reason I ask is two-fold: one, you would in that case be one of
> the ideal beneficiaries of the block cid algorithm :-) (even if it
> fails you right now); two, a fix for this scenario, which basically
> tries to reuse the last block used (and which would fix your case if
> the condition is true), is roughly five lines of code. This would
> give us the possibility to have a fix in the trunk and v1.3 quickly
> (keep in mind that the block-cid code has been in the trunk for two
> years and this is the first problem we have seen), and give us more
> time to develop a profound solution for the worst case - a chain of
> communicators being created, e.g. communicator 1 is the basis to
> derive a new comm 2, comm 2 is then used to derive comm 3, etc.
>
> Thanks
> Edgar
>
> David Gunter wrote:
> Here is the test code reproducer:
>
>      program test2
>      implicit none
>      include 'mpif.h'
>      integer ierr, myid, numprocs, i1, i2, n, local_comm,
>     $     icolor, ikey, rank, root
> c
> c...  MPI set-up
>      ierr = 0
>      call MPI_INIT(IERR)
>      ierr = 1
>      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
>      print *, ierr
>
>      ierr = -1
>      CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
>
>      ierr = -5
>      i1 = ierr
>      if (myid.eq.0) i1 = 1
>      call mpi_allreduce(i1, i2, 1, MPI_integer, MPI_MIN,
>     $     MPI_COMM_WORLD, ierr)
>
>      ikey = myid
>      if (mod(myid,2).eq.0) then
>         icolor = 0
>      else
>         icolor = MPI_UNDEFINED
>      end if
>
>      root = 0
>      do n = 1, 100000
>
>         call MPI_COMM_SPLIT(MPI_COMM_WORLD, icolor,
>     $        ikey, local_comm, ierr)
>
>         if (mod(myid,2).eq.0) then
>            CALL MPI_COMM_RANK(local_comm, rank, ierr)
>            i2 = i1
>            call mpi_reduce(i1, i2, 1, MPI_integer, MPI_MIN,
>     $           root, local_comm, ierr)
>
>            if (myid.eq.0.and.mod(n,10).eq.0)
>     $           print *, n, i1, i2, icolor, ikey
>
>            call mpi_comm_free(local_comm, ierr)
>         end if
>
>      end do
> c      if (icolor.eq.0) call mpi_comm_free(local_comm, ierr)
>
>      call MPI_barrier(MPI_COMM_WORLD, ierr)
>
>      call MPI_FINALIZE(IERR)
>
>      print *, myid, ierr
>
>      end
>
>
>
> -david
> --
> David Gunter
> HPC-3: Parallel Tools Team
> Los Alamos National Laboratory
>
>
>
> On Apr 30, 2009, at 12:43 PM, David Gunter wrote:
>
> Just to throw out more info on this, the test code runs fine on
> previous versions of OMPI. It only hangs on the 1.3 line when the
> cid reaches 65536.
>
> -david
> --
> David Gunter
> HPC-3: Parallel Tools Team
> Los Alamos National Laboratory
>
>
> On Apr 30, 2009, at 12:28 PM, Edgar Gabriel wrote:
>
> cid's are in fact not recycled in the block algorithm. The problem
> is that comm_free is not collective, so you cannot make any
> assumptions about whether other procs have also released that
> communicator.
>
> But nevertheless, a cid in the communicator structure is a uint32_t,
> so it should not hit the 16k limit there yet. This is not new, so if
> there is a discrepancy between what the comm structure assumes a cid
> is and what the pml assumes, then this has been in the code since
> the very first days of Open MPI...
>
> Thanks
> Edgar
>
> Brian W. Barrett wrote:
>
> On Thu, 30 Apr 2009, Ralph Castain wrote:
>
> We seem to have hit a problem here - it looks like we are seeing a
> built-in limit on the number of communicators one can create in a
> program. The program basically does a loop, calling MPI_Comm_split
> each time through the loop to create a sub-communicator, does a
> reduce operation on the members of the sub-communicator, and then
> calls MPI_Comm_free to release it (this is a minimized reproducer
> for the real code). After 64k times through the loop, the program
> fails.
>
> This looks remarkably like a 16-bit index that hits a max value and
> then blocks.
>
> I have looked at the communicator code, but I don't immediately see
> such a field. Is anyone aware of some other place where we would
> have a limit that would cause this problem?
>
> There's a maximum of 32768 communicator ids when using OB1 (each PML
> can set the max contextid, although the communicator code is the
> part that actually assigns a cid). Assuming that comm_free is
> actually properly called, there should be plenty of cids available
> for that pattern. However, I'm not sure I understand the block
> algorithm someone added to cid allocation - I'd have to guess that
> there's something funny with that routine and cids aren't being
> recycled properly.
> Brian
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> --
> Edgar Gabriel
> Assistant Professor
> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
> Department of Computer Science          University of Houston
> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
>
>
> --
> Edgar Gabriel
> Assistant Professor
> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
> Department of Computer Science          University of Houston
> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335