Brian,

I notice in the ompi_info output the following parameters that seem relevant to this problem:

                 MCA btl: parameter "btl_self_free_list_num" (current value: "0")
                 MCA btl: parameter "btl_self_free_list_max" (current value: "-1")
                 MCA btl: parameter "btl_self_free_list_inc" (current value: "32")
                 MCA btl: parameter "btl_self_eager_limit" (current value: "131072")
                 MCA btl: parameter "btl_self_max_send_size" (current value: "262144")
                 MCA btl: parameter "btl_self_max_rdma_size" (current value: "2147483647")
                 MCA btl: parameter "btl_self_exclusivity" (current value: "65536")
                 MCA btl: parameter "btl_self_flags" (current value: "2")
                 MCA btl: parameter "btl_self_priority" (current value: "0")

Specifically, 'btl_self_max_send_size=262144', which I assume is the maximum message size (in bytes?) a process can send to itself.  None of the messages in my tests above approached this limit.  However, I am puzzled, because the program below runs correctly for ridiculously large messages (as shown, an array of 200 million reals).
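Incidentally, these MCA parameters can be overridden at run time, which makes it easy to test whether a given limit is involved.  A sketch (the value shown is only an example, not a recommendation):

```
# on the mpirun command line:
#   mpirun -mca btl_self_max_send_size 1048576 -np 4 a.out
# or persistently, in $HOME/.openmpi/mca-params.conf:
btl_self_max_send_size = 1048576
```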

     program test
!
      implicit none
!
      include 'mpif.h'
!
      integer imjm
      parameter (imjm=200000000)
!
      integer  istat
      integer  irank,nproc,iwin,i,n,ir
      integer  isizereal
      integer(kind=mpi_address_kind) iwinsize,itarget_disp
!
      integer , dimension(:) , allocatable :: len
      integer , dimension(:) , allocatable :: loff
      real , dimension(:) , allocatable :: x  
      real , dimension(:) , allocatable :: ximjm  
!
!
      call mpi_init(istat)
      call mpi_comm_rank(mpi_comm_world,irank,istat)
      call mpi_comm_size(mpi_comm_world,nproc,istat)
!
      allocate(len(nproc))
      allocate(loff(nproc))
      allocate(x(imjm/nproc))
!
      ir = irank + 1
!
!     only rank 0 holds the full array; other ranks expose an empty
!     window (a 1-element dummy buffer keeps the argument allocated)
      if(ir.eq.1)then
        allocate(ximjm(imjm))
      else
        allocate(ximjm(1))
      endif
!
      do 200 n = 1,nproc
        len(n) = imjm/nproc
        loff(n) = (n-1)*imjm/nproc
  200 continue
!
      call mpi_type_size(mpi_real,isizereal,istat)
!
!     compute the window size in mpi_address_kind arithmetic so it
!     cannot overflow a default integer for large arrays
      if(ir.eq.1)then
        iwinsize = int(imjm,mpi_address_kind)*isizereal
      else
        iwinsize = 0
      endif
!     collective call: every rank in the communicator must participate
      call mpi_win_create(ximjm,iwinsize,isizereal,mpi_info_null,
     &                    mpi_comm_world,iwin,istat)
!
      if(ir.eq.1)then
        do 250 i = 1,imjm
          ximjm(i) = i
  250   continue
      endif
!
      itarget_disp = loff(ir)
!     fence opens the access epoch; each rank reads its slice from the
!     window on rank 0; the second fence completes the transfers
      call mpi_win_fence(0,iwin,istat)
      call mpi_get(x,len(ir),mpi_real,0,itarget_disp,len(ir),mpi_real,
     &             iwin,istat)
      call mpi_win_fence(0,iwin,istat)
!
      print '(A,i3,2f20.2)', ' x ',ir,x(1),x(len(ir))
!
      call mpi_win_free(iwin,istat)
      call mpi_finalize(istat)
      end
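One side note on the program above (an observation, not a diagnosis of the failure): MPI-2 declares the window size passed to mpi_win_create and the target displacement passed to mpi_get as INTEGER(KIND=MPI_ADDRESS_KIND), because a default 32-bit integer overflows for large windows.  With imjm = 200000000 four-byte reals the byte count still fits, though not by a wide margin; the arithmetic (sketched in Python purely for the numbers) is:

```python
# Why MPI_ADDRESS_KIND matters for the window size: check whether the
# byte count imjm*isizereal still fits in a default 32-bit integer.

INT32_MAX = 2**31 - 1        # largest value a default Fortran integer holds
isizereal = 4                # bytes per mpi_real (typical)
imjm = 200_000_000           # array length used in the test program

iwinsize = imjm * isizereal  # window size in bytes
print(iwinsize)              # 800000000
print(iwinsize <= INT32_MAX) # True: this particular run is safe

# largest array length that would still fit in 32 bits:
print(INT32_MAX // isizereal)   # 536870911
```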

Tom Rosmond

Brian Barrett wrote:
On Mon, 2006-09-04 at 11:01 -0700, Tom Rosmond wrote:

Attached is some error output from my tests of 1-sided message
passing, plus my info file.  Below are two copies of a simple fortran
subroutine that mimics mpi_allgatherv using mpi_get calls.  The top
version fails, the bottom runs OK.  It seems clear from these
examples, plus the 'self_send' phrases in the error output, that there
is a problem internally with a processor sending data to itself.  I
know that your 'mpi_get' implementation is simply a wrapper around
'send/recv' calls, so clearly this shouldn't happen.  However, the
problem does not happen in all cases; I tried to duplicate it in a
simple stand-alone program with mpi_get calls and was unable to make
it fail.  Go figure.

That is an odd failure and at first glance it does look like there is
something wrong with our one-sided implementation.  I've filed a bug in
our tracker about the issue and you should get updates on the ticket as
we work on the issue.

Thanks,

Brian

_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users