Dear all,

I am using Open MPI 1.6 on Linux and have a question about MPI_Reduce_scatter.

I am trying to find out how much data can be pushed through MPI_Reduce_scatter,
using the following code.

size = (long) 1024*1024*1024*4;   /* 4 Gi elements; needs a 64-bit long */
for (k = 1; k <= 16; ++k) {
    bufsize = k*size/16;
    for (i = 0; i < nproc; ++i)
        recvCount[i] = bufsize/nproc;
    for (i = 0; i < bufsize; ++i)
        sbuf[i] = myid+1;
    printf("buffer size: %ld recvCount[0]:%d\n", bufsize, recvCount[0]);

    MPI_Reduce_scatter(sbuf, rbuf, recvCount, MPI_LONG,
                       MPI_SUM, MPI_COMM_WORLD);
    /* Each reduced element should equal 1 + 2 + ... + nproc. */
    for (i = 0; i < bufsize/nproc; ++i) {
        if (rbuf[i] != nproc*(nproc+1)/2) {
            printf("failed in %d\n", myid);
            break;
        }
    }
    printf("done\n");
}

ierr = MPI_Finalize();


I used 4 processes. When all 4 processes are on the same machine, the test
runs through up to size = INT_MAX. However, when the 4 processes are on 4
different machines, it hangs at size = 1073741824. Backtrace at the hang:


#0  0x000000337f6d3fc3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x00002b1e9c45d4eb in epoll_dispatch (base=0xd08e940, arg=0xd08e800,
    tv=<value optimized out>) at epoll.c:215
#2  0x00002b1e9c45f98a in opal_event_base_loop (base=0xd08e940,
    flags=<value optimized out>) at event.c:838
#3  0x00002b1e9c485809 in opal_progress () at runtime/opal_progress.c:189
#4  0x00002b1e9c3ccf05 in opal_condition_wait (req_ptr=0x7fffc4519fb0,
    status=0x0) at ../opal/threads/condition.h:99
#5  ompi_request_wait_completion (req_ptr=0x7fffc4519fb0, status=0x0)
    at ../ompi/request/request.h:377
#6  ompi_request_default_wait (req_ptr=0x7fffc4519fb0, status=0x0)
    at request/req_wait.c:38
#7  0x00002b1ea0d60dda in ompi_coll_tuned_reduce_scatter_intra_ring (
    sbuf=0x7fffc4519fb0, rbuf=0x2b1ea1384010, rcounts=0xd458e30,
    dtype=0x601fa0, op=0x601790, comm=0x601390, module=0xd458a10)
    at coll_tuned_reduce_scatter.c:584
#8  0x00002b1ea0b4cd8c in mca_coll_sync_reduce_scatter (sbuf=0x2b26a1385010,
    rbuf=0x2b1ea1384010, rcounts=<value optimized out>,
    dtype=<value optimized out>, op=<value optimized out>, comm=0x601390,
    module=0xd458820) at coll_sync_reduce_scatter.c:46
#9  0x00002b1e9c3e7e51 in PMPI_Reduce_scatter (sendbuf=0x2b26a1385010,
    recvbuf=0x2b1ea1384010, recvcounts=0xd458e30,
    datatype=<value optimized out>, op=0x601790, comm=0x601390)
    at preduce_scatter.c:129
#10 0x0000000000400ddb in main (argc=1, argv=0x7fffc451a998)
    at test_reduce_scatter.c:50

Does Open MPI 1.6 use different mechanisms in MPI_Reduce_scatter for
communication within a machine and between machines?

What is the limit on the buffer size that can be used with MPI_Reduce_scatter?

Thanks for your attention.

Regards,

William