Open MPI User's Mailing List Archives

Subject: [OMPI users] RE : users Digest, Vol 841, Issue 3
From: Audet, Martin (Martin.Audet_at_[hidden])
Date: 2008-03-13 17:04:25


Hi Georges,

Thanks for your patch, but I'm not sure I applied it correctly. The patch I received modifies a few arguments passed to isend()/irecv()/recv() in coll_basic_allgather.c. Here is the patch I applied:

Index: ompi/mca/coll/basic/coll_basic_allgather.c
===================================================================
--- ompi/mca/coll/basic/coll_basic_allgather.c (revision 17814)
+++ ompi/mca/coll/basic/coll_basic_allgather.c (working copy)
@@ -149,7 +149,7 @@
         }

         /* Do a send-recv between the two root procs. to avoid deadlock */
- err = MCA_PML_CALL(isend(sbuf, scount, sdtype, 0,
+ err = MCA_PML_CALL(isend(sbuf, scount, sdtype, root,
                                  MCA_COLL_BASE_TAG_ALLGATHER,
                                  MCA_PML_BASE_SEND_STANDARD,
                                  comm, &reqs[rsize]));
@@ -157,7 +157,7 @@
             return err;
         }

- err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, 0,
+ err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, root,
                                  MCA_COLL_BASE_TAG_ALLGATHER, comm,
                                  &reqs[0]));
         if (OMPI_SUCCESS != err) {
@@ -186,14 +186,14 @@
             return err;
         }

- err = MCA_PML_CALL(isend(rbuf, rsize * rcount, rdtype, 0,
+ err = MCA_PML_CALL(isend(rbuf, rsize * scount, sdtype, root,
                                  MCA_COLL_BASE_TAG_ALLGATHER,
                                  MCA_PML_BASE_SEND_STANDARD, comm, &req));
         if (OMPI_SUCCESS != err) {
             goto exit;
         }

- err = MCA_PML_CALL(recv(tmpbuf, size * scount, sdtype, 0,
+ err = MCA_PML_CALL(recv(tmpbuf, size * rcount, rdtype, root,
                                 MCA_COLL_BASE_TAG_ALLGATHER, comm,
                                 MPI_STATUS_IGNORE));
         if (OMPI_SUCCESS != err) {

However, with this patch I still have the problem. Suppose I start the server with three processes and the client with two; the client prints:

[audet_at_linux15 dyn_connect]$ mpiexec --universe univ1 -n 2 ./aclient '0.2.0:2000'
intercomm_flag = 1
intercomm_remote_size = 3
rem_rank_tbl[3] = { 0 1 2}
[linux15:26114] *** An error occurred in MPI_Allgather
[linux15:26114] *** on communicator
[linux15:26114] *** MPI_ERR_TRUNCATE: message truncated
[linux15:26114] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 0 with PID 26113 on node linux15 exited on signal 15 (Terminated).
[audet_at_linux15 dyn_connect]$

and aborts. The server, on its side, simply hangs (as before).

Regards,

Martin