Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails
From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2008-03-17 15:45:44


Martin,

I found the problem in the inter-allgather and fixed it in patch 17849.
The same test, using MPI_Intercomm_create instead (just to simplify my
life compared to Connect/Accept), with 2 vs. 4 processes in the two
groups, passes for me -- and did fail with the previous version.
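
A minimal sketch of that kind of test, for reference (illustrative only,
not the actual test code; the six total ranks and the 2-vs-4 split via
MPI_Comm_split are assumptions):

    /* run with: mpiexec -n 6 ./intercomm_test */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, color, remote_size, ii;
        int *tbl;
        MPI_Comm local_comm, intercomm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* split the six world ranks into groups of 2 (color 0) and 4 (color 1) */
        color = (rank < 2) ? 0 : 1;
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &local_comm);

        /* group leaders are world ranks 0 and 2 */
        MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD,
                             (color == 0) ? 2 : 0, 99, &intercomm);

        /* inter-allgather: each process receives one int per *remote* process */
        MPI_Comm_remote_size(intercomm, &remote_size);
        tbl = malloc(remote_size * sizeof(*tbl));
        MPI_Allgather(&rank, 1, MPI_INT, tbl, 1, MPI_INT, intercomm);

        if (rank == 0) {
            printf("remote ranks:");
            for (ii = 0; ii < remote_size; ii++) {
                printf(" %d", tbl[ii]);
            }
            printf("\n");
        }

        free(tbl);
        MPI_Comm_free(&intercomm);
        MPI_Comm_free(&local_comm);
        MPI_Finalize();
        return 0;
    }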

Thanks
Edgar

Audet, Martin wrote:
> Hi Jeff,
>
> As I said in my last message (see below), the patch (or at least the patch I got) doesn't fix the problem for me. Whether I apply it over Open MPI 1.2.5 or 1.2.6rc2, I still get the same problem:
>
> The client aborts with a truncation error message while the server freezes when, for example, the server is started on 3 processes and the client on 2.
>
> Feel free to try the two small client and server programs I posted in my first message yourself.
>
> Thanks,
>
> Martin
>
>
> Subject: [OMPI users] RE : users Digest, Vol 841, Issue 3
> From: Audet, Martin (Martin.Audet_at_[hidden])
> Date: 2008-03-13 17:04:25
>
> Hi George,
>
> Thanks for your patch, but I'm not sure I got it correctly. The patch I got modifies a few arguments passed to isend()/irecv()/recv() in coll_basic_allgather.c. Here is the patch I applied:
>
> Index: ompi/mca/coll/basic/coll_basic_allgather.c
> ===================================================================
> --- ompi/mca/coll/basic/coll_basic_allgather.c (revision 17814)
> +++ ompi/mca/coll/basic/coll_basic_allgather.c (working copy)
> @@ -149,7 +149,7 @@
> }
>
> /* Do a send-recv between the two root procs. to avoid deadlock */
> - err = MCA_PML_CALL(isend(sbuf, scount, sdtype, 0,
> + err = MCA_PML_CALL(isend(sbuf, scount, sdtype, root,
> MCA_COLL_BASE_TAG_ALLGATHER,
> MCA_PML_BASE_SEND_STANDARD,
> comm, &reqs[rsize]));
> @@ -157,7 +157,7 @@
> return err;
> }
>
> - err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, 0,
> + err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, root,
> MCA_COLL_BASE_TAG_ALLGATHER, comm,
> &reqs[0]));
> if (OMPI_SUCCESS != err) {
> @@ -186,14 +186,14 @@
> return err;
> }
>
> - err = MCA_PML_CALL(isend(rbuf, rsize * rcount, rdtype, 0,
> + err = MCA_PML_CALL(isend(rbuf, rsize * scount, sdtype, root,
> MCA_COLL_BASE_TAG_ALLGATHER,
> MCA_PML_BASE_SEND_STANDARD, comm, &req));
> if (OMPI_SUCCESS != err) {
> goto exit;
> }
>
> - err = MCA_PML_CALL(recv(tmpbuf, size * scount, sdtype, 0,
> + err = MCA_PML_CALL(recv(tmpbuf, size * rcount, rdtype, root,
> MCA_COLL_BASE_TAG_ALLGATHER, comm,
> MPI_STATUS_IGNORE));
> if (OMPI_SUCCESS != err) {
>
> However, with this patch I still have the problem. Suppose I start the server with three processes and the client with two; the client prints:
>
> [audet_at_linux15 dyn_connect]$ mpiexec --universe univ1 -n 2 ./aclient '0.2.0:2000'
> intercomm_flag = 1
> intercomm_remote_size = 3
> rem_rank_tbl[3] = { 0 1 2}
> [linux15:26114] *** An error occurred in MPI_Allgather
> [linux15:26114] *** on communicator
> [linux15:26114] *** MPI_ERR_TRUNCATE: message truncated
> [linux15:26114] *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpiexec noticed that job rank 0 with PID 26113 on node linux15 exited on signal 15 (Terminated).
> [audet_at_linux15 dyn_connect]$
>
> and aborts. The server, on its side, simply hangs (as before).
>
> Regards,
>
> Martin
>
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Jeff Squyres
> Sent: March 14, 2008 19:45
> To: Open MPI Users
> Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails
>
> Yes, please let us know if this fixes it. We're working on a 1.2.6
> release; we can definitely put this fix in there if it's correct.
>
> Thanks!
>
>
> On Mar 13, 2008, at 4:07 PM, George Bosilca wrote:
>
>> I dug into the sources and I think you correctly pinpointed the bug.
>> It seems we have a mismatch between the local and remote sizes in
>> the inter-communicator allgather in the 1.2 series (which explains
>> the message truncation error when the local and remote groups have a
>> different number of processes). Attached to this email you can find
>> a patch that [hopefully] solves this problem. If you can, please test
>> it and let me know if it solves your problem.
>>
>> Thanks,
>> george.
>>
>> <inter_allgather.patch>
>>
>>
>> On Mar 13, 2008, at 1:11 PM, Audet, Martin wrote:
>>
>>> Hi,
>>>
>>> After re-checking the MPI standard (www.mpi-forum.org and MPI - The
>>> Complete Reference), I'm more and more convinced that my small
>>> example programs, which establish an intercommunicator with
>>> MPI_Comm_connect()/MPI_Comm_accept() over an MPI port and exchange
>>> data over it with MPI_Allgather(), are correct. In particular,
>>> calling MPI_Allgather() with recvcount=1 (its third argument)
>>> instead of the total number of MPI_INTs that will be received (e.g.
>>> intercomm_remote_size in the examples) is both correct and
>>> consistent with MPI_Allgather() behavior on an intracommunicator
>>> (i.e. a "normal" communicator).
>>>
>>> MPI_Allgather(&comm_rank, 1, MPI_INT,
>>>               rem_rank_tbl, 1, MPI_INT,
>>>               intercomm);
>>>
>>> Also, the recvbuf argument (the second argument) of MPI_Allgather()
>>> in the examples should have a size of intercomm_remote_size (i.e.
>>> the size of the remote group), not the sum of the local and remote
>>> group sizes, in both the client and server processes. The standard
>>> says that for all-to-all type operations over an intercommunicator,
>>> a process sends data to and receives data from the remote group
>>> only (in any case, it is not possible to exchange data with
>>> processes of the local group over an intercommunicator).
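
A condensed sketch of this sizing rule, assuming intercomm has already
been created and the usual headers are included (illustrative only):

    /* on an intercommunicator, recvbuf holds data from the *remote* group only */
    int comm_rank, remote_size;
    int *rem_rank_tbl;

    MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
    MPI_Comm_remote_size(intercomm, &remote_size);
    rem_rank_tbl = malloc(remote_size * sizeof(*rem_rank_tbl)); /* remote size, not local + remote */
    MPI_Allgather(&comm_rank, 1, MPI_INT,    /* each process contributes one int */
                  rem_rank_tbl, 1, MPI_INT,  /* recvcount counts ints per remote process */
                  intercomm);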
>>>
>>> So, for me there is no reason to stop the process with an error
>>> message complaining about message truncation. There should be no
>>> truncation: the sendcount, sendtype, recvcount and recvtype
>>> arguments of MPI_Allgather() are correct and consistent.
>>>
>>> So, again, for me the Open MPI behavior with my example looks more
>>> and more like a bug...
>>>
>>> Concerning George's comment about valgrind and TCP/IP, I totally
>>> agree: messages reported by valgrind are only a clue of a bug,
>>> especially in this context, not proof of one. Another clue is that
>>> my small examples work perfectly with MPICH2 ch3:sock.
>>>
>>> Regards,
>>>
>>> Martin Audet
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 4
>>> Date: Thu, 13 Mar 2008 08:21:51 +0100
>>> From: jody <jody.xha_at_[hidden]>
>>> Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails
>>> To: "Open MPI Users" <users_at_[hidden]>
>>> Message-ID:
>>> <9b0da5ce0803130021l4ead0f91qaf43e4ac7d332c93_at_[hidden]>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>> Hi
>>> I think the recvcount argument you pass to MPI_Allgather should not
>>> be 1, but instead the number of MPI_INTs your buffer rem_rank_tbl
>>> can contain. As it stands now, you tell MPI_Allgather that it may
>>> only receive 1 MPI_INT.
>>>
>>> Furthermore, I'm not sure, but I think your receive buffer should
>>> be large enough to contain messages from *all* processes, and not
>>> just from the "far side".
>>>
>>> Jody
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 6
>>> Date: Thu, 13 Mar 2008 09:06:47 -0500
>>> From: George Bosilca <bosilca_at_[hidden]>
>>> Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails
>>> To: Open MPI Users <users_at_[hidden]>
>>> Message-ID: <82E9FF28-FB87-4FFB-A492-DDE472D5DEA7_at_[hidden]>
>>> Content-Type: text/plain; charset="us-ascii"
>>>
>>> I am not aware of any problems with the allreduce/allgather. But we
>>> are aware of the problem with valgrind reporting uninitialized
>>> values when used with TCP. It's a long story, but I can guarantee
>>> that this should not affect a correct MPI application.
>>>
>>> george.
>>>
>>> PS: For those who want to know the details: we have to send a header
>>> over TCP which contains some very basic information, including the
>>> size of the fragment. Unfortunately, we have a 2-byte gap in the
>>> header. As we never initialize these 2 unused bytes, but we do send
>>> them over the wire, valgrind correctly detects the uninitialized
>>> data transfer.
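
To illustrate how such a gap can arise, a hypothetical sketch (this is
not the actual Open MPI header layout; struct and function names are
invented):

    #include <stdint.h>
    #include <string.h>

    /* hypothetical fragment header: the compiler inserts 2 bytes of
       padding after 'type' to align 'frag_size', and those bytes go
       over the wire without ever being written */
    struct frag_hdr {
        uint16_t type;      /* 2 bytes used */
                            /* 2 bytes of padding here */
        uint32_t frag_size; /* fragment length in bytes */
    };

    static void init_hdr(struct frag_hdr *hdr, uint32_t len)
    {
        memset(hdr, 0, sizeof(*hdr)); /* zeroing the whole struct, padding
                                         included, would silence valgrind */
        hdr->type = 1;
        hdr->frag_size = len;
    }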
>>>
>>>
>>> On Mar 12, 2008, at 3:58 PM, Audet, Martin wrote:
>>>
>>>> Hi again,
>>>>
>>>> Thanks Pak for the link and for suggesting to start an "orted"
>>>> daemon; by doing so, my client and server jobs were able to
>>>> establish an intercommunicator between them.
>>>>
>>>> However, I modified my programs to perform an MPI_Allgather() of a
>>>> single "int" over the new intercommunicator to test communication a
>>>> little, and I did encounter problems. I am now wondering if there
>>>> is a problem in MPI_Allgather() itself for intercommunicators.
>>>> Note that the same program runs without problems with MPICH2
>>>> (ch3:sock).
>>>>
>>>> For example if I start orted as follows:
>>>>
>>>> orted --persistent --seed --scope public --universe univ1
>>>>
>>>> and then start the server with three processes:
>>>>
>>>> mpiexec --universe univ1 -n 3 ./aserver
>>>>
>>>> it prints:
>>>>
>>>> Server port = '0.2.0:2000'
>>>>
>>>> Now if I start the client with two processes as follows (using the
>>>> server port):
>>>>
>>>> mpiexec --universe univ1 -n 2 ./aclient '0.2.0:2000'
>>>>
>>>> The server prints:
>>>>
>>>> intercomm_flag = 1
>>>> intercomm_remote_size = 2
>>>> rem_rank_tbl[2] = { 0 1}
>>>>
>>>> which is the correct output. The client then prints:
>>>>
>>>> intercomm_flag = 1
>>>> intercomm_remote_size = 3
>>>> rem_rank_tbl[3] = { 0 1 2}
>>>> [linux15:30895] *** An error occurred in MPI_Allgather
>>>> [linux15:30895] *** on communicator
>>>> [linux15:30895] *** MPI_ERR_TRUNCATE: message truncated
>>>> [linux15:30895] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>> mpiexec noticed that job rank 0 with PID 30894 on node linux15
>>>> exited on signal 15 (Terminated).
>>>>
>>>> As you can see, the first messages are correct, but the client job
>>>> terminates with an error (and the server hangs).
>>>>
>>>> After re-reading the documentation about MPI_Allgather() over an
>>>> intercommunicator, I don't see anything wrong in my simple code.
>>>> Also, if I run the client and server processes with valgrind, I get
>>>> a few messages like:
>>>>
>>>> ==29821== Syscall param writev(vector[...]) points to uninitialised byte(s)
>>>> ==29821==    at 0x36235C2130: writev (in /lib64/libc-2.3.5.so)
>>>> ==29821==    by 0x7885583: mca_btl_tcp_frag_send (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_btl_tcp.so)
>>>> ==29821==    by 0x788501B: mca_btl_tcp_endpoint_send (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_btl_tcp.so)
>>>> ==29821==    by 0x7467947: mca_pml_ob1_send_request_start_prepare (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_pml_ob1.so)
>>>> ==29821==    by 0x7461494: mca_pml_ob1_isend (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_pml_ob1.so)
>>>> ==29821==    by 0x798BF9D: mca_coll_basic_allgather_inter (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_coll_basic.so)
>>>> ==29821==    by 0x4A5069C: PMPI_Allgather (in /home/publique/openmpi-1.2.5/lib/libmpi.so.0.0.0)
>>>> ==29821==    by 0x400EED: main (aserver.c:53)
>>>> ==29821==  Address 0x40d6cac is not stack'd, malloc'd or (recently) free'd
>>>>
>>>> in both the MPI_Allgather() and MPI_Comm_disconnect() calls, for
>>>> client and server, with valgrind always reporting that the
>>>> addresses in question are "not stack'd, malloc'd or (recently)
>>>> free'd".
>>>>
>>>> So, is there a problem with MPI_Allgather() on intercommunicators,
>>>> or am I doing something wrong?
>>>>
>>>> Thanks,
>>>>
>>>> Martin
>>>>
>>>>
>>>> /* aserver.c */
>>>> #include <stdio.h>
>>>> #include <mpi.h>
>>>>
>>>> #include <assert.h>
>>>> #include <stdlib.h>
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>     int comm_rank, comm_size;
>>>>     char port_name[MPI_MAX_PORT_NAME];
>>>>     MPI_Comm intercomm;
>>>>     int ok_flag;
>>>>
>>>>     int intercomm_flag;
>>>>     int intercomm_remote_size;
>>>>     int *rem_rank_tbl;
>>>>     int ii;
>>>>
>>>>     MPI_Init(&argc, &argv);
>>>>
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
>>>>     MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
>>>>
>>>>     ok_flag = (comm_rank != 0) || (argc == 1);
>>>>     MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);
>>>>
>>>>     if (!ok_flag) {
>>>>         if (comm_rank == 0) {
>>>>             fprintf(stderr, "Usage: %s\n", argv[0]);
>>>>         }
>>>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>>>     }
>>>>
>>>>     MPI_Open_port(MPI_INFO_NULL, port_name);
>>>>
>>>>     if (comm_rank == 0) {
>>>>         printf("Server port = '%s'\n", port_name);
>>>>     }
>>>>     MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>>>>                     &intercomm);
>>>>
>>>>     MPI_Close_port(port_name);
>>>>
>>>>     MPI_Comm_test_inter(intercomm, &intercomm_flag);
>>>>     if (comm_rank == 0) {
>>>>         printf("intercomm_flag = %d\n", intercomm_flag);
>>>>     }
>>>>     assert(intercomm_flag != 0);
>>>>     MPI_Comm_remote_size(intercomm, &intercomm_remote_size);
>>>>     if (comm_rank == 0) {
>>>>         printf("intercomm_remote_size = %d\n", intercomm_remote_size);
>>>>     }
>>>>     rem_rank_tbl = malloc(intercomm_remote_size * sizeof(*rem_rank_tbl));
>>>>     MPI_Allgather(&comm_rank, 1, MPI_INT,
>>>>                   rem_rank_tbl, 1, MPI_INT,
>>>>                   intercomm);
>>>>     if (comm_rank == 0) {
>>>>         printf("rem_rank_tbl[%d] = {", intercomm_remote_size);
>>>>         for (ii = 0; ii < intercomm_remote_size; ii++) {
>>>>             printf(" %d", rem_rank_tbl[ii]);
>>>>         }
>>>>         printf("}\n");
>>>>     }
>>>>     free(rem_rank_tbl);
>>>>
>>>>     MPI_Comm_disconnect(&intercomm);
>>>>
>>>>     MPI_Finalize();
>>>>
>>>>     return 0;
>>>> }
>>>>
>>>> /* aclient.c */
>>>> #include <stdio.h>
>>>> #include <unistd.h>
>>>>
>>>> #include <mpi.h>
>>>>
>>>> #include <assert.h>
>>>> #include <stdlib.h>
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>     int comm_rank, comm_size;
>>>>     int ok_flag;
>>>>     MPI_Comm intercomm;
>>>>
>>>>     int intercomm_flag;
>>>>     int intercomm_remote_size;
>>>>     int *rem_rank_tbl;
>>>>     int ii;
>>>>
>>>>     MPI_Init(&argc, &argv);
>>>>
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
>>>>     MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
>>>>
>>>>     ok_flag = (comm_rank != 0) || ((argc == 2) && argv[1] &&
>>>>                                    (*argv[1] != '\0'));
>>>>     MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);
>>>>
>>>>     if (!ok_flag) {
>>>>         if (comm_rank == 0) {
>>>>             fprintf(stderr, "Usage: %s mpi_port\n", argv[0]);
>>>>         }
>>>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>>>     }
>>>>
>>>>     while (MPI_Comm_connect((comm_rank == 0) ? argv[1] : 0,
>>>>                             MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>>>>                             &intercomm) != MPI_SUCCESS) {
>>>>         if (comm_rank == 0) {
>>>>             printf("MPI_Comm_connect() failed, sleeping and retrying...\n");
>>>>         }
>>>>         sleep(1);
>>>>     }
>>>>
>>>>     MPI_Comm_test_inter(intercomm, &intercomm_flag);
>>>>     if (comm_rank == 0) {
>>>>         printf("intercomm_flag = %d\n", intercomm_flag);
>>>>     }
>>>>     assert(intercomm_flag != 0);
>>>>     MPI_Comm_remote_size(intercomm, &intercomm_remote_size);
>>>>     if (comm_rank == 0) {
>>>>         printf("intercomm_remote_size = %d\n", intercomm_remote_size);
>>>>     }
>>>>     rem_rank_tbl = malloc(intercomm_remote_size * sizeof(*rem_rank_tbl));
>>>>     MPI_Allgather(&comm_rank, 1, MPI_INT,
>>>>                   rem_rank_tbl, 1, MPI_INT,
>>>>                   intercomm);
>>>>     if (comm_rank == 0) {
>>>>         printf("rem_rank_tbl[%d] = {", intercomm_remote_size);
>>>>         for (ii = 0; ii < intercomm_remote_size; ii++) {
>>>>             printf(" %d", rem_rank_tbl[ii]);
>>>>         }
>>>>         printf("}\n");
>>>>     }
>>>>     free(rem_rank_tbl);
>>>>
>>>>     MPI_Comm_disconnect(&intercomm);
>>>>
>>>>     MPI_Finalize();
>>>>
>>>>     return 0;
>>>> }
>>>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>

-- 
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335