
Subject: Re: [OMPI users] Beginner's question: why multiple sends or receives don't work?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-02-22 16:19:45


On Feb 22, 2011, at 11:06 AM, Bill Rankin wrote:

> Try putting an "MPI_Barrier()" call before your MPI_Finalize() [*]. I suspect that one of the programs (the sending side) is calling Finalize before the receiving side has processed the messages.

FWIW: I have rarely seen this to be the issue.

MPI does not guarantee point-to-point progress when you are in a collective. Some implementations make such progress anyway; others do not (e.g., some of OMPI's transports will; others will not).

In short, a program is erroneous if it does not guarantee that all of its outstanding requests have completed before it calls MPI_Finalize.
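
For illustration, here is a minimal sketch of that rule (the variable names and the single-tag exchange are assumptions for the example, not code from this thread): every request returned by MPI_Isend() / MPI_Irecv() is completed by a single MPI_Waitall() before MPI_Finalize() is called.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char** argv) {
        int myrank, buf = 0, mesg = 42;
        MPI_Request reqs[2];
        MPI_Status stats[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        int peer = 1 - myrank;  /* assumes exactly 2 ranks, as in this thread */

        MPI_Irecv(&buf,  1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&mesg, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);

        /* Both requests are guaranteed complete when this returns... */
        MPI_Waitall(2, reqs, stats);

        printf("myrank=%d, buf=%d\n", myrank, buf);

        MPI_Finalize();  /* ...so Finalize is now safe to call. */
        return 0;
    }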

Also, I first read your email on a phone and did not notice that you had *2* sets of source code. Sorry for the confusion. I just copied your 2nd code to my test cluster and it runs fine for me across multiple nodes -- it does not hang. The order of waits seems correct to me.
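
For reference, a sketch of the array-of-2-requests / MPI_Waitall() pattern suggested in the quoted message below, as it could replace the individual MPI_Wait() calls in the second program (illustrative only, not a tested rewrite; buf1, buf2, mesg1, mesg2, source, dest, and the tags are reused from the poster's declarations):

    MPI_Request reqs[2];
    MPI_Status stats[2];

    if (myrank == 0) {
        MPI_Irecv(&buf1, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&buf2, 1, MPI_INT, source, tag2, MPI_COMM_WORLD, &reqs[1]);
    }
    if (myrank == 1) {
        MPI_Isend(&mesg1, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&mesg2, 1, MPI_INT, dest, tag2, MPI_COMM_WORLD, &reqs[1]);
    }

    /* A single MPI_Waitall() lets MPI progress both requests at once. */
    MPI_Waitall(2, reqs, stats);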

> -bill
>
> [*] Pet peeve of mine: this should almost always be standard practice.
>
>
>> -----Original Message-----
>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
>> Behalf Of Xianglong Kong
>> Sent: Tuesday, February 22, 2011 10:27 AM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Beginner's question: why multiple sends or
>> receives don't work?
>>
>> Hi, Thank you for the reply.
>>
>> However, using MPI_Waitall instead of MPI_Wait didn't solve the
>> problem. The code would hang at the MPI_Waitall. Also, I don't quite
>> understand why the code is inherently unsafe. Can non-blocking sends
>> or receives cause a deadlock?
>>
>> Thanks!
>>
>> Kong
>>
>> On Mon, Feb 21, 2011 at 2:32 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>>> It's because you're waiting on the receive request to complete before
>>> the send request. This likely works locally because the message
>>> transfer is through shared memory and is fast, but it's still an
>>> inherently unsafe way to block waiting for completion (i.e., the
>>> receive might not complete if the send does not complete).
>>>
>>> What you probably want to do is build an array of 2 requests and then
>>> issue a single MPI_Waitall() on both of them. This will allow MPI to
>>> progress both requests simultaneously.
>>>
>>>
>>> On Feb 18, 2011, at 11:58 AM, Xianglong Kong wrote:
>>>
>>>> Hi, all,
>>>>
>>>> I'm an MPI newbie. I'm trying to connect two desktops in my office
>>>> to each other using a crossover cable and implement a parallel code
>>>> on them using MPI.
>>>>
>>>> Now the two nodes can ssh to each other without a password, and they
>>>> can successfully run the MPI "Hello world" code. However, when I
>>>> tried to use multiple MPI non-blocking sends or receives, the job
>>>> would hang. The problem only shows up if the two processes are
>>>> launched on different nodes; the code runs successfully if both
>>>> processes are launched on the same node. It also runs successfully
>>>> if there is only one send and/or one receive in each process.
>>>>
>>>> Here is the code that runs successfully:
>>>>
>>>> #include <stdlib.h>
>>>> #include <stdio.h>
>>>> #include <string.h>
>>>> #include <mpi.h>
>>>>
>>>> int main(int argc, char** argv) {
>>>>
>>>>     int myrank, nprocs;
>>>>
>>>>     MPI_Init(&argc, &argv);
>>>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>>>>
>>>>     printf("Hello from processor %d of %d\n", myrank, nprocs);
>>>>
>>>>     MPI_Request reqs1, reqs2;
>>>>     MPI_Status stats1, stats2;
>>>>
>>>>     int tag1=10;
>>>>     int tag2=11;
>>>>
>>>>     int buf;
>>>>     int mesg;
>>>>     int source=1-myrank;
>>>>     int dest=1-myrank;
>>>>
>>>>     if(myrank==0)
>>>>     {
>>>>         mesg=1;
>>>>
>>>>         MPI_Irecv(&buf, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &reqs1);
>>>>         MPI_Isend(&mesg, 1, MPI_INT, dest, tag2, MPI_COMM_WORLD, &reqs2);
>>>>     }
>>>>
>>>>     if(myrank==1)
>>>>     {
>>>>         mesg=2;
>>>>
>>>>         MPI_Irecv(&buf, 1, MPI_INT, source, tag2, MPI_COMM_WORLD, &reqs1);
>>>>         MPI_Isend(&mesg, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD, &reqs2);
>>>>     }
>>>>
>>>>     MPI_Wait(&reqs1, &stats1);
>>>>     printf("myrank=%d,received the message\n",myrank);
>>>>
>>>>     MPI_Wait(&reqs2, &stats2);
>>>>     printf("myrank=%d,sent the messages\n",myrank);
>>>>
>>>>     printf("myrank=%d, buf=%d\n",myrank, buf);
>>>>
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }
>>>>
>>>> And here is the code that hangs:
>>>>
>>>> #include <stdlib.h>
>>>> #include <stdio.h>
>>>> #include <string.h>
>>>> #include <mpi.h>
>>>>
>>>> int main(int argc, char** argv) {
>>>>
>>>>     int myrank, nprocs;
>>>>
>>>>     MPI_Init(&argc, &argv);
>>>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>>>>
>>>>     printf("Hello from processor %d of %d\n", myrank, nprocs);
>>>>
>>>>     MPI_Request reqs1, reqs2;
>>>>     MPI_Status stats1, stats2;
>>>>
>>>>     int tag1=10;
>>>>     int tag2=11;
>>>>
>>>>     int source=1-myrank;
>>>>     int dest=1-myrank;
>>>>
>>>>     if(myrank==0)
>>>>     {
>>>>         int buf1, buf2;
>>>>
>>>>         MPI_Irecv(&buf1, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &reqs1);
>>>>         MPI_Irecv(&buf2, 1, MPI_INT, source, tag2, MPI_COMM_WORLD, &reqs2);
>>>>
>>>>         MPI_Wait(&reqs1, &stats1);
>>>>         printf("received one message\n");
>>>>
>>>>         MPI_Wait(&reqs2, &stats2);
>>>>         printf("received two messages\n");
>>>>
>>>>         printf("myrank=%d, buf1=%d, buf2=%d\n",myrank, buf1, buf2);
>>>>     }
>>>>
>>>>     if(myrank==1)
>>>>     {
>>>>         int mesg1=1;
>>>>         int mesg2=2;
>>>>
>>>>         MPI_Isend(&mesg1, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD, &reqs1);
>>>>         MPI_Isend(&mesg2, 1, MPI_INT, dest, tag2, MPI_COMM_WORLD, &reqs2);
>>>>
>>>>         MPI_Wait(&reqs1, &stats1);
>>>>         printf("sent one message\n");
>>>>
>>>>         MPI_Wait(&reqs2, &stats2);
>>>>         printf("sent two messages\n");
>>>>     }
>>>>
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }
>>>>
>>>> And the output of the second failed code:
>>>> ***********************************************
>>>> Hello from processor 0 of 2
>>>>
>>>> Received one message
>>>>
>>>> Hello from processor 1 of 2
>>>>
>>>> Sent one message
>>>> *******************************************************
>>>>
>>>> Can anyone help to point out why the second code didn't work?
>>>>
>>>> Thanks!
>>>>
>>>> Kong
>>>>
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>
>>
>>
>> --
>> Xianglong Kong
>> Department of Mechanical Engineering
>> University of Rochester
>> Phone: (585)520-4412
>> MSN: dinosaur8312_at_[hidden]
>>
>
>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/