Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Beginner's question: why multiple sends or receives don't work?
From: Bill Rankin (Bill.Rankin_at_[hidden])
Date: 2011-02-22 11:06:57


Try putting an MPI_Barrier() call before your MPI_Finalize() [*]. I suspect that one of the processes (the sending side) is calling MPI_Finalize() before the receiving side has processed the messages.

-bill

[*] Pet peeve of mine: this should almost always be standard practice.

> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
> Behalf Of Xianglong Kong
> Sent: Tuesday, February 22, 2011 10:27 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Beginner's question: why multiple sends or
> receives don't work?
>
> Hi, Thank you for the reply.
>
> However, using MPI_Waitall instead of MPI_Wait didn't solve the
> problem. The code would hang at the MPI_Waitall. Also, I don't quite
> understand why the code is inherently unsafe. Can a non-blocking
> send or receive cause a deadlock?
>
> Thanks!
>
> Kong
>
> On Mon, Feb 21, 2011 at 2:32 PM, Jeff Squyres <jsquyres_at_[hidden]>
> wrote:
> > It's because you're waiting on the receive request to complete before
> > the send request.  This likely works locally because the message
> > transfer is through shared memory and is fast, but it's still an
> > inherently unsafe way to block waiting for completion (i.e., the
> > receive might not complete if the send does not complete).
> >
> > What you probably want to do is build an array of 2 requests and then
> > issue a single MPI_Waitall() on both of them.  This will allow MPI to
> > progress both requests simultaneously.
> >
> >
> > On Feb 18, 2011, at 11:58 AM, Xianglong Kong wrote:
> >
> >> Hi, all,
> >>
> >> I'm an MPI newbie. I'm trying to connect two desktops in my office
> >> to each other with a crossover cable and run a parallel code
> >> on them using MPI.
> >>
> >> Now, the two nodes can ssh to each other without a password, and can
> >> successfully run the MPI "Hello world" code. However, when I tried to
> >> use multiple MPI non-blocking sends or receives, the job would hang.
> >> The problem only shows up if the two processes are launched on
> >> different nodes; the code runs successfully if the two processes
> >> are launched on the same node. Also, the code runs successfully if
> >> there is only one send and/or one receive in each process.
> >>
> >> Here is the code that can run successfully:
> >>
> >> #include <stdlib.h>
> >> #include <stdio.h>
> >> #include <string.h>
> >> #include <mpi.h>
> >>
> >> int main(int argc, char** argv) {
> >>
> >>       int myrank, nprocs;
> >>
> >>       MPI_Init(&argc, &argv);
> >>       MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
> >>       MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
> >>
> >>       printf("Hello from processor %d of %d\n", myrank, nprocs);
> >>
> >>       MPI_Request reqs1, reqs2;
> >>       MPI_Status stats1, stats2;
> >>
> >>       int tag1=10;
> >>       int tag2=11;
> >>
> >>       int buf;
> >>       int mesg;
> >>       int source=1-myrank;
> >>       int dest=1-myrank;
> >>
> >>       if(myrank==0)
> >>       {
> >>               mesg=1;
> >>
> >>               MPI_Irecv(&buf, 1, MPI_INT, source, tag1,
> >>                         MPI_COMM_WORLD, &reqs1);
> >>               MPI_Isend(&mesg, 1, MPI_INT, dest, tag2,
> >>                         MPI_COMM_WORLD, &reqs2);
> >>
> >>
> >>       }
> >>
> >>       if(myrank==1)
> >>       {
> >>               mesg=2;
> >>
> >>               MPI_Irecv(&buf, 1, MPI_INT, source, tag2,
> >>                         MPI_COMM_WORLD, &reqs1);
> >>               MPI_Isend(&mesg, 1, MPI_INT, dest, tag1,
> >>                         MPI_COMM_WORLD, &reqs2);
> >>       }
> >>
> >>       MPI_Wait(&reqs1, &stats1);
> >>       printf("myrank=%d,received the message\n",myrank);
> >>
> >>       MPI_Wait(&reqs2, &stats2);
> >>       printf("myrank=%d,sent the messages\n",myrank);
> >>
> >>       printf("myrank=%d, buf=%d\n",myrank, buf);
> >>
> >>       MPI_Finalize();
> >>       return 0;
> >> }
> >>
> >> And here is the code that hangs:
> >>
> >> #include <stdlib.h>
> >> #include <stdio.h>
> >> #include <string.h>
> >> #include <mpi.h>
> >>
> >> int main(int argc, char** argv) {
> >>
> >>       int myrank, nprocs;
> >>
> >>       MPI_Init(&argc, &argv);
> >>       MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
> >>       MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
> >>
> >>       printf("Hello from processor %d of %d\n", myrank, nprocs);
> >>
> >>       MPI_Request reqs1, reqs2;
> >>       MPI_Status stats1, stats2;
> >>
> >>       int tag1=10;
> >>       int tag2=11;
> >>
> >>       int source=1-myrank;
> >>       int dest=1-myrank;
> >>
> >>       if(myrank==0)
> >>       {
> >>               int buf1, buf2;
> >>
> >>               MPI_Irecv(&buf1, 1, MPI_INT, source, tag1,
> >>                         MPI_COMM_WORLD, &reqs1);
> >>               MPI_Irecv(&buf2, 1, MPI_INT, source, tag2,
> >>                         MPI_COMM_WORLD, &reqs2);
> >>
> >>               MPI_Wait(&reqs1, &stats1);
> >>               printf("received one message\n");
> >>
> >>               MPI_Wait(&reqs2, &stats2);
> >>               printf("received two messages\n");
> >>
> >>               printf("myrank=%d, buf1=%d, buf2=%d\n", myrank, buf1, buf2);
> >>       }
> >>
> >>       if(myrank==1)
> >>       {
> >>               int mesg1=1;
> >>               int mesg2=2;
> >>
> >>               MPI_Isend(&mesg1, 1, MPI_INT, dest, tag1,
> >>                         MPI_COMM_WORLD, &reqs1);
> >>               MPI_Isend(&mesg2, 1, MPI_INT, dest, tag2,
> >>                         MPI_COMM_WORLD, &reqs2);
> >>
> >>               MPI_Wait(&reqs1, &stats1);
> >>               printf("sent one message\n");
> >>
> >>               MPI_Wait(&reqs2, &stats2);
> >>               printf("sent two messages\n");
> >>       }
> >>
> >>       MPI_Finalize();
> >>       return 0;
> >> }
> >>
> >> And the output of the second failed code:
> >> ***********************************************
> >> Hello from processor 0 of 2
> >>
> >> Received one message
> >>
> >> Hello from processor 1 of 2
> >>
> >> Sent one message
> >> *******************************************************
> >>
> >> Can anyone help to point out why the second code didn't work?
> >>
> >> Thanks!
> >>
> >> Kong
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
>
>
>
> --
> Xianglong Kong
> Department of Mechanical Engineering
> University of Rochester
> Phone: (585)520-4412
> MSN: dinosaur8312_at_[hidden]