Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI program cannot complete
From: David Zhang (solarbikedz_at_[hidden])
Date: 2010-10-25 11:27:19


I think I have run into this problem before. Put an MPI_Barrier(MPI_COMM_WORLD)
before MPI_Finalize on all processes. For me, MPI terminates cleanly only when
all processes call MPI_Finalize at the same time, so I do this in all my
programs.

On Mon, Oct 25, 2010 at 7:13 AM, Jack Bryan <dtustudy68_at_[hidden]> wrote:

> Thanks,
> But I have put a call to MPI_Waitall() on my requests before
>
> cout << " I am rank " << rank << " I am before MPI_Finalize()" << endl;
>
> If the line above has been printed, it means that all requests have
> been checked and have completed, right?
>
> What could be the reasons for this hang?
>
> Any help is appreciated.
>
> Jack
>
> Oct. 25 2010
> ------------------------------
> Date: Mon, 25 Oct 2010 05:32:44 -0400
> From: terry.dontje_at_[hidden]
>
> To: users_at_[hidden]
> Subject: Re: [OMPI users] Open MPI program cannot complete
>
> So what you are saying is *all* the ranks have entered MPI_Finalize and
> only a subset has exited per placing prints before and after MPI_Finalize.
> Good. So my guess is that the processes stuck in MPI_Finalize have a prior
> MPI request outstanding that for whatever reason is unable to complete. So
> I would first look at all the MPI requests and make sure they completed.
>
> --td
>
> On 10/25/2010 02:38 AM, Jack Bryan wrote:
>
> thanks
> I found a problem:
>
> I used:
>
> cout << " I am rank " << rank << " I am before MPI_Finalize()" << endl;
> MPI_Finalize();
> cout << " I am rank " << rank << " I am after MPI_Finalize()" << endl;
> return 0;
>
> I get the output " I am rank 0 (1, 2, ....) I am before
> MPI_Finalize() "
>
> and
> " I am rank 0 I am after MPI_Finalize() ".
> But the other processes do not print "I am rank ... I am after
> MPI_Finalize()".
>
> It is weird. The processes have reached the point just before
> MPI_Finalize(), so why do they hang there?
>
> Are there better ways to check this?
>
> Any help is appreciated.
>
> thanks
>
> Jack
>
> Oct. 25 2010
>
> ------------------------------
> From: solarbikedz_at_[hidden]
> Date: Sun, 24 Oct 2010 19:47:54 -0700
> To: users_at_[hidden]
> Subject: Re: [OMPI users] Open MPI program cannot complete
>
> How do you know all processes call MPI_Finalize? Did you have all of them
> print something before they call MPI_Finalize? I think what Gustavo is
> getting at is that some MPI call within your snippets may hang
> your program, so some of your processes never call MPI_Finalize.
>
> On Sun, Oct 24, 2010 at 6:59 PM, Jack Bryan <dtustudy68_at_[hidden]> wrote:
>
> Thanks,
>
> But my code is too long to post.
>
> What are the common causes of this kind of problem?
>
> Any help is appreciated.
>
> Jack
>
> Oct. 24 2010
>
> > From: gus_at_[hidden]
> > Date: Sun, 24 Oct 2010 18:09:52 -0400
>
> > To: users_at_[hidden]
> > Subject: Re: [OMPI users] Open MPI program cannot complete
> >
> > Hi Jack
> >
> > Your code snippet is too terse; it doesn't show the MPI calls.
> > It is hard to guess what the problem is this way.
> >
> > Gus Correa
> > On Oct 24, 2010, at 5:43 PM, Jack Bryan wrote:
> >
> > > Thanks for the reply.
> > > But I use MPI_Waitall() to make sure that all MPI communications have
> > > completed before a process calls MPI_Finalize() and returns.
> > >
> > > Any help is appreciated.
> > >
> > > thanks
> > >
> > > Jack
> > >
> > > Oct. 24 2010
> > >
> > > > From: gus_at_[hidden]
> > > > Date: Sun, 24 Oct 2010 17:31:11 -0400
> > > > To: users_at_[hidden]
> > > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > > >
> > > > Hi Jack
> > > >
> > > > It may depend on "do some things".
> > > > Does it involve MPI communication?
> > > >
> > > > Also, why not put MPI_Finalize();return 0 outside the ifs?
> > > >
> > > > Gus Correa
> > > >
> > > > On Oct 24, 2010, at 2:23 PM, Jack Bryan wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > > I have a problem with Open MPI.
> > > > > >
> > > > > > My program has 5 processes.
> > > > > >
> > > > > > All of them can run MPI_Finalize() and return 0.
> > > > > >
> > > > > > But the whole program cannot complete.
> > > > > >
> > > > > > In the MPI cluster job queue, it is still in running status.
> > > > > >
> > > > > > If I use 1 process to run it, there is no problem.
> > > > >
> > > > > Why ?
> > > > >
> > > > > My program:
> > > > >
> > > > > > #include <mpi.h>
> > > > > >
> > > > > > int main (int argc, char **argv)
> > > > > > {
> > > > > >     int myRank, mySize;
> > > > > >
> > > > > >     MPI_Init(&argc, &argv);
> > > > > >     MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
> > > > > >     MPI_Comm_size(MPI_COMM_WORLD, &mySize);
> > > > > >     MPI_Comm world;
> > > > > >     world = MPI_COMM_WORLD;
> > > > > >
> > > > > >     if (myRank == 0)
> > > > > >     {
> > > > > >         // do some things.
> > > > > >     }
> > > > > >
> > > > > >     if (myRank != 0)
> > > > > >     {
> > > > > >         // do some things.
> > > > > >         MPI_Finalize();
> > > > > >         return 0;
> > > > > >     }
> > > > > >     if (myRank == 0)
> > > > > >     {
> > > > > >         MPI_Finalize();
> > > > > >         return 0;
> > > > > >     }
> > > > > > }
> > > > >
> > > > > > Also, some output files contain garbled content that is not readable.
> > > > > > In the 1-process case, the program prints correct results to these
> > > > > > output files.
> > > > >
> > > > > Any help is appreciated.
> > > > >
> > > > > thanks
> > > > >
> > > > > Jack
> > > > >
> > > > > Oct. 24 2010
> > > > >
> > > > > _______________________________________________
> > > > > users mailing list
> > > > > users_at_[hidden]
> > > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> David Zhang
> University of California, San Diego
>
>
>
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle * - Performance Technologies*
> 95 Network Drive, Burlington, MA 01803
> Email terry.dontje_at_[hidden]
>

-- 
David Zhang
University of California, San Diego