Jeff -

I started a discussion of MPI_Quit on the MPI Forum reflector. I raised the question because I do not think using MPI_Abort is appropriate.

The situation is one where a single task decides that the parallel program has arrived at the desired answer, so whatever the other tasks are currently doing has become irrelevant. The other tasks do not know that one of them has found the answer, so they cannot just call MPI_Finalize.

Do we need a clean and portable way for the task that detects the answer, and has written it out, to terminate the parallel job single-handedly?

Dick


Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363



From: Jeff Squyres <jsquyres@cisco.com>
To: <yves.caniou@ens-lyon.fr>, "Open MPI Users" <users@open-mpi.org>
Date: 04/06/2010 09:35 AM
Subject: Re: [OMPI users] Hide Abort output
Sent by: users-bounces@open-mpi.org

I'm not sure I understand what your MPI_Quit function would do differently than MPI_Abort and/or MPI_Finalize...?

On Apr 6, 2010, at 3:13 AM, Yves Caniou wrote:

> I do understand that MPI_Abort() is meant to signal failure; that came
> through clearly in the recent mails.
>
> There is an evident advantage, for me, in having an MPI_Quit() function:
> nobody would have to code the termination mechanism themselves, and that
> mechanism, if not correctly implemented, can take a "long" time to run on
> the large-scale systems we now see in HPC.
> If the MPI implementation does this instead, it can ensure good routing and
> the "best" use of the (architecture-dependent) message transfer mechanisms,
> so that the application terminates as soon as possible...
>
> Cheers.
>
> .Yves.
>
> PS: It seems I nearly always forget to reply to the list. Sorry...
>
> On Monday 05 April 2010 16:53:57, you wrote:
> > Yves
> >
> > In my view, and I think in the view of those who developed the standard, an
> > MPI program that ends in an MPI_Abort call is considered to have failed.
> >
> > If there is really a need for a mechanism to end an MPI program by a single
> > task deciding that a correct answer has been achieved and whatever the
> > other tasks are still doing can be considered expendable garbage, then
> > perhaps the MPI Forum should be asked to consider a new function which does
> > not carry the implication of job failure.
> >
> > I have never before heard anyone request such a mechanism, but maybe there
> > are many out there who just figure calling MPI_Abort is good enough.
> >
> > The MPI Forum is currently working on version 3.0 of the MPI standard.
> > Do you think it should consider an MPI_Quit subroutine?
> >
> >
> >
> >   From:       Yves Caniou <yves.caniou@ens-lyon.fr>
> >
> >   To:         Richard Treumann/Poughkeepsie/IBM@IBMUS
> >
> >   Date:       04/05/2010 10:38 AM
> >
> >   Subject:    Re: [OMPI users] Hide Abort output
> >
> >
> >
> >
> >
> >
> > I am only saying that it is a matter of convenience.
> >
> > If a task in the application shows a divergence, that does not necessarily
> > imply a failure of the application: it still gives a result, for example
> > the non-convergence of the whole computation for some initial condition.
> >
> > Another example: your application is built as a graph that ends with all
> > tasks each performing their own computation, and the first one to finish
> > causes the application to end. Then, for simplicity, you can call
> > MPI_Abort() to end the application, even if I agree that it is not the
> > proper way, since each task should call MPI_Finalize().
> > Done the proper way, the first task to finish should tell all of the
> > others that they have to finish, which means you have coded an async
> > receive in each task, with a well-defined protocol, waiting for the
> > termination message.
> >
> > I don't know whether you consider this a dirty trick, or whether there are
> > other practical ways to end the application properly in such cases (I am
> > not an expert MPI user), but at least this should work fine.
> >
> > .Yves.
> >
> > On Monday 05 April 2010 15:45:47, Richard Treumann wrote:
> > > I do not really understand your argument.
> > >
> > > A correct MPI application ends when every task calls MPI_Finalize.  I do
> > > not know what a "join-node" is.
> > >
> > > MPI_Abort is for cases like getting an intermediate result that cannot
> > > possibly be right and deciding (within the application) to give up and
> > > announce failure.
> > >
> > >
> > >
> > >   From:       Yves Caniou <yves.caniou@ens-lyon.fr>
> > >
> > >   To:         Richard Treumann/Poughkeepsie/IBM@IBMUS
> > >
> > >   Date:       04/05/2010 09:38 AM
> > >
> > >   Subject:    Re: [OMPI users] Hide Abort output
> > >
> > >
> > >
> > >
> > >
> > >
> > > In that case, if your application does not end with a join node, do you
> > > code the whole thing yourself: an async receive, plus the send that kills
> > > everyone in log n messages?
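The "log n messages" idea means spreading the stop message along a tree instead of having the finder send to every other task itself. One common choice, assumed here, is a binomial tree rooted at the finder: a task at offset `rel` from the root forwards to `rel + m` for each power of two `m` below the lowest set bit of `rel` (the root forwards for every power of two below the task count), so all P tasks are reached in ceil(log2 P) rounds. The sketch below only computes the forwarding targets; `stop_tree_children` is a hypothetical name, and the stop message itself would have to carry the finder's rank so that each receiver can compute its own children.

```c
#include <assert.h>

/* Hypothetical helper: in a binomial tree rooted at `root` over `size`
   ranks, write into `children` the ranks that `rank` forwards the stop
   message to, largest subtree first. Returns how many were written;
   `children` needs room for at least ceil(log2(size)) entries. */
static int stop_tree_children(int rank, int root, int size, int *children)
{
    assert(size > 0 && root >= 0 && root < size && rank >= 0 && rank < size);

    int rel = (rank - root + size) % size; /* position relative to root */
    int mask = 1;
    int n = 0;

    /* Find the lowest set bit of rel: this task was reached from
       rel - mask, so it only covers offsets below that bit. For the
       root (rel == 0) mask ends as the first power of two >= size. */
    while (mask < size && !(rel & mask))
        mask <<= 1;

    /* Forward to rel + m for each smaller power of two m, skipping
       offsets that fall outside the communicator. */
    for (mask >>= 1; mask > 0; mask >>= 1) {
        if (rel + mask < size)
            children[n++] = (rel + mask + root) % size;
    }
    return n;
}
```

For example, with 8 tasks and the answer found by rank 0, rank 0 sends to 4, 2 and 1, rank 4 forwards to 6 and 5, rank 2 to 3, and rank 6 to 7, so the last tasks hear in the third round. A task that receives the stop message would `MPI_Isend` it to each rank in `children` before stopping its own work.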
> > >
> > > .Yves.
> > >
> > > On Monday 05 April 2010 15:27:33, you wrote:
> > > > Yves
> > > >
> > > > If an application issued an MPI_Abort, it did not "end correctly". The
> > > > MPI_Abort call is intended for one thing only: the application has
> > > > recognized that something is so wrong that there is no point in
> > > > continuing.
> > > >
> > > > The output from an application that ended in MPI_Abort should be
> > > > considered suspect (probably incomplete or garbage).
> > > >
> > > > If you have an application that is calling MPI_Abort to end a valid run,
> > > > then I would consider that application to be broken.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >   From:       Yves Caniou <yves.caniou@ens-lyon.fr>
> > > >
> > > >   To:         users@open-mpi.org
> > > >
> > > >   Cc:         Richard Treumann/Poughkeepsie/IBM@IBMUS
> > > >
> > > >   Date:       04/05/2010 09:14 AM
> > > >
> > > >   Subject:    Re: [OMPI users] Hide Abort output
> > > >
> > > > On Monday 05 April 2010 15:01:42, Richard Treumann wrote:
> > > > > Why should any software system offer an option which lets the user hide
> > > > > all distinction between a run that succeeded and one that failed?
> > > >
> > > > I don't understand how your question is related to mine, since in my
> > > > case the application ends correctly and I don't want any output. :?
> > > >
> > > > --
> > > > Yves Caniou
> > > > Associate Professor at Université Lyon 1,
> > > > Member of the INRIA project-team GRAAL at LIP, ENS-Lyon,
> > > > On CNRS secondment at the Japan French Laboratory of Informatics (JFLI),
> > > >   * in Information Technology Center, The University of Tokyo,
> > > >     2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
> > > >     tel: +81-3-5841-0540
> > > >   * in National Institute of Informatics
> > > >     2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
> > > >     tel: +81-3-4212-2412
> > > >
> > > > http://graal.ens-lyon.fr/~ycaniou/
> > >
> >
>
>
> _______________________________________________
> users mailing list
> users@open-mpi.org
>
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


--
Jeff Squyres
jsquyres@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

