Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Hide Abort output
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-04-06 09:34:30


I'm not sure I understand what your MPI_Quit function would do differently than MPI_Abort and/or MPI_Finalize...?

On Apr 6, 2010, at 3:13 AM, Yves Caniou wrote:

> I do understand that MPI_Abort() is meant to signal failure; that came
> across clearly in the recent mails.
>
> Still, I see an obvious advantage in having an MPI_Quit() function:
> nobody would have to hand-code the termination mechanism, which, if it is
> not implemented correctly, can take a "long" time to run on the
> large-scale systems we now see in HPC.
> If the MPI implementation provided it, it could ensure good routing and
> the "best" use of the (architecture-dependent) message transfer
> mechanisms, so that the application terminates as quickly as possible...
>
> Cheers.
>
> .Yves.
>
> PS: It seems I nearly always forget to reply to the list. Sorry...
>
> On Monday 05 April 2010 16:53:57, you wrote:
> > Yves
> >
> > In my view, and I think in the view of those who developed the standard, an
> > MPI program that ends in an MPI_Abort call is considered to have failed.
> >
> > If there is really a need for a mechanism that lets a single task end an
> > MPI program once it decides that a correct answer has been reached and
> > that whatever the other tasks are still doing is expendable garbage, then
> > perhaps the MPI Forum should be asked to consider a new function that
> > does not carry the implication of job failure.
> >
> > I have never before heard anyone request such a mechanism, but maybe
> > there are many out there who just figure calling MPI_Abort is good enough.
> >
> > The MPI Forum is currently working on version 3.0 of the MPI standard.
> > Do you think they should be considering an MPI_Quit subroutine?
> >
> >
> > Dick Treumann - MPI Team
> > IBM Systems & Technology Group
> > Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> > Tele (845) 433-7846 Fax (845) 433-8363
> >
> > From: Yves Caniou <yves.caniou_at_[hidden]>
> > To: Richard Treumann/Poughkeepsie/IBM_at_IBMUS
> > Date: 04/05/2010 10:38 AM
> > Subject: Re: [OMPI users] Hide Abort output
> >
> > I am only saying that it is a matter of convenience.
> >
> > If a task in the application detects a divergence, that does not
> > necessarily mean the application has failed: it still produces a result,
> > for example that the whole computation does not converge for some
> > initial condition.
> >
> > Another example: suppose your application is built as a graph that ends
> > with every task performing its own computation, and the first task to
> > finish causes the application to end. Then, for simplicity, you can call
> > MPI_Abort() to end the application, even though I agree that it is not
> > the proper way, since each task should call MPI_Finalize().
> > To do it the proper way, however, the first task to finish has to tell
> > all the others that they must finish, which means coding an asynchronous
> > receive in each task, with a correctly defined protocol, that waits for
> > the termination message.
> >
> > I don't know whether you consider this a dirty trick, or whether there
> > are other practical ways to end the application properly in such cases
> > (I am not an advanced MPI user), but at least it should work just fine.
> >
> > .Yves.
> >
> > On Monday 05 April 2010 15:45:47, Richard Treumann wrote:
> > > I do not really understand your argument.
> > >
> > > A correct MPI application ends when every task calls MPI_Finalize. I do
> > > not know what a "join-node" is.
> > >
> > > MPI_Abort is for cases like getting an intermediate result that cannot
> > > possibly be right and deciding (within the application) to give up and
> > > announce failure.
> > >
> > > From: Yves Caniou <yves.caniou_at_[hidden]>
> > > To: Richard Treumann/Poughkeepsie/IBM_at_IBMUS
> > > Date: 04/05/2010 09:38 AM
> > > Subject: Re: [OMPI users] Hide Abort output
> > >
> > > In that case, if your application does not end at a join node, you
> > > have to code the whole thing yourself: an async receive, plus the send
> > > that kills everyone in log n messages?
> > >
> > > .Yves.
> > >
> > > On Monday 05 April 2010 15:27:33, you wrote:
> > > > Yves
> > > >
> > > > If an application issued an MPI_Abort, it did not "end correctly". The
> > > > MPI_Abort call is intended for one thing only: the application has
> > > > recognized that something is so wrong that there is no point in
> > > > continuing.
> > > >
> > > > The output from an application that ended in MPI_Abort should be
> > > > considered suspect (probably incomplete or garbage).
> > > >
> > > > If you have an application that is calling MPI_Abort to end a valid run
> > > > then I would consider that application to be broken.
> > > >
> > > > From: Yves Caniou <yves.caniou_at_[hidden]>
> > > > To: users_at_[hidden]
> > > > Cc: Richard Treumann/Poughkeepsie/IBM_at_IBMUS
> > > > Date: 04/05/2010 09:14 AM
> > > > Subject: Re: [OMPI users] Hide Abort output
> > > >
> > > > On Monday 05 April 2010 15:01:42, Richard Treumann wrote:
> > > > > Why should any software system offer an option which lets the user
> > > > > hide all distinction between a run that succeeded and one that failed?
> > > >
> > > > I don't understand how your question relates to mine, since in my
> > > > case the application ends correctly and I don't want any output. :?
> > > >
> > > > --
> > > > Yves Caniou
> > > > Associate Professor at Université Lyon 1,
> > > > Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
> > > > Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
> > > > * in Information Technology Center, The University of Tokyo,
> > > > 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
> > > > tel: +81-3-5841-0540
> > > > * in National Institute of Informatics
> > > > 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
> > > > tel: +81-3-4212-2412
> > > > http://graal.ens-lyon.fr/~ycaniou/
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/