Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Hide Abort output
From: Yves Caniou (yves.caniou_at_[hidden])
Date: 2010-04-06 03:13:27


I fully understand that MPI_Abort() is meant to signal a failure; that came
across clearly in the recent mails.

Still, I see a clear advantage in having an MPI_Quit() function:
with such a function, nobody would have to code the termination mechanism by
hand. A hand-written one, if not correctly implemented, can take a long time
to run on the large-scale systems we now see in HPC.
If the implementation of the standard provides it, it can ensure good routing
and the "best" (architecture-dependent) use of the message transfer
mechanisms, and so terminate the application as soon as possible...
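
To make the cost of a hand-written termination mechanism concrete, here is a
minimal sketch in plain Python. It is only a simulation: it makes no MPI calls
and does not describe what any MPI implementation does internally. It assumes
a binomial-tree scheme in which every task that already knows about the
termination forwards the notice to one new task per round. The function name
termination_rounds and its parameters are hypothetical, chosen for this
illustration.

```python
def termination_rounds(n_tasks, origin=0):
    """Simulate a binomial-tree propagation of a termination notice.

    Returns a list of rounds; each round is a list of (sender, receiver)
    rank pairs. This is an illustrative model, not MPI code.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    if not 0 <= origin < n_tasks:
        raise ValueError("origin must be a valid rank")

    rounds = []
    informed = 1  # tasks (counted relative to origin) that know so far
    while informed < n_tasks:
        pairs = []
        # Every informed task notifies one task that is not yet informed.
        for rel in range(informed):
            target = rel + informed
            if target < n_tasks:
                sender = (rel + origin) % n_tasks
                receiver = (target + origin) % n_tasks
                pairs.append((sender, receiver))
        rounds.append(pairs)
        informed *= 2  # the informed set doubles each round
    return rounds


if __name__ == "__main__":
    # Print the schedule for 8 tasks when rank 0 finishes first.
    for number, pairs in enumerate(termination_rounds(8), start=1):
        print("round", number, pairs)
```

Under these assumptions, 8 tasks are notified in 3 rounds using 7 messages in
total: the number of rounds grows like log2(n), while the number of messages
stays at n - 1. In a real program each task would also have to poll for the
notice between computation steps, which is the part that is easy to get wrong.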

Cheers.

.Yves.

PS: It seems that I nearly always forget to reply to the list. Sorry...

On Monday 05 April 2010 16:53:57, you wrote:
> Yves
>
> In my view, and I think in the view of those who developed the standard, an
> MPI program that ends in an MPI_Abort call is considered to have failed.
>
> If there is really a need for a mechanism to end an MPI program by a single
> task deciding that a correct answer has been achieved and whatever the
> other tasks are still doing can be considered expendable garbage, then
> perhaps the MPI Forum should be asked to consider a new function which does
> not carry the implication of job failure.
>
> I have never before heard anyone request such a mechanism but maybe there
> are many out there that just figure calling MPI_Abort is good enough.
>
> The MPI Forum is currently working on the 3.0 version of the MPI
> standard. Do you think they should be considering an MPI_Quit subroutine?
>
>
> Dick Treumann - MPI Team
> IBM Systems & Technology Group
> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> Tele (845) 433-7846 Fax (845) 433-8363
>
> From: Yves Caniou <yves.caniou_at_[hidden]>
> To: Richard Treumann/Poughkeepsie/IBM_at_IBMUS
> Date: 04/05/2010 10:38 AM
> Subject: Re: [OMPI users] Hide Abort output
>
> I am only saying that this is a matter of convenience.
>
> If a task in the application diverges, that does not necessarily imply a
> failure of the application: it still gives a result, for example
> non-convergence of the whole computation for some initial condition.
>
> Another example: suppose your application is built as a graph that ends
> with every task performing its own computation, and the first task to
> finish causes the application to end. Then, for simplicity, you can call
> MPI_Abort() to end the application, even though I agree that this is not
> the proper way, since each task should call MPI_Finalize().
> Done properly, the first task to finish should tell all the others that
> they have to finish, which implies that you have coded an asynchronous
> receive in each task, with a correctly defined protocol, waiting for the
> termination message.
>
> I don't know whether you consider this a dirty trick, or whether there are
> other practical ways to end the application properly in such cases (I am
> not an experienced MPI user), but at least it should work fine.
>
> .Yves.
>
> On Monday 05 April 2010 15:45:47, Richard Treumann wrote:
> > I do not really understand your argument.
> >
> > A correct MPI application ends when every task calls MPI_Finalize. I do
> > not know what a "join-node" is.
> >
> > MPI_Abort is for cases like getting an intermediate result that cannot
> > possibly be right and deciding (within the application) to give up and
> > announce failure.
> >
> >
> > Dick Treumann - MPI Team
> > IBM Systems & Technology Group
> > Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> > Tele (845) 433-7846 Fax (845) 433-8363
> >
> > From: Yves Caniou <yves.caniou_at_[hidden]>
> > To: Richard Treumann/Poughkeepsie/IBM_at_IBMUS
> > Date: 04/05/2010 09:38 AM
> > Subject: Re: [OMPI users] Hide Abort output
> >
> > In that case, if your application does not end with a join node, do you
> > code the whole thing yourself: an async receive in every task, plus the
> > send that kills everyone in log n messages?
> >
> > .Yves.
> >
> > On Monday 05 April 2010 15:27:33, you wrote:
> > > Yves
> > >
> > > If an application issued an MPI_Abort, it did not "end correctly". The
> > > MPI_Abort call is intended for one thing only. The application has
> > > recognized that something is so wrong that there is no point in
> > > continuing. The output from an application that ended in MPI_Abort
> > > should be considered suspect (probably incomplete or garbage).
> > >
> > > If you have an application that is calling MPI_Abort to end a valid run
> > > then I would consider that application to be broken.
> > >
> > >
> > >
> > >
> > > Dick Treumann - MPI Team
> > > IBM Systems & Technology Group
> > > Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> > > Tele (845) 433-7846 Fax (845) 433-8363
> > >
> > > From: Yves Caniou <yves.caniou_at_[hidden]>
> > > To: users_at_[hidden]
> > > Cc: Richard Treumann/Poughkeepsie/IBM_at_IBMUS
> > > Date: 04/05/2010 09:14 AM
> > > Subject: Re: [OMPI users] Hide Abort output
> > >
> > > On Monday 05 April 2010 15:01:42, Richard Treumann wrote:
> > > > Why should any software system offer an option which lets the user
> > > > hide all distinction between a run that succeeded and one that failed?
> > > >
> > > > Dick Treumann - MPI Team
> > > > IBM Systems & Technology Group
> > > > Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> > > > Tele (845) 433-7846 Fax (845) 433-8363
> > >
> > > I don't understand how your question is related to mine, since in my
> > > case, the application ends correctly and I don't want any output. :?
> > >
> > > --
> > > Yves Caniou
> > > Associate Professor at Université Lyon 1,
> > > Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
> > > Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
> > > * in Information Technology Center, The University of Tokyo,
> > > 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
> > > tel: +81-3-5841-0540
> > > * in National Institute of Informatics
> > > 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
> > > tel: +81-3-4212-2412
> > > http://graal.ens-lyon.fr/~ycaniou/
> >
>

-- 
Yves Caniou
Associate Professor at Université Lyon 1,
Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
  * in Information Technology Center, The University of Tokyo,
    2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
    tel: +81-3-5841-0540
  * in National Institute of Informatics
    2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
    tel: +81-3-4212-2412 
http://graal.ens-lyon.fr/~ycaniou/