
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Hide Abort output
From: Richard Treumann (treumann_at_[hidden])
Date: 2010-04-06 11:01:29


Jeff -

I started a discussion of MPI_Quit on the MPI Forum reflector. I raised
the question because I do not think using MPI_Abort is appropriate.

The situation is when a single task decides the parallel program has
arrived at the desired answer and therefore whatever the other tasks are
currently doing has become irrelevant. The other tasks do not know that
the answer has been found by one of them so they cannot just call
MPI_Finalize.

Do we need a clean and portable way for the task that detects that the
answer has been found and written out to terminate the parallel job
single-handedly?

                Dick

Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363

                                                                                                                                            
From: Jeff Squyres <jsquyres_at_[hidden]>
To: <yves.caniou_at_[hidden]>, "Open MPI Users" <users_at_[hidden]>
Date: 04/06/2010 09:35 AM
Subject: Re: [OMPI users] Hide Abort output
Sent by: users-bounces_at_[hidden]

I'm not sure I understand what your MPI_Quit function would do differently
than MPI_Abort and/or MPI_Finalize...?

On Apr 6, 2010, at 3:13 AM, Yves Caniou wrote:

> I really understand the failure semantics of the MPI_Abort() function; they
> came through clearly in the recent mails.
>
> For me there is an evident advantage to having an MPI_Quit() function:
> nobody would have to code the termination mechanism by hand, and such a
> mechanism, if not correctly implemented, can take a long time to run on the
> large-scale systems we are now seeing in HPC.
> If the implementation of the standard provides it, it can ensure good
> routing and the "best" (architecture-dependent) use of message transfer
> mechanisms, so that the application terminates as soon as possible.
>
> Cheers.
>
> .Yves.
>
> PS: It seems that I nearly always forget to reply on the list. Sorry...
>
> On Monday 05 April 2010 16:53:57, you wrote:
> > Yves
> >
> > In my view, and I think in the view of those who developed the standard,
> > an MPI program that ends in an MPI_Abort call is considered to have failed.
> >
> > If there is really a need for a mechanism to end an MPI program by a
> > single task deciding that a correct answer has been achieved and whatever
> > the other tasks are still doing can be considered expendable garbage, then
> > perhaps the MPI Forum should be asked to consider a new function which
> > does not carry the implication of job failure.
> >
> > I have never before heard anyone request such a mechanism, but maybe there
> > are many out there who just figure calling MPI_Abort is good enough.
> >
> > The MPI Forum is currently working on the 3.0 version of the MPI
> > standard. Do you think they should be considering an MPI_Quit subroutine?
> >
> >
> > From: Yves Caniou <yves.caniou_at_[hidden]>
> > To: Richard Treumann/Poughkeepsie/IBM_at_IBMUS
> > Date: 04/05/2010 10:38 AM
> > Subject: Re: [OMPI users] Hide Abort output
> >
> > I am just saying that it is a matter of convenience.
> >
> > If a task in the application detects a divergence, that does not
> > necessarily imply a failure of the application: it yields a result, for
> > example non-convergence of the whole calculation for some initial condition.
> >
> > Another example: suppose your application is built as a graph that ends
> > with every task performing its own calculation, and the first task to
> > finish causes the application to end. Then, for simplicity, you can call
> > MPI_Abort() to end the application, even though I agree that it is not the
> > proper way, since each task should call MPI_Finalize().
> > Done properly, the first task to finish has to tell all of the others that
> > they must finish, which implies that you have coded an asynchronous receive
> > in each task, with a correctly defined protocol, waiting for the
> > termination message.
> >
> > I don't know whether you consider this a dirty trick, or whether there are
> > other practical means of ending the application properly in such cases (I
> > am not a heavy user of MPI), but at least this should work fine.
> >
> > .Yves.
> >
> > On Monday 05 April 2010 15:45:47, Richard Treumann wrote:
> > > I do not really understand your argument.
> > >
> > > A correct MPI application ends when every task calls MPI_Finalize. I do
> > > not know what a "join-node" is.
> > >
> > > MPI_Abort is for cases like getting an intermediate result that cannot
> > > possibly be right and deciding (within the application) to give up and
> > > announce failure.
> > >
> > >
> > > From: Yves Caniou <yves.caniou_at_[hidden]>
> > > To: Richard Treumann/Poughkeepsie/IBM_at_IBMUS
> > > Date: 04/05/2010 09:38 AM
> > > Subject: Re: [OMPI users] Hide Abort output
> > >
> > > In that case, if your application does not end with a join node, do you
> > > code the whole thing yourself: an async receive in every task, plus the
> > > send that kills everyone in log n messages?
> > >
> > > .Yves.
> > >
> > > On Monday 05 April 2010 15:27:33, you wrote:
> > > > Yves
> > > >
> > > > If an application issued an MPI_Abort, it did not "end correctly". The
> > > > MPI_Abort call is intended for one thing only: the application has
> > > > recognized that something is so wrong that there is no point in
> > > > continuing.
> > > >
> > > > The output from an application that ended in MPI_Abort should be
> > > > considered suspect (probably incomplete or garbage).
> > > >
> > > > If you have an application that is calling MPI_Abort to end a valid
> > > > run, then I would consider that application to be broken.
> > > >
> > > >
> > > >
> > > >
> > > > From: Yves Caniou <yves.caniou_at_[hidden]>
> > > > To: users_at_[hidden]
> > > > Cc: Richard Treumann/Poughkeepsie/IBM_at_IBMUS
> > > > Date: 04/05/2010 09:14 AM
> > > > Subject: Re: [OMPI users] Hide Abort output
> > > >
> > > > On Monday 05 April 2010 15:01:42, Richard Treumann wrote:
> > > > > Why should any software system offer an option which lets the user
> > > > > hide all distinction between a run that succeeded and one that failed?
> > > > >
> > > >
> > > > I don't understand how your question is related to mine, since in my
> > > > case the application ends correctly and I don't want any output. :?
> > > >
> > > > --
> > > > Yves Caniou
> > > > Associate Professor at Université Lyon 1,
> > > > Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
> > > > Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
> > > > * in Information Technology Center, The University of Tokyo,
> > > > 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
> > > > tel: +81-3-5841-0540
> > > > * in National Institute of Informatics
> > > > 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
> > > > tel: +81-3-4212-2412
> > > > http://graal.ens-lyon.fr/~ycaniou/
> > >
> >
>
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

--
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



