Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Parent terminates when child crashes/terminates (without finalizing)
From: Ken Lloyd (kenneth.lloyd_at_[hidden])
Date: 2010-12-18 09:49:12


Nick Maclaren,

Yes, this is a hard problem. It is not unique to Open MPI, however.
It points to the distributed memory, process, and thread issues that
arise either within the various operating systems or external to them,
across many solution spaces.

Jeff Squyres's statement that "flexible dynamic processing is not
something that many people ask for" is troubling. Do pthreads provide
such a great solution strategy to these problems?

In other words, if we were to offer true "flexible dynamic
processing" (which I personally would advocate), would they (the
developers and users) come?

K.A. Lloyd

On Sat, 2010-12-18 at 12:15 +0000, N.M. Maclaren wrote:
> On Dec 17 2010, Jeff Squyres wrote:
> >
> > It's not an unknown problem -- as George and Ralph were trying to say, it
> > was a design decision on our part.
> >
> > Sadly, flexible dynamic processing is not something that many people ask
> > for. We have invested time in it over the years to get it working and have
> > a baseline functionality level. Beyond that, we unfortunately simply
> > haven't had enough requests to justify spending time to do stuff like you
> > suggest (e.g., allow abnormal termination of MPI-disconnected processes
> > to not also take down previously-connected processes). :-(
>
> And my responses (which were probably confusing) were some hint as to WHY
> it is a hard problem. I have a lot of experience at this level for a very
> wide range of systems, and it's something that I would hate to have to
> implement even for a single system - let alone for the range of systems
> that OpenMPI supports.
>
> I could tell you some horror stories of processes owned by one user taking
> down ones owned by OTHER users, because the controlling terminal had been
> reused. And, upon investigation, it wasn't even possible to identify a
> bug in any of the programs or operating system - it was merely a "gotcha"
> that had sneaked through the cracks in the specifications and bitten me
> in a painful place.
>
> The following is what I teach about it in my course (in full):
>
> You can add groups of processes dynamically \break
> {\cyan MPI-2} is probably the best way to do this \break
>
> \bully My recommendation is don't even {\magenta think} of it \break
>
> This was a nightmare area in {\cyan PVM} \break
> The potential system problems are unbelievable \break
>
> And that is even if you are your own {\sky administrator} \break
> If you aren't, you may get strangled for using this \break
>
> Regards,
> Nick Maclaren.