The kind of recovery I am looking for is simple, and the following
example illustrates the point:
I want to send a message to a different node. If it does not respond to
me, I do not want my application to crash. I want to continue using
other node resources.
I hate that when a single node crashes, my whole MPI_COMM_WORLD goes
down. Is there a way around that?
I could give up the other extra hairy stuff, but it is hard to accept
losing my running job whenever a node or application goes down.
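One way to get the "skip the node that does not answer" behavior is sketched here. The MPI standard lets you replace the default MPI_ERRORS_ARE_FATAL handler with MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN), so that failing calls return an error code instead of aborting; whether the job can actually keep running after a peer dies is up to the implementation and is not guaranteed by the standard. The C sketch only shows the timeout-and-move-on logic, assuming a nonblocking send whose completion you poll (in MPI, MPI_Isend plus MPI_Test, with MPI_Wtime as the clock). To keep it compilable without an MPI installation, the completion check is a callback, and completion_test_fn, simulated_peer_done, wait_with_deadline and first_responsive_peer are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime on strict compilers */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Monotonic time in seconds; MPI_Wtime() would play this role in MPI. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical completion check: returns true once the operation posted
 * to `peer` has finished. In MPI this would wrap MPI_Test() on the
 * request returned by MPI_Isend(). */
typedef bool (*completion_test_fn)(int peer, void *ctx);

/* Demo stand-in for MPI_Test(): `ctx` is a bool array indexed by rank,
 * true if that rank "answers". */
static bool simulated_peer_done(int peer, void *ctx)
{
    const bool *alive = ctx;
    return alive[peer];
}

/* Busy-polls `is_done` until it reports completion or `timeout_s`
 * seconds pass. Polls at least once. Returns false on timeout. */
static bool wait_with_deadline(completion_test_fn is_done, int peer,
                               void *ctx, double timeout_s)
{
    assert(is_done != NULL && timeout_s >= 0.0);
    double deadline = now_seconds() + timeout_s;
    do {
        if (is_done(peer, ctx))
            return true;
    } while (now_seconds() < deadline);
    return false;
}

/* Tries each peer not yet marked dead and returns the first one that
 * completes within `timeout_s`. Peers that time out are marked in
 * `dead` so later calls skip them instead of aborting the whole job.
 * Returns -1 if nobody answered. */
static int first_responsive_peer(const int *peers, size_t npeers, bool *dead,
                                 completion_test_fn is_done, void *ctx,
                                 double timeout_s)
{
    for (size_t i = 0; i < npeers; i++) {
        if (dead[i])
            continue; /* already given up on this peer */
        if (wait_with_deadline(is_done, peers[i], ctx, timeout_s))
            return peers[i];
        dead[i] = true; /* no answer in time: stop using it */
    }
    return -1;
}
```

For example, with peers {1, 2, 3} where rank 1 never answers, first_responsive_peer returns 2 and marks rank 1 as dead, so later calls skip it. The loop busy-polls for simplicity; a real program would more likely do useful work between polls. Also note that in real MPI a send that never completes leaves an outstanding request behind, and cancelling send requests is deprecated in recent versions of the standard, so that request may simply have to be abandoned.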
> Today's Topics:
> 1. threading (Sam Adams)
> 2. Re: users Digest, Vol 536, Issue 1 (Mohammad Huwaidi)
> 3. Fault Tolerance (Mohammad Huwaidi)
> 4. Re: Fault Tolerance (Thomas Spraggins)
> 5. Re: Fault Tolerance (George Bosilca)
> 6. MPI processes swapping out (Heywood, Todd)
> 7. deadlock on barrier (tim gunter)
> Message: 1
> Date: Wed, 21 Mar 2007 11:29:34 -0500
> From: "Sam Adams" <smadasam_at_[hidden]>
> Subject: [OMPI users] threading
> To: users_at_[hidden]
> I have been looking, but I haven't really found a good answer about
> system level threading. We are about to get a new cluster of
> dual-processor quad-core nodes or 8 cores per node. Traditionally I
> would just tell MPI to launch two processes per dual processor single
> core node, but with eight cores on a node, having 8 processes seems
> My question is this: does OpenMPI sense that there are multiple cores
> on a node and use something like pthreads instead of creating new
> processes automatically when I request 8 processes for a node, or
> should I run a single process per node and use OpenMP or pthreads
> explicitly to get better performance on a per node basis?
We can't resolve problems by using the same kind of thinking we used
when we created them.