You don't specify and based on your description I infer that you are not using a batch/queueing system, but just a rsh/ssh based start-up mechanism.
 
You are absolutely correct. I am using an rsh/ssh-based start-up mechanism.

A batch/queueing system might be able to tell you whether a remote computer is still accessible.

Right now I don't know anything about batch/queueing systems, but I will look into them as well. I assume you mean checking accessibility before launching the jobs.

I think that MPI is not the proper mechanism to achieve what you want. PVM or, maybe better, direct socket programming will probably serve you more.

I will consider these as well.

I have already spent a significant amount of time on LAM/MPI and Open MPI, and due to lack of time I don't want to switch to another mechanism. Anyway, Open MPI is working well for me; it covers at least 80% of what I want.
 


Thanks & Regards,
--
Vipin K.
Research Engineer,
C-DOTB, India