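Maybe you could make a system call to ping the other machine:
char sCommand[256];
// build the command string (snprintf will not write past the end of sCommand)
snprintf(sCommand, sizeof sCommand, "ping -c %d -q %s > /dev/null", numPings, sHostName);
// execute the command
int iResult = system(sCommand);
If the ping was successful, iResult will have the value 0. Any nonzero value means the ping failed or the command could not be run.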
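A slightly more defensive version of the same idea, as a rough sketch only. The function names host_is_reachable and host_name_is_safe are made up for illustration, the 512-byte buffer is arbitrary, and it assumes a POSIX system with a ping that accepts -c and -q (as in the command above). It rejects host strings containing shell-special characters before handing them to system(), and decodes the wait status instead of comparing the raw return value.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: accept only characters that can appear in a host
 * name or IP address, so nothing shell-special reaches system(). */
static int host_name_is_safe(const char *host)
{
    if (host == NULL || host[0] == '\0' || host[0] == '-')
        return 0;
    for (const char *p = host; *p != '\0'; ++p) {
        if (!isalnum((unsigned char)*p) && *p != '.' && *p != '-' && *p != ':')
            return 0;
    }
    return 1;
}

/* Hypothetical helper.
 * Returns 1 if ping got a reply, 0 if ping ran but reported failure,
 * and -1 if the check itself could not be carried out. */
int host_is_reachable(const char *host, int num_pings)
{
    char command[512];

    if (!host_name_is_safe(host) || num_pings < 1)
        return -1;

    /* build the command string, refusing to continue if it was truncated */
    int n = snprintf(command, sizeof command,
                     "ping -c %d -q %s > /dev/null 2>&1", num_pings, host);
    if (n < 0 || (size_t)n >= sizeof command)
        return -1;

    /* execute the command; system() returns a wait status, not an exit code */
    int status = system(command);
    if (status == -1 || !WIFEXITED(status))
        return -1;

    /* 127 is the shell's "command not found" exit code */
    if (WEXITSTATUS(status) == 127)
        return -1;

    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

You would call it from the master after your TIMEOUT expires, for example host_is_reachable(sHostName, 3), and keep waiting rather than aborting when it returns 0. Note that a reply to ping only tells you the machine is up, not that orted or the slave process is still running.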
On Thu, Jul 23, 2009 at 1:36 PM, vipin kumar<vipinkumar41_at_[hidden]> wrote:
> On Thu, Jul 23, 2009 at 3:03 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>> It depends on which network fails. If you lose all TCP connectivity, Open
>> MPI should abort the job as the out-of-band system will detect the loss of
>> connection. If you only lose the MPI connection (whether TCP or some other
>> interconnect), then I believe the system will eventually generate an error
>> after it retries sending the message a specified number of times, though it
>> may not abort.
> Thank you Ralph,
> From your reply I realized that the question I posted earlier did not
> describe the problem properly.
> I can't use blocking communication routines in my main program
> ("masterprocess") because any type of network failure (due to physical
> connectivity, TCP connectivity, or the MPI connection, as you said) may occur.
> So I am using non-blocking point-to-point communication routines, and TEST
> later for completion of that Request. Once I enter a TEST loop, I will test
> for Request completion until TIMEOUT. Suppose TIMEOUT has occurred. In this
> case I will first check whether
> 1: the slave machine is reachable or not (how do I do that? I have the
> IP address and host name of the slave machine);
> 2: if reachable, whether the programs (orted and "slaveprocess") are alive
> or not.
> I don't want to abort my master process in case 1, in the hope that the
> network connection will come back up later. Fortunately OpenMPI doesn't abort
> any process. Both processes can run independently without communicating.
> Thanks and Regards,
> Vipin K.
> Research Engineer,
> C-DOTB, India