Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Network connection check
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-07-23 05:33:42

It depends on which network fails. If you lose all TCP connectivity,
Open MPI should abort the job, because the out-of-band system will
detect the loss of connection. If you lose only the MPI connection
(whether TCP or some other interconnect), then I believe the system
will eventually generate an error after it has retried sending the
message a specified number of times, though it may not abort the job.
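Ralph's description of the MPI-connection case (retry a bounded number of times, then surface an error instead of looping forever) can be pictured with a small C model. This is a hedged illustration under stated assumptions, not Open MPI's implementation: `send_with_retries`, the `try_send` callback, `MAX_SEND_RETRIES` and the two stub links are all hypothetical names, and the real retry count and timeouts depend on the transport and how it is configured.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical retry budget, chosen for illustration only;
   this is NOT an Open MPI parameter. */
#define MAX_SEND_RETRIES 5

/* Status handed back to the caller once the retry budget is spent. */
typedef enum { SEND_OK = 0, SEND_PEER_UNREACHABLE = 1 } send_status;

/* Hypothetical single transmission attempt: returns true if the peer
   acknowledged the message, false if the attempt failed. */
typedef bool (*attempt_fn)(void *ctx);

/* Call try_send up to MAX_SEND_RETRIES times. Instead of retrying
   forever, report the peer as unreachable once the budget is used up.
   If attempts_used is non-NULL, it receives the number of attempts made. */
send_status send_with_retries(attempt_fn try_send, void *ctx, int *attempts_used)
{
    int n = 0;
    while (n < MAX_SEND_RETRIES) {
        n++;
        if (try_send(ctx)) {
            if (attempts_used != NULL) *attempts_used = n;
            return SEND_OK;
        }
    }
    if (attempts_used != NULL) *attempts_used = n;
    return SEND_PEER_UNREACHABLE;
}

/* Hypothetical stub: a link that is permanently down. */
static bool link_down(void *ctx)
{
    (void)ctx;
    return false;
}

/* Hypothetical stub: a link that fails *ctx more times, then recovers. */
static bool link_recovers_after(void *ctx)
{
    int *remaining_failures = ctx;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return false;
    }
    return true;
}
```

With `link_down` the sender gives up after `MAX_SEND_RETRIES` attempts and gets `SEND_PEER_UNREACHABLE`; with `link_recovers_after` and two pending failures, the third attempt succeeds. The design point is that the caller receives a definite status, which is what a process like "masterprocess" below would need before it could stop sending. Whether the real job also aborts at that point is a separate matter, as Ralph notes.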

On Jul 22, 2009, at 10:55 PM, vipin kumar wrote:

>> Are you asking to find out this information before issuing
>> "mpirun"? Open MPI does assume that the nodes you are trying to use
>> are reachable.
> No.
> The scenario is a pair of processes, one running on a "master" node
> (say "masterprocess") and one on a "slave" node (say "slaveprocess").
> When "masterprocess" needs the services of the slave process, it sends
> a message to "slaveprocess", and "slaveprocess" serves the request. In
> case of a network failure (by any means), "masterprocess" will keep
> trying to send messages to "slaveprocess" without knowing that it is
> unreachable. So how should "masterprocess" find out that
> "slaveprocess" can't be reached, and stop attempting to send messages
> until the connection is back up?
> Thanks & Regards,
> --
> Vipin K.
> Research Engineer,
> C-DOTB, India