Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Fault Tolerant with openib
From: Guilherme V (list.vilela_at_[hidden])
Date: 2011-09-23 15:21:17

I'm using version 1.4.3, and I forgot to mention that I made a change at
line 792 of orterun.c:

    if (ORTE_JOB_STATE_TERMINATED != exit_state) {
        exit(0); /* patch */


> What version of OMPI are you using? The job should terminate in either
> case - what did you do to keep it running after node failure with tcp?

> On Sep 23, 2011, at 12:34 PM, Guilherme V wrote:
>> Hi,
>> I want to know if anybody is having problems with fault-tolerant jobs
>> using InfiniBand. When I run my job with tcp and anything happens to one
>> node, my job keeps running, but if I change my job to use InfiniBand and
>> anything happens to the InfiniBand link (e.g., cable problems), my job fails.
>> Does anybody know if something different needs to be done when
>> using openib instead of tcp?
>> Below is an example of the message I'm receiving from MPI.
>> Regards,
>> Guilherme