
Open MPI User's Mailing List Archives


Subject: [OMPI users] allow job to survive process death
From: Kirk Stako (kirkstako_at_[hidden])
Date: 2011-01-27 09:11:54


What support does Open MPI have for letting a job keep running when one
or more of its processes die unexpectedly? Is there a special mpirun
flag for this, or some other mechanism?

It seems obvious that collectives will fail once a process dies, but
would it be possible to create a new group that excludes the dead
processes (assuming you knew which ranks had died) and then turn that
group into a working communicator?