Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Add child to another parent.
From: Hugo Meyer (meyer.hugo_at_[hidden])
Date: 2011-03-24 17:57:16

2011/3/24 Ralph Castain <rhc_at_[hidden]>

> You really don't want to do it that way - you'll create a major confusion
> in mpirun and the other daemons about who is where. Have you looked at the
> code in orte/mca/errmgr/hnp/errmgr_hnp.c, line 1573 and following?
I have not looked at that yet, but I will do so right now.

> The ability to relocate a failed child process is already in the trunk - it
> only requires turning "on" with an --enable-recovery flag at runtime if you
> don't need the checkpoint/restart support. If you do need C/R, you can use
> that too (just requires some configure flags).
Regarding this, I do need C/R support, because what I am trying to do is
restart a process on another node (as a child of the orted residing there)
from a previous checkpoint. I will take a look at the relocation feature
you mention and try to use it.

> At the least, the cited code should provide guidance on how to correctly
> restart procs if you need your own errmgr module for other reasons.

Thanks again, Ralph; you have been very helpful.

Best regards.

Hugo Meyer