2011/3/24 Ralph Castain <firstname.lastname@example.org>
You really don't want to do it that way - you'll create major confusion in mpirun and the other daemons about who is where. Have you looked at the code in orte/mca/errmgr/hnp/errmgr_hnp.c, line 1573 and following?
I have not looked at that yet, but I will do so right now.
The ability to relocate a failed child process is already in the trunk - it only requires turning "on" with an --enable-recovery flag at runtime if you don't need the checkpoint/restart support. If you do need C/R, you can use that too (just requires some configure flags).
About this: I do need C/R support, because what I'm trying to do is restart a process on another node (as a child of the orted residing there) from a previous checkpoint. I will take a look at the relocation feature you mention and try to use it.
At the least, the cited code should provide guidance on how to correctly restart procs if you need your own errmgr module for other reasons.
Thanks again, Ralph. You have been very helpful.