Sorry for the delay; you wrote while many of us were on vacation and we're just now starting to catch up on past mails...
I'm not entirely sure what you're trying to do. It sounds like you're trying to replace one process with another. That's quite complicated; there will be a lot of changes required in the code base to do this.
- you'll need to notify the ORTE subsystem of the process change
- this notification will likely need to span multiple processes
- all MPI processes will need to quiesce their communications, disconnect, and reconnect
- ...and probably other things
That being said, you might be able to leverage some of the work that's been done with checkpoint/restart/migration. It's not entirely the same thing that you're doing, but it's at least similar (quiesce networks, [pretend to] move a process from location A to location B, etc.).
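To make the steps listed above a bit more concrete, here is a toy sketch in plain C of the bookkeeping involved: quiesce, swap the process behind a logical rank, then resume. Everything in it (rank_table_t, table_quiesce, table_replace, and so on) is a hypothetical name made up for illustration. It is not ORTE or Open MPI code, and it ignores the hard part, namely getting every process to agree on the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RANKS 8

/* Hypothetical per-process view: logical rank -> physical process slot.
 * This is NOT an Open MPI structure; it only illustrates the idea. */
typedef struct {
    int    physical[MAX_RANKS]; /* physical[r] = slot currently acting as rank r */
    size_t nranks;              /* number of logical ranks in the job */
    int    in_flight;           /* messages sent but not yet delivered */
    bool   quiesced;            /* true while new sends are refused */
} rank_table_t;

/* Start with the identity mapping: rank r is played by slot r. */
static void table_init(rank_table_t *t, size_t nranks)
{
    assert(nranks <= MAX_RANKS);
    t->nranks = nranks;
    t->in_flight = 0;
    t->quiesced = false;
    for (size_t i = 0; i < nranks; i++) {
        t->physical[i] = (int)i;
    }
}

/* Where should a message for `logical` go? Returns -1 while quiesced
 * or if the rank is out of range. */
static int table_route(const rank_table_t *t, int logical)
{
    if (t->quiesced || logical < 0 || (size_t)logical >= t->nranks) {
        return -1;
    }
    return t->physical[logical];
}

/* Step 1: stop issuing new sends. The caller is assumed to drain
 * outstanding traffic until in_flight reaches 0. */
static void table_quiesce(rank_table_t *t)
{
    t->quiesced = true;
}

/* Step 2: put a different process behind a logical rank. Refused unless
 * communication is quiesced and fully drained. */
static bool table_replace(rank_table_t *t, int logical, int new_physical)
{
    if (!t->quiesced || t->in_flight != 0) {
        return false;
    }
    if (logical < 0 || (size_t)logical >= t->nranks) {
        return false;
    }
    t->physical[logical] = new_physical;
    return true;
}

/* Step 3: reconnect and allow sends again. */
static void table_resume(rank_table_t *t)
{
    t->quiesced = false;
}
```

In your example, a job with logical ranks 0 through 2 would call table_quiesce(), then table_replace(&t, 2, 3) to put the spare process behind rank 2, then table_resume(). Rank 1 keeps addressing rank 2 and never sees the physical slot change. In a real run, every process holding such a table would have to perform the swap consistently, which is why the notification has to span multiple processes.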
On Dec 28, 2010, at 7:03 AM, Hugo Meyer wrote:
> Hello to all.
> I'm new to the forum; at least, this is the first time I've written.
> I'm working with Open MPI and would like to try a small experiment: replacing one process with another.
> For example, assume two processes, say ranks 1 and 2, are communicating, and there is also a process with rank 3. I would like rank 3 (we can assume its node is marked as down in the initial hostfile) to take the place of rank 2, while rank 1 still thinks it is communicating with rank 2 when it is in fact communicating with rank 3.
> I guess I'll have to modify structures such as orte_job_map_t and orte_proc_t, but I wanted to know whether anyone already has experience doing something similar and could give me some guidance.
> The communication between processes would, in principle, be irrelevant, so I will not need to use checkpoint/restart for now.
> Hugo Meyer