Camille and I are also working on improving the restart capability
of orte2. We are focusing on restarting individual processes (while
Josh needs to restart the entire job). However, I guess most of the
functionality is similar. Could we join your discussions on point 3?
On 27 Feb. 08, at 21:47, Ralph Castain wrote:
> Hi folks
> Okay, the ORTE merge appears to have gone well and is now complete -
> you are free to use the trunk.
> A few caveats:
> 1. Obviously, you will need to autogen/configure once you update. I
> -strongly- recommend you rm -rf your install directory first, as you
> will definitely be hit with stale libraries from this commit.
> 2. This is a "drop" from the ORTE devel effort. As such, it is -not-
> complete. There are several known issues, particularly with comm_spawn
> and singleton comm_spawn in certain environments and scenarios. I have
> a fix for the comm_spawn problems already done and ready to be applied,
> but I want to test it some more in the morning before committing it to
> the trunk - and I didn't want to delay this merge any longer.
> 3. We know that checkpoint/restart is currently broken. Josh and I
> discussed a couple of options for repairing it, and he will look at it
> as soon as he has a chance. It isn't a big problem - we just need to
> decide which option he would prefer to pursue.
> The remaining ORTE scalability work should be moving into the trunk
> over the next few weeks (I will be on vacation 3/7-14, so it will
> likely run through March). We do not anticipate any API changes or
> framework adds/deletes for the rest of the way - there will be a few
> new components added to existing frameworks, some revamping of the
> logic in a few places, etc.
> I will try to cover all the changes in one or two notes over the next
> few days to avoid carpal tunnel. Please feel free to ask questions and
> I'll do my best to provide answers.
> Thanks again for the cooperation tonight...
> devel mailing list