On Jul 30, 2008, at 11:12 AM, Mark Borgerding wrote:
> I appreciate the suggestion about running a daemon on each of the
> remote nodes, but wouldn't I kind of be reinventing the wheel there?
> Process management is one of the things I'd like to be able to count
> on ORTE for.
Keep in mind that the daemons here are not for process management --
they're for name service.
> Would the following work to give the parent process an intercomm
> with each child?
>
> parent (i.e., my non-mpirun-started process) calls MPI_Init, then
> MPI_Open_port
> parent spawns the mpirun command via system/exec to create the remote
> children. The name from MPI_Open_port is placed in the environment.
> parent calls MPI_Comm_accept (once for each child?)
> all children call MPI_Comm_connect to the name
It may be problematic to call system/exec in some environments (e.g.,
if using OpenFabrics networks). Bad Things can happen.
> I think this would give one intercommunicator back to the parent for
> each remote process (not ideal, but I can worry about broadcast data
> later)
> The remote processes can communicate with each other through
> MPI_COMM_WORLD.
>
>
> Actually when I think through the details, much of this is pretty
> similar to the daemon MPI_Publish_name+MPI_Lookup_name approach.
> The main difference being which processes come first.
Instead of having the framework call MPI_Init in your plugin, can your
plugin system/exec "mpirun -np 1 my_parent_app"? And perhaps use a
pipe (or socket or some other IPC) to communicate between the
framework process and my_parent_app? I realize it's a kludgey
workaround, but it looks like we clearly have a bug in the 1.2 series
with singletons in this area...
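
To make the workaround concrete, here is a minimal sketch of the
framework side, assuming a POSIX system. The helper name
run_and_capture is hypothetical (it is not part of Open MPI), and the
pipe is one-way only: it carries whatever the launched command writes
to stdout back to the framework process. Two-way traffic would need a
second pipe or a socket. Note that the framework process itself never
calls MPI_Init here.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (searched on PATH) as a child process
 * and copy what it writes to stdout into buf. buf is always
 * NUL-terminated; output beyond buflen - 1 bytes is not read.
 * Returns the child's exit status, or -1 on error or abnormal exit. */
int run_and_capture(char *const argv[], char *buf, size_t buflen)
{
    int fds[2];
    if (buflen == 0 || pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: route stdout into the pipe, then become the launcher. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(127);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: read until the buffer is full or the child closes the pipe. */
    close(fds[1]);
    size_t used = 0;
    ssize_t n;
    while (used < buflen - 1 &&
           (n = read(fds[0], buf + used, buflen - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The plugin would call it with something like
char *const argv[] = {"mpirun", "-np", "1", "my_parent_app", NULL};
and parse whatever my_parent_app prints. Size the buffer generously:
if the child writes more than fits, it will likely be killed by
SIGPIPE once the read end is closed, and the helper returns -1.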
--
Jeff Squyres
Cisco Systems