On Mar 5, 2010, at 2:38 PM, Ralph Castain wrote:
>> CALL SYSTEM("cd " // TRIM(dir) // " ; mpirun -machinefile ./machinefile -np 1 /home01/group/Execute/DLPOLY.X > job.out 2> job.err ; cd - > /dev/null")
> That is guaranteed not to work. The problem is that mpirun sets environmental variables for the original launch. Your system call carries over those envars, causing mpirun to become confused.
You should be able to use MPI_COMM_SPAWN to launch this MPI job. Check the man page for MPI_COMM_SPAWN; I believe we have info keys to specify things like what hosts to launch on, etc.
>> Do you think MPI_COMM_SPAWN can help?
> It's the only method supported by the MPI standard. If you need it to block until this new executable completes, you could use a barrier or other MPI method to determine it.
I believe that the user said they wanted to use the same cores as their original MPI job occupies for the new job -- they basically want the old job to block until the new job completes. Keep in mind that OMPI busy-polls waiting for progress, so you might actually get hosed here (two procs competing for time on the same core).
I'm not immediately thinking of a good way to avoid this issue -- perhaps you could kludge something up such that the parent job alternates between calling sleep() and checking whether a message has arrived from the child (i.e., the last thing the child does before it calls MPI_FINALIZE is to send a message to its parent and then MPI_COMM_DISCONNECT from it). Once the parent finds that it has a message from the child(ren), it can MPI_COMM_DISCONNECT and continue processing.
Kinda hackey, but it might work...?