Open MPI User's Mailing List Archives

From: Ralph Castain (rhc_at_[hidden])
Date: 2007-04-24 08:24:38


Hi John

I'm afraid that the straightforward approach you're trying isn't going to
work with Open MPI in its current implementation. I had plans for supporting
this kind of operation, but that hasn't happened yet. And as you discovered, you
cannot run mpiexec/mpirun in the background, and the "do-not-wait" option
doesn't work (it may even be disabled by now, depending on which version you
are using).
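For reference, outside of mpiexec one can background two serial processes inside a single PBS job with plain shell and a "wait" (whether the scheduler accepts jobs launched without mpiexec depends on site policy). A minimal sketch, with echo standing in as a placeholder for the real serial binary:

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=2
#PBS -l walltime=1:00:00:00
# Hypothetical sketch: run two serial processes concurrently, one per CPU.
# "echo" stands in for the real program-executable here.
echo "run 1" > outputfile1 &
echo "run 2" > outputfile2 &
wait   # do not let the job script exit until both background runs finish
```

Without the trailing "wait", the job script would exit immediately and the scheduler could kill both runs mid-flight.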

Your best bet would be to put a call in your first executable to "spawn" the
second executable. You don't need to do this via MPI - you can do it
directly from a non-MPI program by calling the appropriate RTE function.
Several OpenRTE (the RTE underneath Open MPI) users do this regularly,
myself included.

I don't know what version you are using, but assuming it is 1.2 or the
"trunk", you will find an example of this in a test program in
orte/test/system/orte_spawn.c. I can provide advice/details on how to make
this work, if needed (probably best done off-list, or use the OpenRTE
mailing lists - see http://www.open-rte.org).

Ralph

On 4/23/07 11:18 PM, "John Borchardt" <john.borchardt_at_[hidden]> wrote:

> Greetings,
>
> I was hoping someone could help me with the following situation. I have a
> program which has no MPI support that I'd like to run "in parallel" by running
> a portion of my total task on N CPUs of a PBS/Maui/Open-MPI cluster. (The
> algorithm is such that there is no real need for MPI, I am just as well-off
> running N processes on N CPUs as I would be adding MPI support to my program
> and then running on N CPUs.)
>
> So it's easy enough to set up a Perl script to submit N jobs to the queue to
> run on N nodes. But, my cluster has two CPUs per node, and I am not
> RAM-limited, so I'd like to run two serial jobs per node, one on each of the
> node's CPUs. From what my admin tells me, I must use the mpiexec command to run my
> program so that the scheduler knows to run my program on the nodes which it
> has assigned to me.
>
> In my PBS script (this is one of N/2 similar scripts),
>
> #!/bin/bash
> #PBS -l nodes=1:ppn=2
> #PBS -l walltime=1:00:00:00
> mpiexec -pernode program-executable < inputfile1 > outputfile1
> mpiexec -pernode program-executable < inputfile2 > outputfile2
>
> does not have the desired effect. It appears that (1) the second process
> waits for the first to finish, and (2) MPI or the scheduler (I can't tell
> which) tries to re-start the program a few times (you can see this in the
> output files). Adding an ampersand to the first mpiexec line appears to cause
> mpiexec to crash and the job does not run at all. Using:
>
> mpiexec -np 1 program-executable < inputfile > outputfile
>
> avoids the strange re-start problem I mentioned above, but of course does not
> use both CPUs on a node.
>
>
> Maybe I am making a simple mistake, but I am quite new to cluster computing...
> Any help you can offer is greatly appreciated!
>
>
> Thanks,
>
> --John Borchardt
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users