It's not about the app. It's about the MPI standard. With one mpirun
you start one MPI application (SPMD or MPMD, but still only one). The
first consequence is that all processes started with one mpirun
command belong to the same MPI_COMM_WORLD.
Our mpirun is in fact equivalent to mpiexec as defined in the MPI
standard. Therefore, we cannot change its behavior outside the
boundaries of the MPI-2 standard.
Moreover, both of the approaches you described will only add corner
cases, which I would rather keep to a minimum.
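To make the first point concrete, here is a toy sketch in plain Python. It
does not call MPI at all; the function name world_layout is made up for
illustration, and it assumes ranks are handed out in command-line order. It
only models how the colon-separated app contexts of one mpirun command end up
in a single world, which is why the LU run quoted below reports 5 processes
instead of 4.

```python
# Toy model only: no MPI library is involved. Each (executable, nprocs) pair
# stands for one app context on a single mpirun command line ("a : b").

def world_layout(app_contexts):
    """Return (world_size, ranks) for one hypothetical mpirun invocation.

    Assumption: ranks are assigned in the order the app contexts appear
    on the command line. ranks[i] is the executable running as rank i.
    """
    ranks = []
    for executable, nprocs in app_contexts:
        ranks.extend([executable] * nprocs)
    return len(ranks), ranks


# The command from the quoted report: -np 4 lu.A.4 : -np 1 mg.A.1
size, ranks = world_layout([("lu.A.4", 4), ("mg.A.1", 1)])
print(size)   # 5: the size every process would see in the shared world
print(ranks)  # which executable holds each rank
```

Every process, whichever executable it runs, sees the same size of 5. Two
applications that should each keep their own world need two separate mpirun
commands.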
On Jul 27, 2007, at 8:42 AM, Ralph Castain wrote:
> On 7/26/07 4:22 PM, "Aurelien Bouteiller" <bouteill_at_[hidden]> wrote:
>>> mpirun -hostfile big_pool -n 10 -host 1,2,3,4 application : -n 2 -
>>> 99,100 ft_server
>> This will not work: this is a way to launch MIMD jobs that share the
>> same COMM_WORLD, not the way to launch two different applications that
>> interact through Accept/Connect.
>> The direct consequences on simple NAS benchmarks are:
>> * if the second command does not use MPI_Init, then the first
>> application locks forever in MPI_Init
>> * if both use MPI_Init, the MPI_Comm_size of each job is incorrect.
>> bouteill_at_dancer:~$ ompi-build/debug/bin/mpirun -prefix
>> /home/bouteill/ompi-build/debug/ -np 4 -host
>> NPB3.2-MPI/bin/lu.A.4 : -np 1 -host node01 NPB3.2-MPI/bin/mg.A.1
>> NAS Parallel Benchmarks 3.2 -- LU Benchmark
>> Warning: program is running on 5 processors
>> but was compiled for 4
>> Size: 64x 64x 64
>> Iterations: 250
>> Number of processes: 5
> Okay - of course, I can't possibly have any idea how your application
> works... ;-)
> However, it would be trivial to simply add two options to the
> command line:
> 1. designates that this app_context is to be launched as a separate
> 2. indicates that this app_context is to be "connected" ala connect/
> to the other app_contexts (if you want, we could even take an argument
> indicating which app_contexts it is to be connected to). Or we could
> reverse this and indicate that we want it to be disconnected; it all
> depends upon what default people want to define.
> This would solve the problem you describe while still allowing us to
> avoid allocation confusion. I'll send it out separately as an RFC.