Guess I was unclear, George: I don't know enough about Aurelien's app to
know whether it is capable of (or trying to) run as one job.
What has been described on this thread to date is, in fact, a corner case.
Hence the proposal of another way to address that corner case without
disrupting normal code operation.
It may not be possible, per the other, more general thread...
On 7/27/07 8:31 AM, "George Bosilca" <bosilca_at_[hidden]> wrote:
> It's not about the app. It's about the MPI standard. With one mpirun
> you start one MPI application (SPMD or MPMD, but still only one). The
> first consequence of this is that all processes started with one mpirun
> command belong to the same MPI_COMM_WORLD.
> Our mpirun is in fact equivalent to mpiexec as defined in the MPI
> standard. Therefore, we cannot change its behavior beyond the
> boundaries of the MPI-2 standard.
> Moreover, both of the approaches you described would only add corner
> cases, which I would rather keep to a minimum.
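George's point about a single MPI_COMM_WORLD can be illustrated with a toy Python model (not Open MPI code) of how one MPMD mpirun lays out its processes. It assumes ranks are assigned consecutively in app_context order and that each process's MPI_APPNUM attribute is the index of its app_context; the `AppContext` and `mpmd_world` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppContext:
    """One colon-separated section of an mpirun command line (hypothetical model)."""
    executable: str
    np: int


def mpmd_world(app_contexts):
    """Model one MPMD mpirun: every process joins a single MPI_COMM_WORLD.

    Assumes ranks are assigned consecutively in app_context order and that
    MPI_APPNUM is the app_context's index. Returns (world_size, layout),
    where layout holds (rank, appnum, executable) tuples.
    """
    layout = []
    rank = 0
    for appnum, ctx in enumerate(app_contexts):
        for _ in range(ctx.np):
            layout.append((rank, appnum, ctx.executable))
            rank += 1
    return rank, layout


if __name__ == "__main__":
    size, layout = mpmd_world([AppContext("lu.A.4", 4), AppContext("mg.A.1", 1)])
    print("MPI_COMM_WORLD size:", size)
    for rank, appnum, exe in layout:
        print(f"rank {rank}: appnum {appnum}, {exe}")
```

For the lu.A.4 / mg.A.1 launch quoted further down the thread, the model reports a world size of 5, which matches the "Number of processes: 5" that LU printed.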
> On Jul 27, 2007, at 8:42 AM, Ralph Castain wrote:
>> On 7/26/07 4:22 PM, "Aurelien Bouteiller" <bouteill_at_[hidden]> wrote:
>>>> mpirun -hostfile big_pool -n 10 -host 1,2,3,4 application : -n 2 -
>>>> 99,100 ft_server
>>> This will not work: this is the way to launch MIMD jobs that share the
>>> same COMM_WORLD, not the way to launch two different applications that
>>> interact through Accept/Connect.
>>> The direct consequences on simple NAS benchmarks are:
>>> * if the second command does not call MPI_Init, then the first
>>> application locks forever in MPI_Init;
>>> * if both call MPI_Init, the MPI_Comm_size seen by the jobs is incorrect.
>>> bouteill_at_dancer:~$ ompi-build/debug/bin/mpirun -prefix
>>> /home/bouteill/ompi-build/debug/ -np 4 -host
>>> NPB3.2-MPI/bin/lu.A.4 : -np 1 -host node01 NPB3.2-MPI/bin/mg.A.1
>>> NAS Parallel Benchmarks 3.2 -- LU Benchmark
>>> Warning: program is running on 5 processors
>>> but was compiled for 4
>>> Size: 64x 64x 64
>>> Iterations: 250
>>> Number of processes: 5
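By contrast, two applications launched as separate jobs and joined through MPI_Comm_accept/MPI_Comm_connect each keep their own MPI_COMM_WORLD and talk over an intercommunicator. The toy model below (hypothetical names, no real MPI calls) records the sizes each side would observe. Which application accepts and which connects is an assumption for illustration, not something stated in the thread.

```python
def connected_sizes(accept_np, connect_np):
    """Toy model of two separately launched jobs joined by MPI_Comm_accept
    (accepting job) and MPI_Comm_connect (connecting job).

    For each side, returns the values an MPI process would get from
    MPI_Comm_size(MPI_COMM_WORLD), MPI_Comm_size(intercomm) (local group)
    and MPI_Comm_remote_size(intercomm) (remote group).
    """
    return {
        "accept": {"world": accept_np, "local": accept_np, "remote": connect_np},
        "connect": {"world": connect_np, "local": connect_np, "remote": accept_np},
    }


if __name__ == "__main__":
    # Assumption: ft_server (2 processes) accepts, application (10) connects.
    sizes = connected_sizes(accept_np=2, connect_np=10)
    print(sizes["connect"])  # application still sees a world of 10
```

Under this model, an LU binary compiled for 4 processes would see a world of 4 rather than 5, because the peer job only shows up as the remote group of the intercommunicator.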
>> Okay - of course, I can't possibly have any idea how your application
>> works... ;-)
>> However, it would be trivial to add two options to the
>> command line:
>> 1. one designates that this app_context is to be launched as a separate job;
>> 2. the other indicates that this app_context is to be "connected" à la
>> connect/accept to the other app_contexts (if you want, we could even take
>> an argument indicating which app_contexts it is to be connected to). Or we
>> could reverse this and have the option indicate that we want it to be
>> disconnected; it all depends on which default people want to define.
>> This would solve the problem you describe while still allowing us to
>> avoid allocation confusion. I'll send it out separately as an RFC.
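The grouping logic behind the two proposed options could be modelled roughly as follows. This is a hypothetical sketch only: the `separate_job` and `connect` fields stand in for the two options (their real names are not given here), and it assumes that a separate app_context asking to be connected is linked to the job holding all the remaining app_contexts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppContext:
    executable: str
    np: int
    separate_job: bool = False  # hypothetical stand-in for option 1
    connect: bool = False       # hypothetical stand-in for option 2


def plan_jobs(app_contexts):
    """Group app_contexts into jobs under the proposed options (sketch only).

    App_contexts without separate_job share one job and hence one
    MPI_COMM_WORLD, as mpirun behaves today. Each separate_job app_context
    gets its own job; if it also sets connect, it is linked
    (connect/accept style) to the shared job, which is job 0.
    Returns (jobs, links): jobs is a list of (executables, world_size),
    links a list of (job_index, job_index) pairs.
    """
    shared = [c for c in app_contexts if not c.separate_job]
    jobs = []
    if shared:
        jobs.append(([c.executable for c in shared], sum(c.np for c in shared)))
    links = []
    for c in app_contexts:
        if c.separate_job:
            jobs.append(([c.executable], c.np))
            if c.connect and shared:
                links.append((0, len(jobs) - 1))
    return jobs, links


if __name__ == "__main__":
    jobs, links = plan_jobs([
        AppContext("application", 10),
        AppContext("ft_server", 2, separate_job=True, connect=True),
    ])
    print(jobs)   # [(['application'], 10), (['ft_server'], 2)]
    print(links)  # [(0, 1)]
```

Without either flag, the same function collapses the lu.A.4 / mg.A.1 launch into one 5-process world, so the default behavior stays within what the standard prescribes for a single mpirun.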
>>> devel mailing list