Open MPI Development Mailing List Archives

From: Ralph Castain (rhc_at_[hidden])
Date: 2007-07-27 11:08:07


Guess I was unclear, George - I don't know enough about Aurelien's app to
know whether it is capable of (or trying to) run as one job.

What has been described on this thread to date is, in fact, a corner case.
Hence the proposal of another way to possibly address that corner case
without disrupting normal code operation.

May not be possible, per the other more general thread....

On 7/27/07 8:31 AM, "George Bosilca" <bosilca_at_[hidden]> wrote:

> It's not about the app. It's about the MPI standard. With one mpirun
> you start one MPI application (SPMD or MPMD, but still only one). The
> first impact of this is that all processes started with one mpirun
> command will belong to the same MPI_COMM_WORLD.
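
A minimal sketch of that point, assuming Open MPI's usual colon syntax for
MPMD launches: every process started by a single mpirun sees one
MPI_COMM_WORLD whose size spans all of the app contexts.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        /* Launched as "mpirun -np 4 ./a : -np 1 ./b", all five
         * processes report size == 5, whichever binary they run. */
        printf("rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }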
>
> Our mpirun is in fact equivalent to the mpiexec defined in the MPI
> standard. Therefore, we cannot change its behavior outside the
> boundaries of the MPI-2 standard.
>
> Moreover, both of the approaches you described will only add corner
> cases, which I would rather keep to a minimum.
>
> george.
>
>
> On Jul 27, 2007, at 8:42 AM, Ralph Castain wrote:
>
>>
>>
>>
>> On 7/26/07 4:22 PM, "Aurelien Bouteiller" <bouteill_at_[hidden]> wrote:
>>
>>>> mpirun -hostfile big_pool -n 10 -host 1,2,3,4 application : \
>>>>        -n 2 -host 99,100 ft_server
>>>
>>> This will not work: it is a way to launch MIMD jobs that share the
>>> same MPI_COMM_WORLD, not a way to launch two different applications
>>> that interact through Accept/Connect.
>>>
>>> The direct consequences on simple NAS benchmarks are:
>>> * if the second command does not call MPI_Init, the first
>>>   application hangs forever in MPI_Init;
>>> * if both call MPI_Init, the MPI_Comm_size of each job is incorrect.
>>>
>>>
>>> ****
>>> bouteill_at_dancer:~$ ompi-build/debug/bin/mpirun \
>>>     -prefix /home/bouteill/ompi-build/debug/ \
>>>     -np 4 -host node01,node02,node03,node04 NPB3.2-MPI/bin/lu.A.4 : \
>>>     -np 1 -host node01 NPB3.2-MPI/bin/mg.A.1
>>>
>>>
>>> NAS Parallel Benchmarks 3.2 -- LU Benchmark
>>>
>>> Warning: program is running on 5 processors
>>> but was compiled for 4
>>> Size: 64x 64x 64
>>> Iterations: 250
>>> Number of processes: 5
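
For contrast, the Accept/Connect pattern referred to above joins two jobs
that were each started by their own mpirun. A rough server-side sketch in
C, with the exchange of the port string between the two jobs left out (in
practice it has to reach the client via a name server, a file, or similar):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm client;

        MPI_Init(&argc, &argv);

        /* Open a port and wait for the other job to connect. */
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("server port: %s\n", port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);

        /* 'client' is an intercommunicator to the other application; the
         * client side obtains its half with the matching call:
         *   MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
         * Each job keeps its own MPI_COMM_WORLD, so MPI_Comm_size stays
         * what each application was compiled and launched for. */

        MPI_Comm_disconnect(&client);
        MPI_Close_port(port);
        MPI_Finalize();
        return 0;
    }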
>>
>> Okay - of course, I can't possibly have any idea how your application
>> works... ;-)
>>
>> However, it would be trivial to simply add two options to the
>> app_context command line:
>>
>> 1. one that designates that this app_context is to be launched as a
>>    separate job;
>>
>> 2. one that indicates that this app_context is to be "connected" a la
>>    connect/accept to the other app_contexts (if you want, we could even
>>    take an argument indicating which app_contexts it is to be connected
>>    to). Or we could reverse this and indicate that we want it to be
>>    disconnected - it all depends on what default people want to define.
>>
>> This would solve the problem you describe while still allowing us to
>> avoid allocation confusion. I'll send it out separately as an RFC.
>>
>> Thanks
>> Ralph
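
Purely as an illustration of that proposal - the option names below are
hypothetical, nothing like them exists on the mpirun command line - it
might look something like:

    mpirun -hostfile big_pool -n 10 -host 1,2,3,4 application : \
           --separate-job --connect-to 0 -n 2 -host 99,100 ft_server

where the hypothetical --separate-job would put the second app_context in a
job of its own (with its own MPI_COMM_WORLD) and --connect-to 0 would
pre-wire it to app_context 0 as if connect/accept had been called.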
>>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel