Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

From: Ralph Castain (rhc_at_[hidden])
Date: 2007-07-27 10:13:57


On 7/27/07 7:58 AM, "Terry D. Dontje" <Terry.Dontje_at_[hidden]> wrote:

> Ralph Castain wrote:
>
>> WHAT: Proposal to add two new command line options that will allow us to
>> replace the current need to separately launch a persistent daemon to
>> support connect/accept operations
>>
>> WHY: Remove problems of confusing multiple allocations, provide a cleaner
>> method for connect/accept between jobs
>>
>> WHERE: minor changes in orterun and orted, some code in rmgr and each pls
>> to ensure the proper jobid and connect info is passed to each
>> app_context as it is launched
>>
>>
>>
> It is my opinion that we would be better off attacking the issues of
> the persistent daemons described below then creating a new set of
> options to mpirun for process placement. (more comments below on
> the actual proposal).

Non-trivial problems - we haven't figured them out in three years of
occasional effort. It isn't clear that they even -can- be solved when
considering the problem of running in multiple RM-based allocations.

I'll try to provide more detail on the problems when I return from my quick
trip...

>
>> TIMOUT: 8/10/07
>>
>> We currently do not support connect/accept operations in a clean way. Users
>> are required to first start a persistent daemon that operates in a
>> user-named universe. They then must enter the mpirun command for each
>> application in a separate window, providing the universe name on each
>> command line. This is required because (a) mpirun will not run in the
>> background (in fact, at one point in time it would segfault, though I
>> believe it now just hangs), and (b) we require that all applications using
>> connect/accept operate under the same HNP.
>>
>> This is burdensome and appears to be causing problems for users as it
>> requires them to remember to launch that persistent daemon first -
>> otherwise, the applications execute, but never connect. Additionally, we
>> have the problem of confused allocations from the different login sessions.
>> This has caused numerous problems of processes going to incorrect locations,
>> allocations timing out at different times and causing jobs to abort, etc.
>>
>> What I propose here is to eliminate the confusion in a manner that minimizes
>> code complexity. The idea is to utilize our so-painfully-developed multiple
>> app_context capability to have the user launch all the interacting
>> applications with the same mpirun command. This not only eliminates the
>> annoyance factor for users by eliminating the need for multiple steps and
>> login sessions, but also solves the problem of ensuring that all
>> applications are running in the same allocation (so we don't have to worry
>> any more about timeouts in one allocation aborting another job).
>>
>> The proposal is to add two command line options that are associated with a
>> specific app_context (feel free to redefine the name of the option - I don't
>> personally care):
>>
>> 1. --independent-job - indicates that this app_context is to be launched as
>> an independent job. We will assign it a separate jobid, though we will map
>> it as part of the overall command (e.g., if by slot and no other directives
>> provided, it will start mapping where the prior app_context left off)
>>
>>
>>
> I am unclear what does the option --connect really do? The MPI codes
> actually
> have to call MPI_Comm_connect to really connect to a process. Can we
> get away
> with just the above option?

You are right - connect doesn't need to exist. I was thinking it would just
minimize the startup message as I wouldn't bother sharing RTE info across
jobs that weren't "connected". However, for MPI users, this probably would
be confusing, so I would suggest just dropping it. With the routed rml, it
won't have that much impact anyway (I think).

>
>> 2. --connect x,y,z - only valid when combined with the above option,
>> indicates that this independent job is to be MPI-connected to app_contexts
>> x,y,z (where x,y,z are the number of the app_context, counting from the
>> beginning of the command - you choose if we start from 0 or 1).
>> Alternatively, we can default to connecting to everyone, and then use
>> --disconnect to indicate we -don't- want to be connected.
>>
>> Note that this means the entire allocation for the combined app_contexts
>> must be provided. This helps the RTE tremendously to keep things straight,
>> and ensures that all the app_contexts will be able to complete (or not) in a
>> synchronized fashion.
>>
>> It also allows us to eliminate the persistent daemon and multiple login
>> session requirements for connect/accept. That does not mean we cannot have a
>> persistent daemon to create a virtual machine, assuming we someday want to
>> support that mode of operation. This simply removes the requirement that the
>> user start one just so they can use connect/accept.
>>
>> Comments?
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel