
Open MPI Development Mailing List Archives


From: Ralph Castain (rhc_at_[hidden])
Date: 2007-07-27 10:13:57

On 7/27/07 7:58 AM, "Terry D. Dontje" <Terry.Dontje_at_[hidden]> wrote:

> Ralph Castain wrote:
>> WHAT: Proposal to add two new command line options that will allow us to
>> replace the current need to separately launch a persistent daemon to
>> support connect/accept operations
>> WHY: Remove problems of confusing multiple allocations, provide a cleaner
>> method for connect/accept between jobs
>> WHERE: minor changes in orterun and orted, some code in rmgr and each pls
>> to ensure the proper jobid and connect info is passed to each
>> app_context as it is launched
> It is my opinion that we would be better off attacking the issues of
> the persistent daemons described below than creating a new set of
> options to mpirun for process placement. (More comments below on
> the actual proposal.)

Those are non-trivial problems: we haven't figured them out in three years
of occasional effort. It isn't clear that they even -can- be solved when
considering the problem of running in multiple RM-based allocations.

I'll try to provide more detail on the problems when I return from my quick

>> TIMEOUT: 8/10/07
>>
>> We currently do not support connect/accept operations in a clean way. Users
>> are required to first start a persistent daemon that operates in a
>> user-named universe. They then must enter the mpirun command for each
>> application in a separate window, providing the universe name on each
>> command line. This is required because (a) mpirun will not run in the
>> background (in fact, at one point in time it would segfault, though I
>> believe it now just hangs), and (b) we require that all applications using
>> connect/accept operate under the same HNP.
>>
>> This is burdensome and appears to be causing problems for users as it
>> requires them to remember to launch that persistent daemon first -
>> otherwise, the applications execute, but never connect. Additionally, we
>> have the problem of confused allocations from the different login sessions.
>> This has caused numerous problems of processes going to incorrect locations,
>> allocations timing out at different times and causing jobs to abort, etc.
>>
>> What I propose here is to eliminate the confusion in a manner that minimizes
>> code complexity. The idea is to utilize our so-painfully-developed multiple
>> app_context capability to have the user launch all the interacting
>> applications with the same mpirun command. This not only eliminates the
>> annoyance factor for users by eliminating the need for multiple steps and
>> login sessions, but also solves the problem of ensuring that all
>> applications are running in the same allocation (so we don't have to worry
>> any more about timeouts in one allocation aborting another job).
>>
>> The proposal is to add two command line options that are associated with a
>> specific app_context (feel free to redefine the name of the option - I don't
>> personally care):
>>
>> 1. --independent-job - indicates that this app_context is to be launched as
>> an independent job. We will assign it a separate jobid, though we will map
>> it as part of the overall command (e.g., if by slot and no other directives
>> provided, it will start mapping where the prior app_context left off).
> I am unclear on what the --connect option really does. The MPI codes
> actually have to call MPI_Comm_connect to really connect to a process.
> Can we get away with just the above option?

You are right - connect doesn't need to exist. I was thinking it would just
minimize the startup message as I wouldn't bother sharing RTE info across
jobs that weren't "connected". However, for MPI users, this probably would
be confusing, so I would suggest just dropping it. With the routed rml, it
won't have that much impact anyway (I think).
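
To make the mapping behavior described for --independent-job concrete, here
is a minimal Python sketch. It is not ORTE code: the names `AppContext` and
`map_by_slot`, the flat slot numbering, and the jobid numbering (1 for the
ordinary app_contexts, 2 and up for independent ones) are all assumptions
made for illustration. It only shows the idea that a flagged app_context
gets its own jobid while by-slot mapping continues where the prior
app_context left off.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AppContext:
    """Hypothetical stand-in for one app_context on the mpirun command line."""
    name: str
    num_procs: int
    independent_job: bool = False  # models the proposed --independent-job flag


def map_by_slot(apps: List[AppContext], num_slots: int) -> List[Tuple[str, int, int]]:
    """Return (app name, jobid, slot) for every process, mapping by slot.

    Assumptions (illustrative only): ordinary app_contexts share jobid 1,
    each independent app_context gets the next unused jobid, and the whole
    allocation must be large enough for all app_contexts together.
    """
    total = sum(app.num_procs for app in apps)
    if total > num_slots:
        raise ValueError("allocation too small for the combined app_contexts")

    placements = []
    next_jobid = 2   # jobid 1 is reserved for the non-independent app_contexts
    next_slot = 0    # shared across app_contexts, so mapping never restarts
    for app in apps:
        if app.independent_job:
            jobid = next_jobid
            next_jobid += 1
        else:
            jobid = 1
        for _ in range(app.num_procs):
            placements.append((app.name, jobid, next_slot))
            next_slot += 1
    return placements


if __name__ == "__main__":
    apps = [AppContext("server", 2), AppContext("client", 3, independent_job=True)]
    for placement in map_by_slot(apps, num_slots=8):
        print(placement)
```

In this sketch the two server processes land on slots 0 and 1 under jobid 1,
and the three client processes continue on slots 2 through 4 under jobid 2.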

>> 2. --connect x,y,z - only valid when combined with the above option,
>> indicates that this independent job is to be MPI-connected to app_contexts
>> x,y,z (where x,y,z are the number of the app_context, counting from the
>> beginning of the command - you choose if we start from 0 or 1).
>> Alternatively, we can default to connecting to everyone, and then use
>> --disconnect to indicate we -don't- want to be connected.
>>
>> Note that this means the entire allocation for the combined app_contexts
>> must be provided. This helps the RTE tremendously to keep things straight,
>> and ensures that all the app_contexts will be able to complete (or not) in a
>> synchronized fashion.
>>
>> It also allows us to eliminate the persistent daemon and multiple login
>> session requirements for connect/accept. That does not mean we cannot have a
>> persistent daemon to create a virtual machine, assuming we someday want to
>> support that mode of operation. This simply removes the requirement that the
>> user start one just so they can use connect/accept.
>>
>> Comments?
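
If the --connect / --disconnect options from item 2 were kept, the two
alternatives could be resolved along the lines of the Python sketch below.
This is illustrative only: the function names are hypothetical, app_contexts
are counted from 0 (the proposal leaves that choice open), and rejecting a
command line that gives both options is an assumption, not part of the
proposal.

```python
from typing import Optional, Set


def parse_index_list(spec: str) -> Set[int]:
    """Parse a comma-separated list such as "0,2,3" into a set of indices."""
    return {int(item) for item in spec.split(",") if item.strip()}


def connected_contexts(own_index: int, num_contexts: int,
                       connect: Optional[str] = None,
                       disconnect: Optional[str] = None) -> Set[int]:
    """Return the app_context indices this independent job connects to.

    With `connect`, only the listed app_contexts are used. Without it, the
    default is to connect to everyone except those listed in `disconnect`.
    """
    if connect is not None and disconnect is not None:
        raise ValueError("give either a connect list or a disconnect list, not both")

    others = set(range(num_contexts)) - {own_index}
    listed = parse_index_list(connect if connect is not None else (disconnect or ""))

    # Every listed index must name some other app_context on the command line.
    invalid = listed - others
    if invalid:
        raise ValueError(f"invalid app_context indices: {sorted(invalid)}")

    return listed if connect is not None else others - listed


if __name__ == "__main__":
    print(connected_contexts(1, 4, connect="0,2"))
    print(connected_contexts(1, 4, disconnect="3"))
```

Both calls in the usage example select app_contexts 0 and 2 for the job at
index 1, one by naming them and the other by excluding app_context 3.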
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]