Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RTE issue I. Support for non-MPI jobs
From: Rolf.Vandevaart_at_[hidden]
Date: 2007-12-05 09:58:24


Ralph H Castain wrote:

>I. Support for non-MPI jobs
>Considerable complexity currently exists in ORTE because of the stipulation
>in our first requirements document that users be able to mpirun non-MPI jobs
>- i.e., that we support such calls as "mpirun -n 100 hostname". This creates
>a situation, however, where the RTE cannot know if the application will call
>MPI_Init (or at least orte_init), which has significant implications for the
>RTE's architecture. For example, during the launch of the application's
>processes, the RTE cannot go into any form of blocking receive while waiting
>for the procs to report a successful startup as this won't occur for
>execution of something like "hostname".
>
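A minimal Python sketch of the startup-wait problem, offered as an illustration rather than ORTE code. It assumes a launcher that treats a "READY" line on the child's stdout as a hypothetical stand-in for the startup report MPI_Init/orte_init would send. The real RTE uses its own messaging. Instead of a receive that can only complete on READY, the launcher waits for "READY or exit", so a command like hostname cannot hang it.

```python
import subprocess
import sys

# Hypothetical stand-in for the startup report that MPI_Init/orte_init would
# send back to the RTE; real ORTE uses its own messaging, not stdout.
READY = "READY"


def launch_and_wait(argv):
    """Start one process and wait until it reports startup or exits.

    Returns "reported" if the process printed the READY line, or
    "exited-without-report" if its stdout closed first (as with `hostname`).
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    status = "exited-without-report"
    # Iteration stops at EOF, so a process that never reports cannot hang
    # the launcher the way a receive that only accepts READY would.
    for line in proc.stdout:
        if line.strip() == READY:
            status = "reported"
            break
    proc.communicate()  # drain any remaining output and reap the child
    return status


if __name__ == "__main__":
    mpi_like = [sys.executable, "-c", "print('READY')"]
    non_mpi = [sys.executable, "-c", "print('node01')"]
    print(launch_and_wait(mpi_like))  # reported
    print(launch_and_wait(non_mpi))   # exited-without-report
```

What the sketch does not solve: a long-running non-MPI command such as "sleep 600" still holds the launcher in this loop until it exits. Nothing distinguishes a process that is slow to reach MPI_Init from one that never will.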
>Jeff has noted that support for non-MPI jobs is not something most (all?)
>MPIs currently provide, nor something that users are likely to exploit as
>they can more easily just "qsub hostname" (or the equivalent for that
>environment). While nice for debugging purposes, therefore, it isn't clear
>that supporting non-MPI jobs is worth the increased code complexity and
>fragility.
>
>In addition, the fact that we do not know if a job will call Init limits our
>ability to do collective communications within the RTE, and hence our
>scalability - see the note on that specific subject for further discussion
>on this area.
>
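To make the collective limitation concrete, here is a minimal Python sketch. It models an RTE-internal collective as a threading.Barrier sized to the whole job. The barrier is an assumed stand-in for whatever collective ORTE would actually run, and the calls_init list is a hypothetical input saying which processes ever reach orte_init. A process that never calls orte_init (such as hostname) never arrives, so the collective cannot complete.

```python
import threading


def run_rte_collective(calls_init, timeout=1.0):
    """Simulate an RTE-level barrier across every process in a job.

    calls_init[i] says whether process i ever reaches orte_init. Returns True
    only if every process joined the barrier before the timeout.
    """
    # The timeout exists only so this demo terminates; a barrier with no
    # timeout would block forever if any process never arrives.
    barrier = threading.Barrier(len(calls_init), timeout=timeout)
    joined = []
    lock = threading.Lock()

    def proc(does_init):
        if not does_init:
            return  # e.g. `hostname`: exits without ever entering orte_init
        try:
            barrier.wait()
            ok = True
        except threading.BrokenBarrierError:
            ok = False
        with lock:
            joined.append(ok)

    threads = [threading.Thread(target=proc, args=(flag,)) for flag in calls_init]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(joined) == len(calls_init) and all(joined)


if __name__ == "__main__":
    print(run_rte_collective([True, True, True]))   # True
    print(run_rte_collective([True, True, False]))  # False
```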
>This would be a "regression" in behavior, though, so the questions for the
>community are:
>
>(a) do we want to retain the feature to run non-MPI jobs with mpirun as-is
>(and accept the tradeoffs, including the one described below in II)?
>
>
Hi Ralph:
From a user standpoint, (a) would be preferable. However, as you point
out, there are issues. Are you saying that we cannot do collectives
(Item III) if we preserve (a)? Or is it just that things will be more
complex? I guess I am looking for more details about the tradeoffs of
preserving (a).

Having said that, we would probably be OK with (b) if that makes things
better/faster/robuster.

Rolf

>(b) do we provide a flag to mpirun (perhaps adding the distinction that
>"orterun" must be used for non-MPI jobs?) to indicate "this is NOT an MPI
>job" so we can act accordingly?
>
>(c) simply eliminate support for non-MPI jobs?
>
>(d) other suggestions?
>
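Option (b) could look roughly like the following Python sketch of launcher argument handling. The --non-mpi flag, the LaunchPlan fields, and the check on the program name "orterun" are hypothetical illustrations of the proposal, not existing Open MPI options. The point is that one user-supplied bit lets the RTE decide up front whether it may block on startup reports and use RTE collectives.

```python
import argparse
from dataclasses import dataclass


@dataclass
class LaunchPlan:
    nprocs: int
    app: list
    wait_for_startup_reports: bool  # block until every proc reports from orte_init
    use_rte_collectives: bool       # allow collectives inside the RTE


def plan_launch(argv, prog="mpirun"):
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument("-n", type=int, required=True, help="number of processes")
    # Hypothetical flag: the user asserts the app will NOT call MPI_Init.
    parser.add_argument("--non-mpi", action="store_true")
    parser.add_argument("app", nargs=argparse.REMAINDER)
    args = parser.parse_args(argv)

    # Variant floated in (b): "orterun" as the entry point for non-MPI jobs.
    mpi_job = not (args.non_mpi or prog == "orterun")
    return LaunchPlan(
        nprocs=args.n,
        app=args.app,
        wait_for_startup_reports=mpi_job,
        use_rte_collectives=mpi_job,
    )


if __name__ == "__main__":
    print(plan_launch(["-n", "100", "hostname"]))
    print(plan_launch(["-n", "100", "--non-mpi", "hostname"]))
```

Defaulting to "MPI job" leaves existing mpirun command lines for MPI applications unchanged. Only users who launch non-MPI commands such as hostname would need to add the flag or switch to orterun.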
>Ralph
>