Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] XML request
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-09-10 21:44:23


I filed ticket #2019 pointing to this email thread in case someone
ever wants to implement it.

FWIW: I don't think it matters much whether it's implemented as part
of mpirun or a new executable; I suspect that whatever implementation
is easiest will be fine.
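
For anyone picking this up, here is a minimal sketch (Python, purely
for brevity) of one small piece of item #2 quoted below: fork/exec a
child, pipe its stdout/stderr, and re-emit each line encased in
<stdout> / <stderr> tags. Everything beyond those two tag names is an
assumption for illustration: the function names wrap_stream and
run_wrapped are hypothetical, stdin forwarding is left out, and
nothing here touches the ORTE comm library (no events, IOF, or
show_help handling).

```python
import subprocess
import sys
import threading
from xml.sax.saxutils import escape


def wrap_stream(stream, tag, out, lock):
    """Copy each line of `stream` to `out`, encased in <tag>...</tag>."""
    for line in stream:
        # Escape &, <, > so the child's output cannot break the XML.
        text = escape(line.rstrip("\n"))
        with lock:  # keep lines from the two streams from interleaving
            out.write(f"<{tag}>{text}</{tag}>\n")
            out.flush()


def run_wrapped(argv, out=sys.stdout):
    """Run `argv` as a child and XML-wrap its stdout and stderr.

    Hypothetical helper, not part of Open MPI. Returns the child's
    exit status.
    """
    child = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,  # assumption: stdin is not forwarded
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    lock = threading.Lock()
    readers = [
        threading.Thread(target=wrap_stream,
                         args=(child.stdout, "stdout", out, lock)),
        threading.Thread(target=wrap_stream,
                         args=(child.stderr, "stderr", out, lock)),
    ]
    # One reader per pipe so neither pipe can fill up and stall the child.
    for t in readers:
        t.start()
    for t in readers:
        t.join()
    return child.wait()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python proxy_sketch.py mpirun -np 4 ./a.out
    sys.exit(run_wrapped(sys.argv[1:]))
```

The same structure would work whether this lives in a separate
executable or in an mpirun that forks itself; only the wrapper side
needs to know about XML.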

On Sep 10, 2009, at 9:28 PM, Greg Watson wrote:

> Hi Jeff,
>
> I think that sums up the situation nicely. For item #2, I wonder if it
> would be better to still use "ssh <host> mpirun ...", but have mpirun
> fork itself "under the covers"? Not having an extra executable in your
> distribution would probably make long term maintenance easier.
>
> If Ralph can do anything in the 1.3/1.4 timeframe to sort out the few
> remaining issues, it would be appreciated.
>
> Regards,
> Greg
>
> On Sep 10, 2009, at 3:19 PM, Jeff Squyres wrote:
>
> > Greg and I chatted on the phone about this. I now understand much
> > better about what he is trying to do (short version: Eclipse is
> > running on one machine, it is opening an ssh session to a remote
> > machine and launching mpirun on that remote machine).
> >
> > Results of the phone conversation (for the web archives):
> >
> > - In the short term, there are a few remaining issues to be figured
> > out. Ralph (who is now full-time at Cisco) may or may not have time
> > to fix these in the near term. We (Open MPI) would happily review
> > patches from others in this area if a solution is required before
> > Ralph can get to it.
> >
> > - In the long term, we came up with a "thinking outside the box"
> > solution that seems to be *much* better (think 1.5 and beyond).
> > I'll describe the scheme, but at the same time, I'll indicate that
> > Cisco likely does not have time in the foreseeable future to
> > implement it. Again, we would be happy to provide guidance to
> > anyone who would want to implement it (e.g., IBM) and/or review
> > patches.
> >
> > -----
> >
> > 1. Currently, the Eclipse plugin is effectively executing "ssh
> > <otherhost> mpirun ...". This has several advantages:
> > - Use whatever the native OMPI is on <otherhost>
> > - No need for binary compatibility (i.e., version match of Eclipse
> > plugin and remote OMPI installation)
> >
> > 2. The proposal is to change this to "ssh <otherhost>
> > mpirun-proxy ..." where mpirun-proxy is a new executable that does
> > the following:
> > - fork/exec the real mpirun, making pipes to mpirun's
> > stdin/stdout/stderr
> > - tell mpirun to not display any IOF output from MPI processes
> > - tell mpirun to not display any show_help messages
> > - register to receive ORTE "events" (more below) via the ORTE comm
> > library
> > - register to receive IOF from all the MPI processes via the ORTE
> > comm library
> > - register to receive show_help messages from MPI processes via
> > the ORTE comm library
> > - upon receipt of specific events (e.g., determination of
> > host/node/process maps), output this data encased in a specific XML
> > schema (e.g., a specific set of XML tags to encase each data item in
> > the nodemap) to ssh's stdout
> > - read output from mpirun's stdout/stderr, output it on ssh's
> > stdout, encased in <stdout> / <stderr> (etc.)
> > - read IOF from MPI processes and output them to ssh's stdout,
> > encased in appropriate XML tagging
> > - read show_help messages from MPI processes and output them to
> > ssh's stdout, encased in appropriate XML tagging
> >
> > --> Note that some of the above functionality already exists; it
> > would just need to be marshaled together and used in some new
> > logic. Other parts of the functionality do not exist and would need
> > to be written (e.g., redirecting show_help messages to something
> > other than the HNP).
> >
> > 3. Once #2 is done, remove all the XML processing from mpirun,
> > libopen-rte, libmpi, and all OMPI plugins (since it's now all in
> > mpirun-proxy).
> >
> > -----
> >
> > This functionality would accomplish the following:
> >
> > - The code is distributed in Open MPI -- not Eclipse or an Eclipse
> > plugin -- there's no additional compilation or linking step for the
> > Eclipse plugin to talk to OMPI.
> >
> > - The Eclipse plugin, which already checks the output from
> > ompi_info, can know when to use this new functionality (ssh
> > mpirun-proxy instead of mpirun).
> >
> > - All the OMPI XML parsing can be centralized to the mpirun-proxy
> > executable. This is a *huge* improvement over having XML sprinkled
> > all over the OMPI code base, as it is now. Additionally, with this
> > method, *all* OMPI output will be encased in XML before it is sent
> > to the Eclipse plugin (via ssh's stdout). Today, we have "XML-lite"
> > functionality in that "most" of OMPI's output is XML-ified, but
> > there are oodles and oodles of corner cases where output is *not*
> > XML-ified. The above proposal seems to be the best idea so far on
> > how to address this issue in a holistic way (rather than adding a
> > bunch more band-aids every time we find another output that isn't
> > XML-ified).
> >
> >
> >
> >
> >
> > On Sep 10, 2009, at 9:23 AM, Greg Watson wrote:
> >
> >> The most appealing thing about the XML option is that it just works
> >> "out of the box." Using a library API invariably requires compiling
> >> an agent or distributing pre-compiled binaries with all the
> >> associated complications. We tried that in the dim past and it was
> >> pretty unworkable. The other problem was that the API headers were
> >> not installed by default, so users were forced to install local
> >> copies of OMPI with development headers enabled. It was not a great
> >> end-user experience.
> >>
> >> Greg
> >>
> >> On Sep 10, 2009, at 8:45 AM, Jeff Squyres wrote:
> >>
> >> > Thinking about this a little more ...
> >> >
> >> > This all seems like Open MPI-specific functionality for Eclipse.
> >> > If that's the case, don't we have an ORTE tools communication
> >> > library that could be used? IIRC, it pretty much does exactly what
> >> > you want and would be far less clumsy than trying to jury-rig
> >> > sending XML down files/fd's/whatever. I have dim recollections of
> >> > the ORTE tools communication library API returning the data that
> >> > you have asked for in data structures -- no parsing of XML at all
> >> > (and, more importantly to us, no need to add all kinds of special
> >> > code paths for wrapping our output in XML).
> >> >
> >> > If I'm right (and that's a big "if"!), is there a reason that this
> >> > library is not attractive to you?
> >> >
> >> >
> >> >
> >> >
> >> > On Sep 10, 2009, at 8:04 AM, Jeff Squyres wrote:
> >> >
> >> >> On Sep 9, 2009, at 12:17 PM, Ralph Castain wrote:
> >> >>
> >> >>> Hmmm....I never considered the possibility of output-filename
> >> >>> being used that way. Interesting idea!
> >> >>>
> >> >>
> >> >> That feels way weird to me -- for example, how do you know that
> >> >> you're actually outputting to a tty?
> >> >>
> >> >> FWIW: +1 on the idea of writing to numbered fd's passed on the
> >> >> command line. It just "feels" like a more POSIX-ish way of doing
> >> >> things...? I guess I'm surprised that that would be difficult to
> >> >> do from Java.
> >> >>
> >> >> --
> >> >> Jeff Squyres
> >> >> jsquyres_at_[hidden]
> >> >>
> >> >
> >> >
> >> > --
> >> > Jeff Squyres
> >> > jsquyres_at_[hidden]
> >> >
> >> > _______________________________________________
> >> > devel mailing list
> >> > devel_at_[hidden]
> >> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> >
>

-- 
Jeff Squyres
jsquyres_at_[hidden]