Greg and I chatted on the phone about this. I now understand much
better what he is trying to do (short version: Eclipse is
running on one machine, it is opening an ssh session to a remote
machine and launching mpirun on that remote machine).
Results of the phone conversation (for the web archives):
- In the short term, there are a few remaining issues to be figured
out. Ralph (who is now full-time at Cisco) may or may not have time
to fix these in the near term. We (Open MPI) would happily review
patches from others in this area if a solution is required before
Ralph can get to it.
- In the long term, we came up with a "thinking outside the box"
solution that seems to be *much* better (think 1.5 and beyond). I'll
describe the scheme, but at the same time, I'll indicate that Cisco
likely does not have time in the foreseeable future to implement it.
Again, we would be happy to provide guidance to anyone who would want
to implement it (e.g., IBM) and/or review patches.
1. Currently, the Eclipse plugin is effectively executing "ssh
<otherhost> mpirun ...". This has several advantages:
- Uses whatever the native OMPI installation is on <otherhost>
- No need for binary compatibility (i.e., version match of Eclipse
plugin and remote OMPI installation)
2. The proposal is to change this to "ssh <otherhost> mpirun-
proxy ...", where mpirun-proxy is a new executable that does the
following:
- fork/exec the real mpirun, attaching pipes to mpirun's stdin/stdout/
stderr
- tell mpirun to not display any IOF output from MPI processes
- tell mpirun to not display any show_help messages
- register to receive ORTE "events" (more below) via the ORTE comm
library
- register to receive IOF from all the MPI processes via the ORTE
comm library
- register to receive show_help messages from MPI processes via
the ORTE comm library
- upon receipt of specific events (e.g., determination of host/
node/process maps), output this data encased in a specific XML schema
(e.g., a specific set of XML tags to encase each data item in the
nodemap) to ssh's stdout
- read output from mpirun's stdout/stderr, output it on ssh's
stdout, encased in <stdout> / <stderr> (etc.)
- read IOF from MPI processes and output them to ssh's stdout,
encased in appropriate XML tagging
- read show_help messages from MPI processes and output them to
ssh's stdout, encased in appropriate XML tagging
--> Note that some of the above functionality already exists; it
would just need to be marshaled together and driven by some new logic.
Other parts of the functionality do not exist and would need to be
written (e.g., redirecting show_help messages to something other than
stderr).
3. Once #2 is done, remove all the XML processing from mpirun,
libopen-rte, libmpi, and all OMPI plugins (since it's now all in
mpirun-proxy).
This functionality would accomplish the following:
- The code is distributed in Open MPI -- not in Eclipse or an Eclipse
plugin -- so there's no additional compilation or linking step for the
Eclipse plugin to talk to OMPI.
- The Eclipse plugin, which already checks the output from ompi_info,
can know when to use this new functionality (ssh mpirun-proxy instead
of ssh mpirun).
- All the OMPI XML parsing can be centralized to the mpirun-proxy
executable. This is a *huge* improvement over having XML sprinkled
all over the OMPI code base, as it is now. Additionally, with this
method, *all* OMPI output will be encased in XML before it is sent to
the Eclipse plugin (via ssh's stdout). Today, we have "XML-lite"
functionality in that "most" of OMPI's output is XML-ified, but
there's oodles and oodles of corner cases where output is *not* XML-
ified. The above proposal seems to be the best idea so far on how to
address this issue in a holistic way (rather than adding a bunch more
band-aids every time we find another output that isn't XML-ified).
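Purely as an illustration of what "all output encased in XML" could look
like on ssh's stdout (every tag name here is invented -- no schema has
been defined):

```xml
<mpirun>
  <map>
    <node name="node0">
      <process rank="0"/>
      <process rank="1"/>
    </node>
  </map>
  <stdout rank="0">Hello, world</stdout>
  <stderr rank="1">warning text from rank 1</stderr>
  <show_help>help message text</show_help>
  <exit status="0"/>
</mpirun>
```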
On Sep 10, 2009, at 9:23 AM, Greg Watson wrote:
> The most appealing thing about the XML option is that it just works
> "out of the box." Using a library API invariably requires compiling an
> agent or distributing pre-compiled binaries with all the associated
> complications. We tried that in the dim past and it was pretty
> unworkable. The other problem was that the API headers were not
> installed by default, so users were forced to install local copies of
> OMPI with development headers enabled. It was not a great end-user
> experience.
> On Sep 10, 2009, at 8:45 AM, Jeff Squyres wrote:
> > Thinking about this a little more ...
> > This all seems like Open MPI-specific functionality for Eclipse. If
> > that's the case, don't we have an ORTE tools communication library
> > that could be used? IIRC, it pretty much does exactly what you want
> > and would be far less clumsy than trying to jury-rig sending XML
> > down files/fd's/whatever. I have dim recollections of the ORTE
> > tools communication library API returning the data that you have
> > asked for in data structures -- no parsing of XML at all (and, more
> > importantly to us, no need to add all kinds of special code paths
> > for wrapping our output in XML).
> > If I'm right (and that's a big "if"!), is there a reason that this
> > library is not attractive to you?
> > On Sep 10, 2009, at 8:04 AM, Jeff Squyres wrote:
> >> On Sep 9, 2009, at 12:17 PM, Ralph Castain wrote:
> >>> Hmmm....I never considered the possibility of output-filename
> >>> used that way. Interesting idea!
> >> That feels way weird to me -- for example, how do you know that
> >> you're actually outputting to a tty?
> >> FWIW: +1 on the idea of writing to numbered fd's passed on the
> >> command line. It just "feels" like a more POSIX-ish way of doing
> >> things...? I guess I'm surprised that that would be difficult to
> >> do from Java.
> >> --
> >> Jeff Squyres
> >> jsquyres_at_[hidden]
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel