On Jan 18, 2008, at 2:17 PM, Dean Dauger, Ph. D. wrote:
> I'm developing an mca_pls module, intending to drop it into a
> preexisting Open MPI build (in its lib/openmpi directory) and have
> orterun pick it up, but orterun kept crashing on me even though it
> correctly calls my module. To help isolate the issue, I separately
> recompiled the mca_pls_rsh module from an Open MPI source checkout
> and dropped that in instead, but that didn't work either. Any pointers?
Which source checkout did you use? Note that the pls structures have
likely changed between the OMPI SVN trunk and the v1.2 branch. So if
you didn't use a checkout from the v1.2 branch, I would expect Random
Bad Things (RBTs) to occur.
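To make the "Random Bad Things" a little more concrete, here is a minimal sketch using two hypothetical layouts, pls_module_old and pls_module_new. Both are invented for illustration and are not the real Open MPI pls structures. The sketch shows how one member added on one branch shifts every later member. A caller built against the old layout that calls through its launch slot would actually reach the new module's init, and its finalize slot would land on the new module's launch. Nothing flags this at compile or load time.

```c
#include <stddef.h>

/* Illustration only: HYPOTHETICAL module layouts, not the real Open MPI
 * pls structures. They show how an added member shifts later members. */

typedef int (*pls_init_fn)(void);
typedef int (*pls_launch_fn)(int jobid);
typedef int (*pls_finalize_fn)(void);

/* Layout the installed orterun was (hypothetically) compiled against. */
struct pls_module_old {
    pls_launch_fn launch;
    pls_finalize_fn finalize;
};

/* Layout on another branch, with one extra entry point at the front. */
struct pls_module_new {
    pls_init_fn init;          /* new member shifts everything after it */
    pls_launch_fn launch;
    pls_finalize_fn finalize;
};

/* Byte offset at which the old caller expects launch(). */
size_t launch_offset_seen_by_caller(void) {
    return offsetof(struct pls_module_old, launch);
}

/* Byte offset at which the new module actually stores launch(). */
size_t launch_offset_in_module(void) {
    return offsetof(struct pls_module_new, launch);
}
```

Because function pointers are the same size on common platforms, the old caller's finalize slot coincides with the new module's launch slot. A mismatch like this compiles cleanly and only misbehaves at run time.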
> pingpong was compiled with the existing Open MPI, and it runs with
> the built-in rsh module, but not when I replace the pls_rsh module
> with a recompiled one. When I add printf's to the pls_rsh module's
> _open and _init functions, I can see that each of them returns
> without problem, but it crashes before _launch is called. I'm running Mac OS X
> 10.5.1, which ships with Open MPI at /usr, on a MacBook Pro with an
> Intel Core Duo. ("Rotarran X.5" is the name of the computer.) I
> first attempted the 1.3.0 source code via svn, then went back to the
> 1.2.3 source code from Open MPI, but both gave the above bus error.
> Then I went to Apple's copy of Open MPI 1.2.3 at opensource.apple.com
> guessing Apple had changed things, but that still doesn't work. I've
> also tried Apple's ./configure options, to no avail. Other than
> debugging orterun, what else can I try?
Hmm -- are you saying that you tried compiling the Apple copy of the
rsh pls and/or the OMPI SVN v1.2.3 rsh pls and neither of them worked?
I don't rightly know why that wouldn't work -- is there a way to know
with what compiler flags Apple built Open MPI? Can you step through
mpirun with a debugger to see where it dies? I suspect it may not
have any debugging symbols, so you might not get very far, but at least
you might be able to see which pls rsh functions are invoked (and, more
importantly, whether something in the pls rsh is invoked "wrong").
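If a debugger doesn't get far, the printf-style tracing mentioned above can be made a bit more crash-proof. This is a minimal sketch under stated assumptions. The macro name PLS_TRACE and the functions trace_demo_open and trace_demo_init are hypothetical stand-ins, not Open MPI APIs; the real entry-point names and signatures come from whichever Open MPI version the component is built against. The point is to write to stderr and flush, since stdout is fully buffered when redirected or piped, and lines printed just before a bus error can then be lost.

```c
#include <stdio.h>

/* Hypothetical trace helper for debugging a component. stderr is
 * unbuffered by default; the explicit fflush() also covers the case
 * where the stream was reconfigured, so a line written just before a
 * crash is not left sitting in a stdio buffer. */
#define PLS_TRACE(msg)                                          \
    do {                                                        \
        fprintf(stderr, "[pls trace] %s:%d %s(): %s\n",         \
                __FILE__, __LINE__, __func__, (msg));           \
        fflush(stderr);                                         \
    } while (0)

/* Hypothetical stand-in for a component's open entry point. */
static int trace_demo_open(void) {
    PLS_TRACE("enter");
    /* ... real open logic would go here ... */
    PLS_TRACE("leave");
    return 0;
}

/* Hypothetical stand-in for a component's init entry point that
 * reports a priority back to its caller. */
static int trace_demo_init(int *priority) {
    PLS_TRACE("enter");
    *priority = 10; /* placeholder value for the demo */
    PLS_TRACE("leave");
    return 0;
}
```

The last "enter" line without a matching "leave" points at the function that was running when the process died. If every "leave" appears, the crash is happening in the caller after your function returned.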