Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] dropping a pls module into an Open MPI build
From: Dean Dauger, Ph. D. (d_at_[hidden])
Date: 2008-01-21 14:57:31

> Which source checkout did you use? Note that the pls structures have
> likely changed between the OMPI SVN trunk and the v1.2 branch.

> Hmm -- are you saying that you tried compiling the Apple copy of the
> rsh pls and/or the OMPI SVN v1.2.3 rsh pls and neither of them worked?

Yes, I tried both of those and they gave the same bus error. If I'm
reading the stack dump right:

[Rotarran-X-5:04475] Failing at address: 0x0
[ 1] [0xbffff828, 0x00000000] (-P-)
[ 2] (orterun + 0x457) [0xbffff8b8, 0x00001d07]

it's orterun() calling through a null function pointer.
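
To illustrate how a crash like this can arise, here is a minimal sketch. It assumes a launcher module is a table of function pointers and that one slot was never filled in, for example because the struct layout the caller expects differs from the one the component was built against. The struct and function names are hypothetical and are not the real ORTE pls interface.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a launcher module: a table of function
 * pointers. This is NOT the real ORTE pls module struct. */
typedef struct {
    int (*launch)(int jobid);
    int (*finalize)(void);
} demo_pls_module_t;

/* A trivial launch routine: succeeds for non-negative job ids. */
static int demo_launch(int jobid)
{
    return jobid >= 0 ? 0 : -1;
}

/* Call the launch slot only if it was filled in. An unchecked
 * m->launch(jobid) with a NULL slot jumps to address 0x0, which is
 * the kind of failure the stack dump above appears to show. */
static int demo_call_launch(const demo_pls_module_t *m, int jobid)
{
    if (m == NULL || m->launch == NULL) {
        fprintf(stderr, "demo: launch slot is NULL, refusing to call\n");
        return -2;
    }
    return m->launch(jobid);
}
```

With a fully populated table, `demo_call_launch` forwards to `demo_launch`. With a table whose `launch` slot is NULL, it returns -2 instead of faulting.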

> I don't rightly know why that wouldn't work -- is there a way to know
> with what compiler flags Apple built Open MPI?

I'm not sure, but I think these are the configure flags they use:

--disable-mpi-f77 --without-cs-fs
--enable-mca-no-build=ras-slurm,pls-slurm,gpr-null,sds-pipe,sds-slurm,pml-cm
--mandir=/usr/share/man --sysconfdir=/usr/share NM="nm -p"

> Can you step through
> mpirun with a debugger to see where it dies? I suspect it may not
> have any debugging symbols, so you might not, but at least you might
> be able to see which pls rsh functions are invoked...? (and more
> importantly, if something is invoked "wrong" in the pls rsh)

Adding some printf's into the pls rsh shows the _init and _open
routines are successfully executing and exiting. I'll see if I can
figure out what part of orterun() is "orterun + 0x457". I have not
attempted to replace orterun/mpirun/etc., only the pls pieces.
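
As a starting point for locating "orterun + 0x457", the arithmetic can be sketched as below. It assumes the second bracketed value in frame [ 2] (0x00001d07) is the code address that "orterun + 0x457" refers to. The helper name is hypothetical.

```c
#include <stdint.h>

/* Given a code address reported as "symbol + offset", recover the
 * address where the symbol starts. */
static uint32_t symbol_start(uint32_t addr, uint32_t offset)
{
    return addr - offset;
}
```

Under that assumption, `symbol_start(0x00001d07, 0x457)` gives 0x18b0 as the start of orterun, so disassembling from there to 0x1d07 should show the call that precedes the fault.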

Thank you,