
Open MPI Development Mailing List Archives


Subject: [OMPI devel] dropping a pls module into an Open MPI build
From: Dean Dauger, Ph. D. (d_at_[hidden])
Date: 2008-01-18 14:17:16


I'm developing an mca_pls module, intending to drop it into a
preexisting Open MPI build (in its lib/openmpi directory) and have
orterun pick it up, but orterun keeps crashing even though it
correctly calls my module. To help isolate the issue, I separately
recompiled the mca_pls_rsh module from an Open MPI source
checkout, and dropping that in didn't work either. Any pointers?
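Some context on how a dropped-in module gets picked up: the component loader opens each `mca_<type>_<name>` file it finds in `lib/openmpi` and looks up a component structure exported by that file. The sketch below assumes the exported symbol follows the `mca_<type>_<name>_component` naming convention, so `mca_pls_rsh.so` would be expected to export `mca_pls_rsh_component`. It also uses plain `dlopen`/`dlsym` rather than Open MPI's own loader. The helper names `mca_component_symbol` and `probe_component` are hypothetical. A check like this can only confirm that a rebuilt module loads and exports the expected name. It cannot tell whether the structure's layout matches what orterun expects.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive the symbol a component file is ASSUMED to
 * export, e.g. ".../mca_pls_rsh.so" -> "mca_pls_rsh_component".
 * Returns 0 on success, -1 if the file name does not start with "mca_"
 * or the result does not fit in out. */
int mca_component_symbol(const char *path, char *out, size_t outlen)
{
    assert(path != NULL && out != NULL);

    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;            /* drop any directory part */

    if (strncmp(base, "mca_", 4) != 0)
        return -1;

    size_t len = strcspn(base, ".");          /* drop ".so", ".la", ... */
    int n = snprintf(out, outlen, "%.*s_component", (int)len, base);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Hypothetical helper: load the file and resolve that symbol with plain
 * dlopen/dlsym (not Open MPI's own loader). Returns 0 if both succeed,
 * -1 otherwise, printing the reason to stderr. */
int probe_component(const char *path)
{
    char sym[256];
    if (mca_component_symbol(path, sym, sizeof sym) != 0) {
        fprintf(stderr, "%s: not an mca_<type>_<name> file\n", path);
        return -1;
    }

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    void *comp = dlsym(handle, sym);          /* only compared to NULL below */
    if (comp == NULL)
        fprintf(stderr, "symbol %s not found: %s\n", sym, dlerror());

    dlclose(handle);
    return comp != NULL ? 0 : -1;
}
```

For example, `probe_component("/usr/lib/openmpi/mca_pls_rsh.so")` would check the copy in the system install described below. That path is assumed from the /usr prefix. A `dlopen` failure is not conclusive by itself, because a component may rely on symbols from Open MPI's own libraries that a standalone probe has not loaded. On older glibc systems, link with `-ldl`.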

To give an idea of what's going on, here's an example attempt to run
on two local processors:

dauger$ orterun -mca pls rsh -mca pls_base_verbose 10 --debug-devel --np 2 --host localhost "/Users/dauger/Documents/ompi-trunk/pingpong"
[Rotarran-X-5.local:04475] connect_uni: connection not allowed
[Rotarran-X-5.local:04475] mca: base: components_open: Looking for pls components
[Rotarran-X-5.local:04475] mca: base: components_open: distilling pls
[Rotarran-X-5.local:04475] mca: base: components_open: including pls
[Rotarran-X-5.local:04475] mca: base: components_open: rsh -->
[Rotarran-X-5.local:04475] mca: base: components_open: opening pls
[Rotarran-X-5.local:04475] mca: base: components_open: found loaded component rsh
[Rotarran-X-5.local:04475] mca: base: components_open: component rsh open function successful
[Rotarran-X-5.local:04475] orte:base:select: querying component rsh
[Rotarran-X-5.local:04475] [0,0,0] setting up session dir with
[Rotarran-X-5.local:04475] universe default-universe-4475
[Rotarran-X-5.local:04475] user dauger
[Rotarran-X-5.local:04475] host Rotarran-X-5.local
[Rotarran-X-5.local:04475] jobid 0
[Rotarran-X-5.local:04475] procid 0
[Rotarran-X-5.local:04475] procdir: /var/folders/oE/oENz6Cd
[Rotarran-X-5.local:04475] jobdir: /var/folders/oE/oENz6Cd
[Rotarran-X-5.local:04475] unidir: /var/folders/oE/oENz6Cd
[Rotarran-X-5.local:04475] top: openmpi-sessions-dauger_at_Rotarran-
[Rotarran-X-5.local:04475] tmp: /var/folders/oE/oENz6Cd+FTCWQbRGkntLLU
[Rotarran-X-5.local:04475] [0,0,0] contact_file /var/folders/oE/
[Rotarran-X-5.local:04475] [0,0,0] wrote setup file
[Rotarran-X-5:04475] *** Process received signal ***
[Rotarran-X-5:04475] Signal: Bus error (10)
[Rotarran-X-5:04475] Signal code: (2)
[Rotarran-X-5:04475] Failing at address: 0x0
[ 1] [0xbffff828, 0x00000000] (-P-)
[ 2] (orterun + 0x457) [0xbffff8b8, 0x00001d07]
[ 3] (main + 0x18) [0xbffff8d8, 0x000018ae]
[ 4] (start + 0x36) [0xbffff8fc, 0x0000186a]
[ 5] [0x00000000, 0x0000000d] (FP-)
[Rotarran-X-5:04475] *** End of error message ***
Bus error

pingpong was compiled with the existing Open MPI, and it runs with
the built-in rsh module, but not when I replace the pls_rsh module
with a recompiled one. When I add printf calls to the pls_rsh module's
_open and _init functions, they show that each of these returns without
problem, but _launch has not yet been called when the crash occurs. I'm
running Mac OS X 10.5.1, which ships with Open MPI at /usr, on a MacBook
Pro with an Intel Core Duo. ("Rotarran X.5" is the name of the computer.)
I first tried the 1.3.0 source code via svn, then went back to the
1.2.3 source code from Open MPI, but both gave the above bus error.
Then I went to Apple's copy of Open MPI 1.2.3 at
guessing Apple had changed things, but that still doesn't work. I've
also tried Apple's ./configure options, to no avail. Other than
debugging orterun, what else can I try?
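One possibility, offered only as an assumption and not a diagnosis, is that the rebuilt module was compiled against headers or configure options that differ from those of the installed Open MPI. In that case, a structure shared between orterun and the module can disagree in layout, and orterun may call through a function pointer read from the wrong offset. A null pointer at that spot would be consistent with "Failing at address: 0x0" above. The structs below are hypothetical stand-ins invented for illustration. They are not Open MPI's actual pls module definitions. The sketch shows how one extra field moves every field after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical module table as the installed orterun was compiled to see it. */
struct module_as_installed {
    int (*launch_fn)(int jobid);
    int (*terminate_fn)(int jobid);
    int (*finalize_fn)(void);
};

/* The "same" table as a module rebuilt with different headers or configure
 * options might define it: one extra entry shifts every field after it. */
struct module_as_rebuilt {
    int (*launch_fn)(int jobid);
    int (*signal_fn)(int jobid, int sig);   /* hypothetical extra entry */
    int (*terminate_fn)(int jobid);
    int (*finalize_fn)(void);
};

/* Byte offset at which each build expects terminate_fn. */
size_t terminate_offset_installed(void)
{
    return offsetof(struct module_as_installed, terminate_fn);
}

size_t terminate_offset_rebuilt(void)
{
    return offsetof(struct module_as_rebuilt, terminate_fn);
}

/* Prints both offsets; returns 1 if the layouts agree, 0 otherwise. */
int layouts_agree(void)
{
    size_t installed = terminate_offset_installed();
    size_t rebuilt = terminate_offset_rebuilt();

    /* launch_fn comes first in both layouts, so neither offset can be 0. */
    assert(installed > 0 && rebuilt > 0);

    printf("terminate_fn offset: installed=%zu rebuilt=%zu\n", installed, rebuilt);
    return installed == rebuilt;
}
```

For these structs, `layouts_agree()` returns 0 on both 32-bit and 64-bit targets, because the extra pointer in the rebuilt layout shifts `terminate_fn` by one pointer width. A caller compiled against one layout and handed a table built with the other would read the wrong entry without any compile-time or load-time error.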

Thanks in advance,