I tried to replicate this, but I can't. I'm using the latest 1.6 tarball rather than 1.6.5, so it is possible something was fixed, though I believe we have committed very few changes since that series is about to drop to "deprecated".

First thing I encountered:

configure: WARNING: unrecognized options: --disable-maintainer-mode, --enable-ltdl-convenience

So I removed those (I have no idea what they even do) but retained the rest of your configure options. I then used your command line, substituting "hostname" for "foo", and everything ran just fine on an ssh-based system. Here's my system info:

Linux bend001 2.6.32-358.18.1.el6.x86_64 #1 SMP Wed Aug 28 17:19:38 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)



On Nov 15, 2013, at 7:24 AM, Sylvestre Ledru <sylvestre@debian.org> wrote:

Hello,

On 02/10/2013 19:34, Jeff Squyres (jsquyres) wrote:
On Sep 30, 2013, at 11:05 AM, Sylvestre Ledru <sylvestre@debian.org> wrote:

Here are the options list:
configure: running /bin/bash './configure'  CFLAGS="-DNDEBUG -g -O2
-Wformat -Werror=format-security -finline-functions -fno-strict-aliasing
-pthread" CPPFLAGS=" -I/usr//include   -I/usr/include/infiniband
-I/usr/include/infiniband" FFLAGS="-g -O2" LDFLAGS="  -L/usr//lib"
--enable-shared --disable-static  --prefix=/usr --with-mpi=open_mpi
--disable-aio --cache-file=/dev/null --srcdir=. --disable-option-checking
Hmm -- I'm confused here; it's not possible that you're getting an assertion failure with this configure line, for two reasons:

1. The assert() in question will only be compiled in if you pass --enable-debug on the configure command line.
2. You supplied -DNDEBUG in CFLAGS, which means you've disabled all assert()s.
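
To illustrate the second point, here is a minimal sketch in standard C. It is not Open MPI code, and the names touch and run_disabled_assert are hypothetical. It assumes only the documented behavior of <assert.h>: when NDEBUG is defined before the header is included (which is the effect of -DNDEBUG in CFLAGS), assert() expands to a no-op and its argument is never evaluated.

```c
/* Defining NDEBUG before including <assert.h> has the same effect
 * as passing -DNDEBUG in CFLAGS. */
#define NDEBUG
#include <assert.h>

/* Hypothetical counter: records how often the assert() argument runs. */
int evaluations = 0;

/* Hypothetical helper: returns 0 (false), so an active assert() on its
 * result would abort the program. */
int touch(void)
{
    evaluations++;
    return 0;
}

/* Returns how many times the assert() argument was evaluated. */
int run_disabled_assert(void)
{
    evaluations = 0;
    assert(touch()); /* expands to ((void)0) because NDEBUG is defined */
    return evaluations;
}
```

Calling run_disabled_assert() from a main() returns 0 and does not abort. If the #define NDEBUG line is removed (and -DNDEBUG is not passed), the same call aborts with an assertion failure instead.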

Can you verify that this is the correct configure line that you used to generate that error?  Or is something else going on?

So, I tried with the arguments you sent me in private.
$ ./configure --prefix=/home/sylvestre/bogus2 --disable-maintainer-mode
--disable-dependency-tracking --with-threads=posix
--enable-opal-multi-threads --disable-silent-rules --enable-debug
--with-devel-headers --with-slurm --with-sge --enable-heterogeneous
--disable-vt --enable-mpirun-prefix-by-default --enable-mpi-f77
--enable-mpi-f90 --enable-ltdl-convenience

I am getting something more interesting than a freeze (even if it does
not mean much to me):
./mpirun -mca plm_base_verbose 5 -mca ras_base_verbose 5 -mca
rmaps_base_verbose 5 -mca ess_base_verbose  5 -c 4 foo
[merulo:32531] mca:base:select:(  ess) Querying component [env]
[merulo:32531] mca:base:select:(  ess) Skipping component [env]. Query
failed to return a module
[merulo:32531] mca:base:select:(  ess) Querying component [hnp]
[merulo:32531] mca:base:select:(  ess) Query of component [hnp] set
priority to 100
[merulo:32531] mca:base:select:(  ess) Querying component [singleton]
[merulo:32531] mca:base:select:(  ess) Skipping component [singleton].
Query failed to return a module
[merulo:32531] mca:base:select:(  ess) Querying component [slave]
[merulo:32531] mca:base:select:(  ess) Query of component [slave] set
priority to 0
[merulo:32531] mca:base:select:(  ess) Querying component [slurm]
[merulo:32531] mca:base:select:(  ess) Skipping component [slurm]. Query
failed to return a module
[merulo:32531] mca:base:select:(  ess) Querying component [slurmd]
[merulo:32531] mca:base:select:(  ess) Skipping component [slurmd].
Query failed to return a module
[merulo:32531] mca:base:select:(  ess) Querying component [tm]
[merulo:32531] mca:base:select:(  ess) Skipping component [tm]. Query
failed to return a module
[merulo:32531] mca:base:select:(  ess) Querying component [tool]
[merulo:32531] mca:base:select:(  ess) Skipping component [tool]. Query
failed to return a module
[merulo:32531] mca:base:select:(  ess) Selected component [hnp]
[merulo:32531] mca:base:select:(  plm) Querying component [rsh]
[merulo:32531] [[INVALID],INVALID] plm:base:rsh_lookup on agent ssh :
rsh path NULL
[merulo:32531] *** Process received signal ***
[merulo:32531] Signal: Segmentation fault (11)
[merulo:32531] Signal code: Invalid permissions (2)
[merulo:32531] Failing at address: (nil)
[merulo:32531] [ 0]
linux-gate.so.1(__kernel_sigtramp+0x7fffffffff886860) [0xa000000000040800]
[merulo:32531] [ 1]
/home/sylvestre/bogus2/lib/openmpi/mca_plm_rsh.so(orte_plm_rsh_component_query+0xae3c0)
[0x2000000000867f40]
[merulo:32531] [ 2]
/home/sylvestre/bogus2/lib/libopen-rte.so.4(mca_base_select-0x5dc110)
[0x20000000001ddea0]
[merulo:32531] [ 3]
/home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_plm_base_select-0x680cd0)
[0x20000000001392f0]
[merulo:32531] [ 4]
/home/sylvestre/bogus2/lib/openmpi/mca_ess_hnp.so(+0x56f0)
[0x20000000008316f0]
[merulo:32531] [ 5]
/home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_init-0x72bf10)
[0x200000000008e0c0]
[merulo:32531] [ 6] ./mpirun(orterun+0x1fffffffff84cc80)
[0x4000000000006c60]
[merulo:32531] [ 7] ./mpirun(main+0x1fffffffff84b880) [0x40000000000045e0]
[merulo:32531] [ 8]
/lib/ia64-linux-gnu/libc.so.6.1(__libc_start_main-0x2fcd50)
[0x20000000004bd2a0]
[merulo:32531] [ 9] ./mpirun(_start+0x1fffffffff84a3c0) [0x40000000000043c0]
[merulo:32531] *** End of error message ***
Segmentation fault

And the backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x2000000000867f40 in orte_plm_rsh_component_query
(module=0x60000fffffffb0d8,
   priority=0x60000fffffffb0d0) at plm_rsh_component.c:205
205            OPAL_OUTPUT_VERBOSE((1, orte_plm_globals.output,
(gdb) bt
#0  0x2000000000867f40 in orte_plm_rsh_component_query
(module=0x60000fffffffb0d8,
   priority=0x60000fffffffb0d0) at plm_rsh_component.c:205
#1  0x20000000001ddea0 in mca_base_select (type_name=0x200000000026e708
"plm", output_id=8,
   components_available=0x20000000002c5f08 <orte_plm_base>,
best_module=0x60000fffffffb0e0,
   best_component=0x60000fffffffb0e8) at mca_base_components_select.c:76
#2  0x20000000001392f0 in orte_plm_base_select () at
base/plm_base_select.c:46
#3  0x20000000008316f0 in rte_init () at ess_hnp_module.c:169
#4  0x200000000008e0c0 in orte_init (pargc=0x60000fffffffb360,
pargv=0x60000fffffffb368, flags=4)
   at runtime/orte_init.c:127
#5  0x4000000000006c60 in orterun (argc=16, argv=0x60000fffffffb618) at
orterun.c:693
#6  0x40000000000045e0 in main (argc=16, argv=0x60000fffffffb618) at
main.c:13


Sylvestre
