Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Openmpi 1.6.5 is freezing under GNU/Linux ia64
From: Sylvestre Ledru (sylvestre_at_[hidden])
Date: 2013-11-15 10:24:29


Hello,

On 02/10/2013 19:34, Jeff Squyres (jsquyres) wrote:
> On Sep 30, 2013, at 11:05 AM, Sylvestre Ledru <sylvestre_at_[hidden]> wrote:
>
>> Here are the options list:
>> configure: running /bin/bash './configure' CFLAGS="-DNDEBUG -g -O2
>> -Wformat -Werror=format-security -finline-functions -fno-strict-aliasing
>> -pthread" CPPFLAGS=" -I/usr//include -I/usr/include/infiniband
>> -I/usr/include/infiniband" FFLAGS="-g -O2" LDFLAGS=" -L/usr//lib"
>> --enable-shared --disable-static --prefix=/usr --with-mpi=open_mpi
>> --disable-aio --cache-file=/dev/null --srcdir=. --disable-option-checking
> Hmm -- I'm confused here; it's not possible that you're getting an assertion failure with this configure line, for two reasons:
>
> 1. The assert() in question will only be compiled in if you --enable-debug on the configure command line.
> 2. You supplied -DNDEBUG in CFLAGS, which means you've disabled all assert()s
>
> Can you verify that this is the correct configure line that you used to generate that error? Or is something else going on?
>
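
As a side note on point 2 above, here is a minimal standalone C sketch (not Open MPI code; the helper names are made up for illustration) of how defining NDEBUG before including <assert.h> compiles assert() out, so that its argument is never even evaluated:

```c
#include <stdio.h>

/* Simulate passing -DNDEBUG in CFLAGS: NDEBUG must be defined
 * before <assert.h> is included for it to take effect. */
#define NDEBUG
#include <assert.h>

/* Hypothetical helpers that count how often an assert() argument runs. */
static int evaluations = 0;

static int note_evaluation(void)
{
    evaluations++;
    return 1; /* always true, so an active assert() never aborts */
}

/* With NDEBUG defined, assert(x) expands to ((void)0). */
int run_with_ndebug(void)
{
    evaluations = 0;
    assert(note_evaluation());
    return evaluations;
}

/* The C standard redefines assert() according to the current state of
 * NDEBUG each time <assert.h> is included, so this re-enables it. */
#undef NDEBUG
#include <assert.h>

int run_without_ndebug(void)
{
    evaluations = 0;
    assert(note_evaluation());
    return evaluations;
}
```

Called from a main(), run_with_ndebug() returns 0 and run_without_ndebug() returns 1, which is why an assertion failure would not be expected from a build whose CFLAGS contain -DNDEBUG.
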
So, I retried with the configure arguments you sent me privately:
 $ ./configure --prefix=/home/sylvestre/bogus2 --disable-maintainer-mode
--disable-dependency-tracking --with-threads=posix
--enable-opal-multi-threads --disable-silent-rules --enable-debug
--with-devel-headers --with-slurm --with-sge --enable-heterogeneous
--disable-vt --enable-mpirun-prefix-by-default --enable-mpi-f77
--enable-mpi-f90 --enable-ltdl-convenience

This time, instead of a freeze, mpirun crashes with a segmentation
fault (even if the output does not mean much to me):
./mpirun -mca plm_base_verbose 5 -mca ras_base_verbose 5 -mca rmaps_base_verbose 5 -mca ess_base_verbose 5 -c 4 foo
[merulo:32531] mca:base:select:( ess) Querying component [env]
[merulo:32531] mca:base:select:( ess) Skipping component [env]. Query failed to return a module
[merulo:32531] mca:base:select:( ess) Querying component [hnp]
[merulo:32531] mca:base:select:( ess) Query of component [hnp] set priority to 100
[merulo:32531] mca:base:select:( ess) Querying component [singleton]
[merulo:32531] mca:base:select:( ess) Skipping component [singleton]. Query failed to return a module
[merulo:32531] mca:base:select:( ess) Querying component [slave]
[merulo:32531] mca:base:select:( ess) Query of component [slave] set priority to 0
[merulo:32531] mca:base:select:( ess) Querying component [slurm]
[merulo:32531] mca:base:select:( ess) Skipping component [slurm]. Query failed to return a module
[merulo:32531] mca:base:select:( ess) Querying component [slurmd]
[merulo:32531] mca:base:select:( ess) Skipping component [slurmd]. Query failed to return a module
[merulo:32531] mca:base:select:( ess) Querying component [tm]
[merulo:32531] mca:base:select:( ess) Skipping component [tm]. Query failed to return a module
[merulo:32531] mca:base:select:( ess) Querying component [tool]
[merulo:32531] mca:base:select:( ess) Skipping component [tool]. Query failed to return a module
[merulo:32531] mca:base:select:( ess) Selected component [hnp]
[merulo:32531] mca:base:select:( plm) Querying component [rsh]
[merulo:32531] [[INVALID],INVALID] plm:base:rsh_lookup on agent ssh : rsh path NULL
[merulo:32531] *** Process received signal ***
[merulo:32531] Signal: Segmentation fault (11)
[merulo:32531] Signal code: Invalid permissions (2)
[merulo:32531] Failing at address: (nil)
[merulo:32531] [ 0] linux-gate.so.1(__kernel_sigtramp+0x7fffffffff886860) [0xa000000000040800]
[merulo:32531] [ 1] /home/sylvestre/bogus2/lib/openmpi/mca_plm_rsh.so(orte_plm_rsh_component_query+0xae3c0) [0x2000000000867f40]
[merulo:32531] [ 2] /home/sylvestre/bogus2/lib/libopen-rte.so.4(mca_base_select-0x5dc110) [0x20000000001ddea0]
[merulo:32531] [ 3] /home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_plm_base_select-0x680cd0) [0x20000000001392f0]
[merulo:32531] [ 4] /home/sylvestre/bogus2/lib/openmpi/mca_ess_hnp.so(+0x56f0) [0x20000000008316f0]
[merulo:32531] [ 5] /home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_init-0x72bf10) [0x200000000008e0c0]
[merulo:32531] [ 6] ./mpirun(orterun+0x1fffffffff84cc80) [0x4000000000006c60]
[merulo:32531] [ 7] ./mpirun(main+0x1fffffffff84b880) [0x40000000000045e0]
[merulo:32531] [ 8] /lib/ia64-linux-gnu/libc.so.6.1(__libc_start_main-0x2fcd50) [0x20000000004bd2a0]
[merulo:32531] [ 9] ./mpirun(_start+0x1fffffffff84a3c0) [0x40000000000043c0]
[merulo:32531] *** End of error message ***
Segmentation fault

And the backtrace from gdb:
Program received signal SIGSEGV, Segmentation fault.
0x2000000000867f40 in orte_plm_rsh_component_query (module=0x60000fffffffb0d8, priority=0x60000fffffffb0d0) at plm_rsh_component.c:205
205 OPAL_OUTPUT_VERBOSE((1, orte_plm_globals.output,
(gdb) bt
#0 0x2000000000867f40 in orte_plm_rsh_component_query (module=0x60000fffffffb0d8, priority=0x60000fffffffb0d0) at plm_rsh_component.c:205
#1 0x20000000001ddea0 in mca_base_select (type_name=0x200000000026e708 "plm", output_id=8, components_available=0x20000000002c5f08 <orte_plm_base>, best_module=0x60000fffffffb0e0, best_component=0x60000fffffffb0e8) at mca_base_components_select.c:76
#2 0x20000000001392f0 in orte_plm_base_select () at base/plm_base_select.c:46
#3 0x20000000008316f0 in rte_init () at ess_hnp_module.c:169
#4 0x200000000008e0c0 in orte_init (pargc=0x60000fffffffb360, pargv=0x60000fffffffb368, flags=4) at runtime/orte_init.c:127
#5 0x4000000000006c60 in orterun (argc=16, argv=0x60000fffffffb618) at orterun.c:693
#6 0x40000000000045e0 in main (argc=16, argv=0x60000fffffffb618) at main.c:13

Sylvestre