Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Openmpi 1.6.5 is freezing under GNU/Linux ia64
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-11-15 11:37:40


I'm trying to replicate this, but I can't. I'm using the latest 1.6 tarball rather than 1.6.5, so it's possible something was fixed in between, though I believe we have committed very few changes since that series is about to drop to "deprecated".

First thing I encountered:

configure: WARNING: unrecognized options: --disable-maintainer-mode, --enable-ltdl-convenience

So I removed those (no idea what they even do) but retained the rest of your configure options. I then used your command line, substituting "hostname" for "foo", and everything ran just fine on an ssh-based system. Here's my system info:

Linux bend001 2.6.32-358.18.1.el6.x86_64 #1 SMP Wed Aug 28 17:19:38 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)

On Nov 15, 2013, at 7:24 AM, Sylvestre Ledru <sylvestre_at_[hidden]> wrote:

> Hello,
>
> On 02/10/2013 19:34, Jeff Squyres (jsquyres) wrote:
>> On Sep 30, 2013, at 11:05 AM, Sylvestre Ledru <sylvestre_at_[hidden]> wrote:
>>
>>> Here are the options list:
>>> configure: running /bin/bash './configure' CFLAGS="-DNDEBUG -g -O2
>>> -Wformat -Werror=format-security -finline-functions -fno-strict-aliasing
>>> -pthread" CPPFLAGS=" -I/usr//include -I/usr/include/infiniband
>>> -I/usr/include/infiniband" FFLAGS="-g -O2" LDFLAGS=" -L/usr//lib"
>>> --enable-shared --disable-static --prefix=/usr --with-mpi=open_mpi
>>> --disable-aio --cache-file=/dev/null --srcdir=. --disable-option-checking
>> Hmm -- I'm confused here; it's not possible that you're getting an assertion failure with this configure line, for two reasons:
>>
>> 1. The assert() in question will only be compiled in if you --enable-debug on the configure command line.
>> 2. You supplied -DNDEBUG in CFLAGS, which means you've disabled all assert()s
>>
>> Can you verify that this is the correct configure line that you used to generate that error? Or is something else going on?
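
For anyone following along, point 2 above is standard C behaviour and can be seen in isolation. The sketch below is only an illustration, not Open MPI code; the function names are made up for this example. It reports whether assert() is active in the current build and shows that an expression placed inside assert() is not evaluated at all once NDEBUG is defined.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 when assert() is active in this translation unit,
 * 0 when the file was compiled with -DNDEBUG. */
int asserts_compiled_in(void)
{
#ifdef NDEBUG
    return 0;
#else
    return 1;
#endif
}

/* The assignment sits inside assert(). With NDEBUG defined the
 * preprocessor drops the whole expression, so ran stays 0. */
int side_effect_ran(void)
{
    int ran = 0;
    assert((ran = 1));
    return ran;
}

/* Prints what this particular build does with assert(). */
void report_assert_state(void)
{
    printf("assert() compiled in: %s\n",
           asserts_compiled_in() ? "yes" : "no");
    printf("expression inside assert() evaluated: %s\n",
           side_effect_ran() ? "yes" : "no");
}
```

Calling report_assert_state() from a small main() and building the file once with and once without -DNDEBUG in CFLAGS shows the two cases side by side.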
>>
> So, I tried with the arguments you sent me in private.
> $ ./configure --prefix=/home/sylvestre/bogus2 --disable-maintainer-mode
> --disable-dependency-tracking --with-threads=posix
> --enable-opal-multi-threads --disable-silent-rules --enable-debug
> --with-devel-headers --with-slurm --with-sge --enable-heterogeneous
> --disable-vt --enable-mpirun-prefix-by-default --enable-mpi-f77
> --enable-mpi-f90 --enable-ltdl-convenience
>
> I am getting something more interesting than a freeze (even if it does
> not mean much to me):
> ./mpirun -mca plm_base_verbose 5 -mca ras_base_verbose 5 -mca rmaps_base_verbose 5 -mca ess_base_verbose 5 -c 4 foo
> [merulo:32531] mca:base:select:( ess) Querying component [env]
> [merulo:32531] mca:base:select:( ess) Skipping component [env]. Query failed to return a module
> [merulo:32531] mca:base:select:( ess) Querying component [hnp]
> [merulo:32531] mca:base:select:( ess) Query of component [hnp] set priority to 100
> [merulo:32531] mca:base:select:( ess) Querying component [singleton]
> [merulo:32531] mca:base:select:( ess) Skipping component [singleton]. Query failed to return a module
> [merulo:32531] mca:base:select:( ess) Querying component [slave]
> [merulo:32531] mca:base:select:( ess) Query of component [slave] set priority to 0
> [merulo:32531] mca:base:select:( ess) Querying component [slurm]
> [merulo:32531] mca:base:select:( ess) Skipping component [slurm]. Query failed to return a module
> [merulo:32531] mca:base:select:( ess) Querying component [slurmd]
> [merulo:32531] mca:base:select:( ess) Skipping component [slurmd]. Query failed to return a module
> [merulo:32531] mca:base:select:( ess) Querying component [tm]
> [merulo:32531] mca:base:select:( ess) Skipping component [tm]. Query failed to return a module
> [merulo:32531] mca:base:select:( ess) Querying component [tool]
> [merulo:32531] mca:base:select:( ess) Skipping component [tool]. Query failed to return a module
> [merulo:32531] mca:base:select:( ess) Selected component [hnp]
> [merulo:32531] mca:base:select:( plm) Querying component [rsh]
> [merulo:32531] [[INVALID],INVALID] plm:base:rsh_lookup on agent ssh : rsh path NULL
> [merulo:32531] *** Process received signal ***
> [merulo:32531] Signal: Segmentation fault (11)
> [merulo:32531] Signal code: Invalid permissions (2)
> [merulo:32531] Failing at address: (nil)
> [merulo:32531] [ 0] linux-gate.so.1(__kernel_sigtramp+0x7fffffffff886860) [0xa000000000040800]
> [merulo:32531] [ 1] /home/sylvestre/bogus2/lib/openmpi/mca_plm_rsh.so(orte_plm_rsh_component_query+0xae3c0) [0x2000000000867f40]
> [merulo:32531] [ 2] /home/sylvestre/bogus2/lib/libopen-rte.so.4(mca_base_select-0x5dc110) [0x20000000001ddea0]
> [merulo:32531] [ 3] /home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_plm_base_select-0x680cd0) [0x20000000001392f0]
> [merulo:32531] [ 4] /home/sylvestre/bogus2/lib/openmpi/mca_ess_hnp.so(+0x56f0) [0x20000000008316f0]
> [merulo:32531] [ 5] /home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_init-0x72bf10) [0x200000000008e0c0]
> [merulo:32531] [ 6] ./mpirun(orterun+0x1fffffffff84cc80) [0x4000000000006c60]
> [merulo:32531] [ 7] ./mpirun(main+0x1fffffffff84b880) [0x40000000000045e0]
> [merulo:32531] [ 8] /lib/ia64-linux-gnu/libc.so.6.1(__libc_start_main-0x2fcd50) [0x20000000004bd2a0]
> [merulo:32531] [ 9] ./mpirun(_start+0x1fffffffff84a3c0) [0x40000000000043c0]
> [merulo:32531] *** End of error message ***
> Segmentation fault
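
As an aid to reading the ess lines in the output above: each component is queried, those that fail to return a module are skipped, and the one with the highest priority is selected (hnp at 100 over slave at 0). The sketch below is a simplified illustration of that rule only. It is not the code in mca_base_components_select.c, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record for the outcome of querying one component. */
struct component_result {
    const char *name;
    int has_module;  /* 0 when the query failed to return a module */
    int priority;    /* only meaningful when has_module is nonzero */
};

/* Returns the entry with the highest priority among those that
 * returned a module, or NULL if none did. */
const struct component_result *
select_best(const struct component_result *c, size_t n)
{
    const struct component_result *best = NULL;

    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!c[i].has_module)
            continue;  /* "Skipping component": nothing to compare */
        if (best == NULL || c[i].priority > best->priority)
            best = &c[i];
    }
    return best;
}

/* The ess query results, transcribed from the log above. */
const struct component_result ess_results[] = {
    {"env", 0, 0},   {"hnp", 1, 100}, {"singleton", 0, 0},
    {"slave", 1, 0}, {"slurm", 0, 0}, {"slurmd", 0, 0},
    {"tm", 0, 0},    {"tool", 0, 0},
};
const size_t ess_count = sizeof ess_results / sizeof ess_results[0];

/* Prints the chosen component in a form loosely modelled on the log. */
void print_selection(const struct component_result *best)
{
    if (best != NULL)
        printf("Selected component [%s]\n", best->name);
    else
        printf("No component returned a module\n");
}
```

Passing ess_results and ess_count to select_best() and handing the result to print_selection() reproduces the choice of hnp seen in the log. The plm framework never gets that far here, because the crash happens inside the query of the rsh component itself.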
>
> And the backtrace:
> Program received signal SIGSEGV, Segmentation fault.
> 0x2000000000867f40 in orte_plm_rsh_component_query (module=0x60000fffffffb0d8, priority=0x60000fffffffb0d0) at plm_rsh_component.c:205
> 205 OPAL_OUTPUT_VERBOSE((1, orte_plm_globals.output,
> (gdb) bt
> #0 0x2000000000867f40 in orte_plm_rsh_component_query (module=0x60000fffffffb0d8, priority=0x60000fffffffb0d0) at plm_rsh_component.c:205
> #1 0x20000000001ddea0 in mca_base_select (type_name=0x200000000026e708 "plm", output_id=8, components_available=0x20000000002c5f08 <orte_plm_base>, best_module=0x60000fffffffb0e0, best_component=0x60000fffffffb0e8) at mca_base_components_select.c:76
> #2 0x20000000001392f0 in orte_plm_base_select () at base/plm_base_select.c:46
> #3 0x20000000008316f0 in rte_init () at ess_hnp_module.c:169
> #4 0x200000000008e0c0 in orte_init (pargc=0x60000fffffffb360, pargv=0x60000fffffffb368, flags=4) at runtime/orte_init.c:127
> #5 0x4000000000006c60 in orterun (argc=16, argv=0x60000fffffffb618) at orterun.c:693
> #6 0x40000000000045e0 in main (argc=16, argv=0x60000fffffffb618) at main.c:13
>
>
> Sylvestre