Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] segmentation fault with openmpi-1.6.2
From: Eugene Loh (eugene.loh_at_[hidden])
Date: 2012-09-10 12:12:38


On 09/10/12 11:37, Ralph Castain wrote:
> On Sep 10, 2012, at 8:12 AM, Aleksey Senin <alekseys_at_[hidden]> wrote:
>
>> On 10/09/2012 15:41, Siegmar Gross wrote:
>>> Hi,
>>>
>>> I have built openmpi-1.6.2rc1 and get the following error.
>>>
>>> tyr small_prog 123 mpicc -showme
>>> cc -I/usr/local/openmpi-1.6.2_32_cc/include -mt
>>> -L/usr/local/openmpi-1.6.2_32_cc/lib -lmpi -lm -lkstat -llgrp
>>> -lsocket -lnsl -lrt -lm
>>> tyr small_prog 124 mpiexec -np 2 -host tyr init_finalize
>>>
>>> Hello!
>>> Hello!
>>>
>>> tyr small_prog 125 mpiexec -np 2 -host sunpc4 init_finalize
>>> key_from_blob: remaining bytes in key blob 81
>>>
>>> Hello!
>>> Hello!
>>>
>>> tyr small_prog 126 mpiexec -np 2 -host tyr,sunpc4 init_finalize
>>> [tyr:23956] *** Process received signal ***
>>> [tyr:23956] Signal: Segmentation Fault (11)
>>> [tyr:23956] Signal code: Address not mapped (1)
>>> [tyr:23956] Failing at address: 18
>>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:0x15434c
>>> /lib/libc.so.1:0xcad04
>>> /lib/libc.so.1:0xbf3b4
>>> /lib/libc.so.1:0xbf59c
>>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_rmaps_base_get_target_nodes+0x1cc [ Signal 11 (SEGV)]
>>> /.../openmpi-1.6.2_32_cc/lib/openmpi/mca_rmaps_round_robin.so:0x1ec8
>>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_rmaps_base_map_job+0xe4
>>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_plm_base_setup_job+0xc4
>>> /.../openmpi-1.6.2_32_cc/lib/openmpi/mca_plm_rsh.so:orte_plm_rsh_launch+0x1b0
>>> /.../openmpi-1.6.2_32_cc/bin/orterun:orterun+0x16a8
>>> /.../openmpi-1.6.2_32_cc/bin/orterun:main+0x24
>>> /.../openmpi-1.6.2_32_cc/bin/orterun:_start+0xd8
>>> [tyr:23956] *** End of error message ***
>>> Segmentation fault
>>>
>>> Do you have any ideas or suggestions? As I wrote in my email from
>>> yesterday, I had to add "#include<math.h>" into file
>>> openmpi-1.6.2rc1/ompi/contrib/vt/vt/extlib/otf/tools/otfaux/otfaux.cpp
>>> to have a prototype for function "rint" in line 834. Thank you very
>>> much for any help in advance.
> Really? That shouldn't happen - I'll take a look at that one.

Yes, Oracle MTT testing shows 1.6.2rc2r27272 is DOA:

% mpirun --host burl-ct-x2200-2 -np 2 hostname
burl-ct-x2200-2
burl-ct-x2200-2
% mpirun --host burl-ct-x2200-3 -np 2 hostname
burl-ct-x2200-3
burl-ct-x2200-3
% mpirun --host burl-ct-x2200-2,burl-ct-x2200-3 -np 2 hostname
[burl-ct-x2200-2:26019] *** Process received signal ***
[burl-ct-x2200-2:26019] Signal: Segmentation fault (11)
[burl-ct-x2200-2:26019] Signal code: Address not mapped (1)
[burl-ct-x2200-2:26019] Failing at address: 0x18
[burl-ct-x2200-2:26019] [ 0] [0xffffe600]
[burl-ct-x2200-2:26019] [ 1] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/libopen-rte.so.4(orte_rmaps_base_get_target_nodes+0x432) [0xf7e6d482]
[burl-ct-x2200-2:26019] [ 2] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/openmpi/mca_rmaps_round_robin.so [0xf7dcd8e5]
[burl-ct-x2200-2:26019] [ 3] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/libopen-rte.so.4(orte_rmaps_base_map_job+0x46) [0xf7e6c4d6]
[burl-ct-x2200-2:26019] [ 4] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/libopen-rte.so.4(orte_plm_base_setup_job+0x9c) [0xf7e647ec]
[burl-ct-x2200-2:26019] [ 5] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/openmpi/mca_plm_rsh.so(orte_plm_rsh_launch+0x244) [0xf7dfb634]
[burl-ct-x2200-2:26019] [ 6] mpirun(orterun+0xf5e) [0x804b868]
[burl-ct-x2200-2:26019] [ 7] mpirun(main+0x22) [0x804a8f6]
[burl-ct-x2200-2:26019] [ 8] /lib/libc.so.6(__libc_start_main+0xdc) [0xb10dec]
[burl-ct-x2200-2:26019] [ 9] mpirun [0x804a851]
[burl-ct-x2200-2:26019] *** End of error message ***
Segmentation fault
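
Both backtraces in this thread appear to fail in the same place, orte_rmaps_base_get_target_nodes, even though the Solaris and Linux traces are formatted differently. A minimal sketch for lining the two up, assuming Python is at hand: the helper name `symbols`, the regular expression, and the abbreviated trace excerpts (library paths cut down to the file name) are illustrative choices, not part of any Open MPI tooling.

```python
import re

# Matches "symbol+0xOFFSET" in the two frame styles seen in this thread:
#   Solaris: libopen-rte.so.4.0.0:orte_rmaps_base_map_job+0xe4
#   Linux:   libopen-rte.so.4(orte_rmaps_base_map_job+0x46)
# Frames with only a raw address (e.g. "libc.so.1:0xcad04") are skipped.
FRAME_RE = re.compile(r"[:(]([A-Za-z_]\w*)\+0x[0-9a-fA-F]+")


def symbols(trace):
    """Return the resolved symbol names in a backtrace, innermost frame first."""
    return FRAME_RE.findall(trace)


# Abbreviated excerpts of the two traces above (paths shortened for the sketch).
solaris_trace = """
libopen-rte.so.4.0.0:0x15434c
libc.so.1:0xcad04
libopen-rte.so.4.0.0:orte_rmaps_base_get_target_nodes+0x1cc [ Signal 11 (SEGV)]
mca_rmaps_round_robin.so:0x1ec8
libopen-rte.so.4.0.0:orte_rmaps_base_map_job+0xe4
libopen-rte.so.4.0.0:orte_plm_base_setup_job+0xc4
mca_plm_rsh.so:orte_plm_rsh_launch+0x1b0
"""

linux_trace = """
[ 1] libopen-rte.so.4(orte_rmaps_base_get_target_nodes+0x432) [0xf7e6d482]
[ 2] mca_rmaps_round_robin.so [0xf7dcd8e5]
[ 3] libopen-rte.so.4(orte_rmaps_base_map_job+0x46) [0xf7e6c4d6]
[ 4] libopen-rte.so.4(orte_plm_base_setup_job+0x9c) [0xf7e647ec]
[ 5] mca_plm_rsh.so(orte_plm_rsh_launch+0x244) [0xf7dfb634]
"""

if __name__ == "__main__":
    for name, trace in (("solaris", solaris_trace), ("linux", linux_trace)):
        print(name, symbols(trace))
```

If the two lists come out identical, that supports treating the Solaris and Linux failures as the same bug in the multi-host mapping path rather than two platform-specific problems.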