Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] segmentation fault with openmpi-1.6.2
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-09-10 12:20:43


Wow, okay, I'll have to investigate. Be aware, though, that you just described a completely different failure. Oracle isn't using Slurm, last I heard; you were using rsh/qrsh. And you aren't running from a backend node but from the same frontend; you just have two hosts listed in your -host entry.

I'll look at both issues. Thx!

On Sep 10, 2012, at 9:12 AM, Eugene Loh <eugene.loh_at_[hidden]> wrote:

> On 09/10/12 11:37, Ralph Castain wrote:
>> On Sep 10, 2012, at 8:12 AM, Aleksey Senin <alekseys_at_[hidden]> wrote:
>>
>>> On 10/09/2012 15:41, Siegmar Gross wrote:
>>>> Hi,
>>>>
>>>> I have built openmpi-1.6.2rc1 and get the following error.
>>>>
>>>> tyr small_prog 123 mpicc -showme
>>>> cc -I/usr/local/openmpi-1.6.2_32_cc/include -mt
>>>> -L/usr/local/openmpi-1.6.2_32_cc/lib -lmpi -lm -lkstat -llgrp
>>>> -lsocket -lnsl -lrt -lm
>>>> tyr small_prog 124 mpiexec -np 2 -host tyr init_finalize
>>>>
>>>> Hello!
>>>> Hello!
>>>>
>>>> tyr small_prog 125 mpiexec -np 2 -host sunpc4 init_finalize
>>>> key_from_blob: remaining bytes in key blob 81
>>>>
>>>> Hello!
>>>> Hello!
>>>>
>>>> tyr small_prog 126 mpiexec -np 2 -host tyr,sunpc4 init_finalize
>>>> [tyr:23956] *** Process received signal ***
>>>> [tyr:23956] Signal: Segmentation Fault (11)
>>>> [tyr:23956] Signal code: Address not mapped (1)
>>>> [tyr:23956] Failing at address: 18
>>>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:0x15434c
>>>> /lib/libc.so.1:0xcad04
>>>> /lib/libc.so.1:0xbf3b4
>>>> /lib/libc.so.1:0xbf59c
>>>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_rmaps_base_get_target_nodes+0x1cc [ Signal 11 (SEGV)]
>>>> /.../openmpi-1.6.2_32_cc/lib/openmpi/mca_rmaps_round_robin.so:0x1ec8
>>>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_rmaps_base_map_job+0xe4
>>>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_plm_base_setup_job+0xc4
>>>> /.../openmpi-1.6.2_32_cc/lib/openmpi/mca_plm_rsh.so:orte_plm_rsh_launch+0x1b0
>>>> /.../openmpi-1.6.2_32_cc/bin/orterun:orterun+0x16a8
>>>> /.../openmpi-1.6.2_32_cc/bin/orterun:main+0x24
>>>> /.../openmpi-1.6.2_32_cc/bin/orterun:_start+0xd8
>>>> [tyr:23956] *** End of error message ***
>>>> Segmentation fault
>>>>
>>>> Do you have any ideas or suggestions? As I wrote in my email from
>>>> yesterday, I had to add "#include <math.h>" into file
>>>> openmpi-1.6.2rc1/ompi/contrib/vt/vt/extlib/otf/tools/otfaux/otfaux.cpp
>>>> to have a prototype for function "rint" in line 834. Thank you very
>>>> much for any help in advance.
>> Really? That shouldn't happen - I'll take a look at that one.
> Yes, Oracle MTT testing shows 1.6.2rc2r27272 DOA:
>
> % mpirun --host burl-ct-x2200-2 -np 2 hostname
> burl-ct-x2200-2
> burl-ct-x2200-2
> % mpirun --host burl-ct-x2200-3 -np 2 hostname
> burl-ct-x2200-3
> burl-ct-x2200-3
> % mpirun --host burl-ct-x2200-2,burl-ct-x2200-3 -np 2 hostname
> [burl-ct-x2200-2:26019] *** Process received signal ***
> [burl-ct-x2200-2:26019] Signal: Segmentation fault (11)
> [burl-ct-x2200-2:26019] Signal code: Address not mapped (1)
> [burl-ct-x2200-2:26019] Failing at address: 0x18
> [burl-ct-x2200-2:26019] [ 0] [0xffffe600]
> [burl-ct-x2200-2:26019] [ 1] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/libopen-rte.so.4(orte_rmaps_base_get_target_nodes+0x432) [0xf7e6d482]
> [burl-ct-x2200-2:26019] [ 2] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/openmpi/mca_rmaps_round_robin.so [0xf7dcd8e5]
> [burl-ct-x2200-2:26019] [ 3] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/libopen-rte.so.4(orte_rmaps_base_map_job+0x46) [0xf7e6c4d6]
> [burl-ct-x2200-2:26019] [ 4] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/libopen-rte.so.4(orte_plm_base_setup_job+0x9c) [0xf7e647ec]
> [burl-ct-x2200-2:26019] [ 5] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/openmpi/mca_plm_rsh.so(orte_plm_rsh_launch+0x244) [0xf7dfb634]
> [burl-ct-x2200-2:26019] [ 6] mpirun(orterun+0xf5e) [0x804b868]
> [burl-ct-x2200-2:26019] [ 7] mpirun(main+0x22) [0x804a8f6]
> [burl-ct-x2200-2:26019] [ 8] /lib/libc.so.6(__libc_start_main+0xdc) [0xb10dec]
> [burl-ct-x2200-2:26019] [ 9] mpirun [0x804a851]
> [burl-ct-x2200-2:26019] *** End of error message ***
> Segmentation fault
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users