Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] ***SPAM*** Re: [ewg] Seg fault running OpenMPI-1.3.1rc4
From: Pavel Shamis (Pasha) (pashash_at_[hidden])
Date: 2009-03-30 11:45:27


Steve,
If you compile the OMPI code with CFLAGS="-g", generate a core file from
the segfault, and send me the core plus the IMB-MPI1 binary, I will be
able to understand the problem better.
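
The steps above could look roughly like the sketch below. It is not a tested recipe for this cluster: `OMPI_SRC`, `PREFIX`, the report file name, and the `DRY_RUN` switch are all hypothetical and only illustrate the flow (rebuild with `-g`, allow core files, inspect and bundle the result). With `DRY_RUN=1`, the default, commands are printed rather than executed. The core-size limit may also need raising on the remote nodes, since processes launched there do not necessarily inherit it.

```shell
#!/bin/sh
# Sketch: rebuild Open MPI with debug symbols and collect a core file.
# All paths below are hypothetical placeholders; adjust to the local setup.
OMPI_SRC=${OMPI_SRC:-$HOME/src/openmpi-1.3.1rc4}
PREFIX=${PREFIX:-$HOME/ompi-debug}
IMB=${IMB:-$PREFIX/tests/IMB-3.1/IMB-MPI1}

# With DRY_RUN=1 (the default) commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_debug() {
    # -g adds debug symbols so a debugger can map addresses to source lines.
    run cd "$OMPI_SRC" &&
    run ./configure CFLAGS=-g --prefix="$PREFIX" &&
    run make all install
}

enable_cores() {
    # Lift the core-size limit for this shell and the processes it starts.
    run ulimit -c unlimited
}

inspect_core() {
    # $1 = path to the core file; prints a backtrace for every thread.
    run gdb -batch -ex "thread apply all bt" "$IMB" "$1"
}

bundle_report() {
    # $1 = path to the core file; packs the core together with the binary.
    run tar czf segfault-report.tar.gz "$1" "$IMB"
}
```

A possible sequence would be `build_debug`, then `enable_cores`, then rerunning the failing test, and finally `inspect_core <core file>` and `bundle_report <core file>` once a core has been written.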

Regards,
Pasha

Steve Wise wrote:
>
> Hey Pasha,
>
>
> I just applied r20872 and retested, and I still hit this seg fault.
> So I think this is a new bug.
>
> Lemme pull the trunk and try that.
>
>
>
> Pavel Shamis (Pasha) wrote:
>> I think your problem is related to this bug:
>> https://svn.open-mpi.org/trac/ompi/ticket/1823
>>
>> And it is resolved on the ompi-trunk.
>>
>> Pasha.
>>
>> Steve Wise wrote:
>>> When this happens, that node logs this type of message also in
>>> /var/log/messages:
>>>
>>> IMB-MPI1[8859]: segfault at 0000000000000018 rip 00002b7bfc880800
>>> rsp 00007fffb1021330 error 4
>>>
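
The `error 4` field in the kernel line above is the x86 page-fault error code. The sketch below decodes only its three lowest bits (present/protection, read/write, kernel/user); the function name `decode_pf_error` is made up for illustration. For code 4 it reports a user-mode read of an unmapped page, which is consistent with the "Address not mapped" line in the backtrace. A fault address as small as 0x18 is commonly a field access through a NULL pointer, though that is an inference and not something the log itself proves.

```shell
#!/bin/sh
# Decode the low three bits of an x86 page-fault error code, as printed
# in the "error N" field of a kernel segfault log line.
decode_pf_error() {
    code=$1
    # Bit 0: 0 = page not present, 1 = protection violation.
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    # Bit 1: 0 = read access, 1 = write access.
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    # Bit 2: 0 = kernel mode, 1 = user mode.
    if [ $((code & 4)) -ne 0 ]; then mode="user-mode"; else mode="kernel-mode"; fi
    echo "$mode $access, $cause"
}

decode_pf_error 4
```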
>>> Steve Wise wrote:
>>>> Hey Jeff,
>>>>
>>>> Have you seen this? I'm hitting this regularly running on
>>>> ofed-1.4.1-rc2.
>>>>
>>>> Test:
>>>> [ompi_at_vic12 ~]$ cat doit-ompi
>>>> #!/bin/sh
>>>> while : ; do
>>>> mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g \
>>>>   --mca btl openib,self,sm --mca btl_openib_max_btls 1 \
>>>>   /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 -npmin 16 \
>>>>   bcast scatter sendrecv exchange </dev/null
>>>> done
>>>>
>>>>
>>>> Seg Fault output:
>>>>
>>>> [vic21:04047] *** Process received signal ***
>>>> [vic21:04047] Signal: Segmentation fault (11)
>>>> [vic21:04047] Signal code: Address not mapped (1)
>>>> [vic21:04047] Failing at address: 0x18
>>>> [vic21:04047] [ 0] /lib64/libpthread.so.0 [0x3dde20e4c0]
>>>> [vic21:04047] [ 1]
>>>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so
>>>> [0x2b911bc33800]
>>>> [vic21:04047] [ 2]
>>>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so
>>>> [0x2b911bc38c2d]
>>>> [vic21:04047] [ 3]
>>>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so
>>>> [0x2b911bc33fcb]
>>>> [vic21:04047] [ 4]
>>>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so
>>>> [0x2b911bc22af8]
>>>> [vic21:04047] [ 5]
>>>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libopen-pal.so.0(mca_base_components_close+0x83)
>>>> [0x2b911933da33]
>>>> [vic21:04047] [ 6]
>>>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0(mca_btl_base_close+0xe0)
>>>> [0x2b9118ea3fb0]
>>>> [vic21:04047] [ 7]
>>>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_bml_r2.so
>>>> [0x2b911ba1938f]
>>>> [vic21:04047] [ 8]
>>>> /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_pml_ob1.so
>>>> [0x2b911b601cde]
>>>> [vic21:04047] [ 9] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0
>>>> [0x2b9118e7241b]
>>>> [vic21:04047] [10]
>>>> /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1(main+0x178)
>>>> [0x403498]
>>>> [vic21:04047] [11] /lib64/libc.so.6(__libc_start_main+0xf4)
>>>> [0x3ddd61d974]
>>>> [vic21:04047] [12]
>>>> /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 [0x403269]
>>>> [vic21:04047] *** End of error message ***
>>>>
>>>> _______________________________________________
>>>> ewg mailing list
>>>> ewg_at_[hidden]
>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>