Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] segfault issue - possible bug in openmpi
From: Doug Reeder (dlr_at_[hidden])
Date: 2008-10-04 21:28:04


Shafagh,

I missed the dependence on the number of processors. Apparently there
is some thread support.

Doug
On Oct 4, 2008, at 5:29 PM, Shafagh Jafer wrote:

> Doug Reeder,
> Daniel is saying that the problem only occurs in openmpi when
> running more than 16 processes. So could the cause still be that
> openmpi does not support threads?
>
> --- On Fri, 10/3/08, Doug Reeder <dlr_at_[hidden]> wrote:
> From: Doug Reeder <dlr_at_[hidden]>
> Subject: Re: [OMPI users] segfault issue - possible bug in openmpi
> To: "Open MPI Users" <users_at_[hidden]>
> Date: Friday, October 3, 2008, 2:40 PM
>
> Daniel,
>
> Are you using threads? I don't think openmpi-1.2.x works with
> threads.
>
> Doug Reeder
> On Oct 3, 2008, at 2:30 PM, Daniel Hansen wrote:
>
>> Oh, by the way, here is the segfault:
>>
>> [m4b-1-8:11481] *** Process received signal ***
>> [m4b-1-8:11481] Signal: Segmentation fault (11)
>> [m4b-1-8:11481] Signal code: Address not mapped (1)
>> [m4b-1-8:11481] Failing at address: 0x2b91c69eed
>> [m4b-1-8:11483] [ 0] /lib64/libpthread.so.0 [0x33e8c0de70]
>> [m4b-1-8:11483] [ 1] /fslhome/dhansen7/openmpi/lib/libmpi.so.0
>> [0x2aaaaabea7c0]
>> [m4b-1-8:11483] [ 2] /fslhome/dhansen7/openmpi/lib/libmpi.so.0
>> [0x2aaaaabea675]
>> [m4b-1-8:11483] [ 3] /fslhome/dhansen7/openmpi/lib/libmpi.so.0
>> (mca_pml_ob1_send+0x2da) [0x2aaaaabeaf55]
>> [m4b-1-8:11483] [ 4] /fslhome/dhansen7/openmpi/lib/libmpi.so.0
>> (MPI_Send+0x28e) [0x2aaaaab52c5a]
>> [m4b-1-8:11483] [ 5] /fslhome/dhansen7/compute/for_DanielHansen/
>> replica_mpi_marylou2/Openmpi_md_twham(twham_init+0x708) [0x42a8a8]
>> [m4b-1-8:11483] [ 6] /fslhome/dhansen7/compute/for_DanielHansen/
>> replica_mpi_marylou2/Openmpi_md_twham(repexch+0x73c) [0x425d5c]
>> [m4b-1-8:11483] [ 7] /fslhome/dhansen7/compute/for_DanielHansen/
>> replica_mpi_marylou2/Openmpi_md_twham(main+0x855) [0x4133a5]
>> [m4b-1-8:11483] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4)
>> [0x33e841d8a4]
>> [m4b-1-8:11483] [ 9] /fslhome/dhansen7/compute/for_DanielHansen/
>> replica_mpi_marylou2/Openmpi_md_twham [0x4040b9]
>> [m4b-1-8:11483] *** End of error message ***
>>
>>
>>
>> On Fri, Oct 3, 2008 at 3:20 PM, Daniel Hansen <dhansen_at_[hidden]>
>> wrote:
>> I have been testing some code against openmpi lately that always
>> causes it to crash during certain mpi function calls. The code
>> does not seem to be the problem, as it runs just fine against
>> mpich. I have tested it against openmpi 1.2.5, 1.2.6, and 1.2.7
>> and they all exhibit the same problem. Also, the problem only
>> occurs in openmpi when running more than 16 processes. I have
>> posted this stack trace to the list before, but I am submitting it
>> now as a potential bug report. I need some help debugging it and
>> finding out exactly what is going on in openmpi when the segfault
>> occurs. Are there any suggestions on how best to do this? Is
>> there an easy way to attach gdb to one of the processes or
>> something?? I have already compiled openmpi with debugging,
>> memory profiling, etc. How can I best take advantage of these
>> features?
>>
>> Thanks,
>> Daniel Hansen
>> Systems Administrator
>> BYU Fulton Supercomputing Lab
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
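
On the question of attaching gdb to one of the processes: one common approach is to have each rank print its host and PID and then spin until a debugger attaches and releases it. Below is a minimal sketch of that idea, not something taken from this thread. The helper name wait_for_debugger and the WAIT_FOR_GDB environment variable are hypothetical names chosen for illustration; only standard POSIX calls are used, so the helper itself does not depend on MPI.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not part of Open MPI): pause this process until a
 * debugger attaches and clears `hold`.
 * Returns 0 immediately if the assumed WAIT_FOR_GDB environment variable
 * is unset, or 1 after a debugger has released the loop. */
int wait_for_debugger(void)
{
    volatile int hold = 1;   /* volatile so the loop re-reads it each pass */
    char host[256];

    if (getenv("WAIT_FOR_GDB") == NULL)
        return 0;

    if (gethostname(host, sizeof(host)) != 0)
        host[0] = '\0';
    host[sizeof(host) - 1] = '\0';   /* guarantee termination if truncated */

    printf("PID %d on host %s is waiting for a debugger\n",
           (int)getpid(), host);
    fflush(stdout);

    /* From another shell on that host:  gdb -p <PID>
     * The process will usually be inside sleep(), so move up to this
     * frame with `up`, then:  set var hold = 0   followed by  continue */
    while (hold)
        sleep(5);

    return 1;
}
```

A plausible way to use it is to call wait_for_debugger() right after MPI_Init in main, then launch with something like `mpirun -x WAIT_FOR_GDB=1 -np 17 ./Openmpi_md_twham` (17 is only an example of "more than 16 processes"; `-x` exports the variable to the launched processes). After attaching and continuing, gdb should stop at the segfault with a full backtrace into the debug build of libmpi.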