
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] segfault issue - possible bug in openmpi
From: Doug Reeder (dlr_at_[hidden])
Date: 2008-10-03 17:40:40


Daniel,

Are you using threads? I don't think the openmpi-1.2.x releases work with threads.
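
(If you are not sure whether the application asks MPI for thread support, one way to rule it out is to check how MPI is initialized. The following is just an illustrative sketch, not code from the program in your trace: initialize with MPI_Init_thread and print the thread level the library actually grants. On the 1.2.x series that is usually below MPI_THREAD_MULTIPLE unless Open MPI was built with thread support.)

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;

        /* Request the highest thread level and report what the library grants. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        printf("requested MPI_THREAD_MULTIPLE, provided level = %d\n", provided);

        MPI_Finalize();
        return 0;
    }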

Doug Reeder
On Oct 3, 2008, at 2:30 PM, Daniel Hansen wrote:

> Oh, by the way, here is the segfault:
>
> [m4b-1-8:11481] *** Process received signal ***
> [m4b-1-8:11481] Signal: Segmentation fault (11)
> [m4b-1-8:11481] Signal code: Address not mapped (1)
> [m4b-1-8:11481] Failing at address: 0x2b91c69eed
> [m4b-1-8:11483] [ 0] /lib64/libpthread.so.0 [0x33e8c0de70]
> [m4b-1-8:11483] [ 1] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 [0x2aaaaabea7c0]
> [m4b-1-8:11483] [ 2] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 [0x2aaaaabea675]
> [m4b-1-8:11483] [ 3] /fslhome/dhansen7/openmpi/lib/libmpi.so.0(mca_pml_ob1_send+0x2da) [0x2aaaaabeaf55]
> [m4b-1-8:11483] [ 4] /fslhome/dhansen7/openmpi/lib/libmpi.so.0(MPI_Send+0x28e) [0x2aaaaab52c5a]
> [m4b-1-8:11483] [ 5] /fslhome/dhansen7/compute/for_DanielHansen/replica_mpi_marylou2/Openmpi_md_twham(twham_init+0x708) [0x42a8a8]
> [m4b-1-8:11483] [ 6] /fslhome/dhansen7/compute/for_DanielHansen/replica_mpi_marylou2/Openmpi_md_twham(repexch+0x73c) [0x425d5c]
> [m4b-1-8:11483] [ 7] /fslhome/dhansen7/compute/for_DanielHansen/replica_mpi_marylou2/Openmpi_md_twham(main+0x855) [0x4133a5]
> [m4b-1-8:11483] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x33e841d8a4]
> [m4b-1-8:11483] [ 9] /fslhome/dhansen7/compute/for_DanielHansen/replica_mpi_marylou2/Openmpi_md_twham [0x4040b9]
> [m4b-1-8:11483] *** End of error message ***
>
>
>
> On Fri, Oct 3, 2008 at 3:20 PM, Daniel Hansen <dhansen_at_[hidden]> wrote:
> I have been testing some code against openmpi lately that always
> causes it to crash during certain mpi function calls. The code
> does not seem to be the problem, as it runs just fine against
> mpich. I have tested it against openmpi 1.2.5, 1.2.6, and 1.2.7
> and they all exhibit the same problem. Also, the problem only
> occurs in openmpi when running more than 16 processes. I have
> posted this stack trace to the list before, but I am submitting it
> now as a potential bug report. I need some help debugging it and
> finding out exactly what is going on in openmpi when the segfault
> occurs. Are there any suggestions on how best to do this? Is
> there an easy way to attach gdb to one of the processes? I have
> already compiled openmpi with debugging, memory profiling, etc.
> How can I best take advantage of these features?
>
> Thanks,
> Daniel Hansen
> Systems Administrator
> BYU Fulton Supercomputing Lab
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
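
Regarding the question above about attaching gdb to one of the processes: a common trick (sketched below with illustrative names; this is not code from the application in the trace) is to have the suspect rank print its host and PID, then spin until a debugger attaches and clears a flag:

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank;
        char hostname[256];
        volatile int waiting = 1;   /* clear from gdb to let the rank continue */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {   /* pick whichever rank hits the segfault */
            gethostname(hostname, sizeof(hostname));
            printf("rank %d (PID %d on %s) waiting for debugger\n",
                   rank, (int)getpid(), hostname);
            fflush(stdout);
            while (waiting)
                sleep(5);
        }

        /* ... rest of the program, e.g. the code leading up to the failing MPI_Send ... */

        MPI_Finalize();
        return 0;
    }

Once the job prints the PID, ssh to that node, run "gdb --pid <pid>", select the frame that contains "waiting", do "set var waiting = 0", and then "continue"; when the segfault fires, "bt" should show where in twham_init the failing MPI_Send is called.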