
Subject: Re: [OMPI users] Strange segfault in openmpi
From: Robert Kubrick (robertkubrick_at_[hidden])
Date: 2008-09-19 14:51:32


The line

Signal code: Address not mapped (1)

indicates that there is probably a mismatch between the Open MPI
library picked up at run time and the one the program was linked
against. Make sure that you link the program and run it with the
same installation base. Are the libraries in
/usr/mpi/fsl_openmpi_gcc_1.2.6 the same ones you used at link time?
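A quick way to check is from the shell. The commands below are only a
sketch (they assume a bash-like shell and use the executable path taken
from the backtrace below):

    # Which libmpi.so.0 does the executable resolve at run time?
    ldd /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug | grep libmpi

    # Which Open MPI installation is first on the PATH, and which version is it?
    which mpicc mpirun
    ompi_info | grep "Open MPI:"

    # If they disagree, point the run at the intended installation, e.g.:
    export PATH=/usr/mpi/fsl_openmpi_gcc_1.2.6/bin:$PATH
    export LD_LIBRARY_PATH=/usr/mpi/fsl_openmpi_gcc_1.2.6/lib:$LD_LIBRARY_PATH

If the ldd output still points somewhere else after that, relinking the
application against the installation it will actually run with is the
safer fix.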

On Sep 19, 2008, at 2:42 PM, Daniel Hansen wrote:

> I work for a supercomputing organization and we just installed the
> latest version of Rocks/CentOS on our cluster. We compiled Open MPI
> from source to customize it for our purposes. Since switching we
> have received messages from users about errors, segfaults, etc. that
> we didn't see before. Here is one such segfault message that I
> don't have enough knowledge to figure out, or even a clue about
> what is going on:
>
> [m4b-1-8:11830] *** Process received signal ***
> [m4b-1-8:11830] Signal: Segmentation fault (11)
> [m4b-1-8:11830] Signal code: Address not mapped (1)
> [m4b-1-8:11830] Failing at address: 0x2abcdff475b0
> [m4b-1-8:11830] [ 0] /lib64/libpthread.so.0 [0x33e8c0de70]
> [m4b-1-8:11830] [ 1] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_btl_sm_send+0xf1) [0x2aaaaab541d1]
> [m4b-1-8:11830] [ 2] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_send_request_start_copy+0x15e) [0x2aaaaaba0e2e]
> [m4b-1-8:11830] [ 3] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_isend+0x217) [0x2aaaaab9be37]
> [m4b-1-8:11830] [ 4] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_sendrecv_actual+0xda) [0x2aaaaab5acaa]
> [m4b-1-8:11830] [ 5] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_barrier_intra_bruck+0x9f) [0x2aaaaab5f81f]
> [m4b-1-8:11830] [ 6] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(PMPI_Barrier+0x6f) [0x2aaaaab1eadf]
> [m4b-1-8:11830] [ 7] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug(main+0x5d9) [0x413179]
> [m4b-1-8:11830] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x33e841d8a4]
> [m4b-1-8:11830] [ 9] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug [0x404109]
> [m4b-1-8:11830] *** End of error message ***
> [m4b-1-9:11772] *** Process received signal ***
> [m4b-1-9:11772] Signal: Segmentation fault (11)
> [m4b-1-9:11772] Signal code: Address not mapped (1)
> [m4b-1-9:11772] Failing at address: 0x2abcdff475b0
> [m4b-1-9:11772] [ 0] /lib64/libpthread.so.0 [0x302380de70]
> [m4b-1-9:11772] [ 1] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_btl_sm_send+0xf1) [0x2aaaaab541d1]
> [m4b-1-9:11772] [ 2] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_send_request_start_copy+0x15e) [0x2aaaaaba0e2e]
> [m4b-1-9:11772] [ 3] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_isend+0x217) [0x2aaaaab9be37]
> [m4b-1-9:11772] [ 4] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_sendrecv_actual+0xda) [0x2aaaaab5acaa]
> [m4b-1-9:11772] [ 5] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_barrier_intra_bruck+0x9f) [0x2aaaaab5f81f]
> [m4b-1-9:11772] [ 6] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(PMPI_Barrier+0x6f) [0x2aaaaab1eadf]
> [m4b-1-9:11772] [ 7] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug(main+0x5d9) [0x413179]
> [m4b-1-9:11772] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x302301d8a4]
> [m4b-1-9:11772] [ 9] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug [0x404109]
> [m4b-1-9:11772] *** End of error message ***
> [m4b-1-7i][0,1,7][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
> [m4b-1-7i][0,1,8][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
> [m4b-1-7i][0,1,9][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
> [m4b-1-7i][0,1,9][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
> [m4b-1-9:11773] *** Process received signal ***
> [m4b-1-9:11773] Signal: Segmentation fault (11)
> [m4b-1-9:11773] Signal code: Address not mapped (1)
> [m4b-1-9:11773] Failing at address: 0x2abcdff475b0
> [m4b-1-9:11773] [ 0] /lib64/libpthread.so.0 [0x302380de70]
> [m4b-1-9:11773] [ 1] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_btl_sm_send+0xf1) [0x2aaaaab541d1]
> [m4b-1-9:11773] [ 2] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_send_request_start_copy+0x15e) [0x2aaaaaba0e2e]
> [m4b-1-9:11773] [ 3] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_isend+0x217) [0x2aaaaab9be37]
> [m4b-1-9:11773] [ 4] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_sendrecv_actual+0xda) [0x2aaaaab5acaa]
> [m4b-1-9:11773] [ 5] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_barrier_intra_bruck+0x9f) [0x2aaaaab5f81f]
> [m4b-1-9:11773] [ 6] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(PMPI_Barrier+0x6f) [0x2aaaaab1eadf]
> [m4b-1-9:11773] [ 7] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug(main+0x5d9) [0x413179]
> [m4b-1-9:11773] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x302301d8a4]
> [m4b-1-9:11773] [ 9] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug [0x404109]
> [m4b-1-9:11773] *** End of error message ***
> orterun noticed that job rank 0 with PID 12338 on node m4b-1-10i exited on signal 15 (Terminated).
>
> Can someone give me some clues as to what is going wrong here or
> possibly point me in the right direction? Is there something I or
> the user can do to get more informative error messages? The user
> mentioned that this particular program ran fine before we upgraded
> to the current Open MPI version, and that he can't find any bugs in
> his code.
>
> Thanks for your help,
>
> Daniel Hansen
> Systems Administrator
> BYU Fulton Supercomputing Lab
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users