Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Strange segfault in openmpi
From: Robert Kubrick (robertkubrick_at_[hidden])
Date: 2008-09-19 14:51:32


The line

Signal code: Address not mapped (1)

indicates that there is probably a mismatch between the runtime library and the version the program was linked against. Make sure that you link the program and run it using the same installation base. Are the libraries in /usr/mpi/fsl_openmpi_gcc_1.2.6 the same ones you used at link time?
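A quick way to verify is to run ldd on the user's executable and confirm that libmpi.so.0 resolves to /usr/mpi/fsl_openmpi_gcc_1.2.6/lib on the compute nodes, and that the mpirun being invoked comes from that same tree; if not, fix PATH/LD_LIBRARY_PATH or relink. As a further sanity check of the installation itself, something along the lines of the following trivial barrier-only program (just a sketch on my part, not code from this thread) should run cleanly when compiled and launched with that installation's own mpicc and mpirun:

/* barrier_test.c -- suggested sanity check, not part of the original report.
 * Compile and run with the exact installation in /usr/mpi/fsl_openmpi_gcc_1.2.6:
 *
 *   /usr/mpi/fsl_openmpi_gcc_1.2.6/bin/mpicc -o barrier_test barrier_test.c
 *   /usr/mpi/fsl_openmpi_gcc_1.2.6/bin/mpirun -np 4 ./barrier_test
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* The reported backtrace dies inside PMPI_Barrier -> mca_btl_sm_send,
     * so a bare barrier across all ranks exercises the same code path. */
    MPI_Barrier(MPI_COMM_WORLD);
    printf("rank %d of %d passed the barrier\n", rank, size);

    MPI_Finalize();
    return 0;
}

If this test also crashes in MPI_Barrier, the problem is in the installation; if it runs cleanly, the mismatch is more likely in how the user's binary was linked or in his runtime environment.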

On Sep 19, 2008, at 2:42 PM, Daniel Hansen wrote:

> I work for a supercomputing organization and we just installed the
> latest version of Rocks/CentOS on our cluster. We compiled Open MPI
> from source to customize it for our purposes. Since switching we
> have received messages from users about errors, segfaults, etc. that
> we didn't see before. Here is one such segfault message that I don't
> have enough knowledge to figure out, or even a clue about what is
> going on:
>
> [m4b-1-8:11830] *** Process received signal ***
> [m4b-1-8:11830] Signal: Segmentation fault (11)
> [m4b-1-8:11830] Signal code: Address not mapped (1)
> [m4b-1-8:11830] Failing at address: 0x2abcdff475b0
> [m4b-1-8:11830] [ 0] /lib64/libpthread.so.0 [0x33e8c0de70]
> [m4b-1-8:11830] [ 1] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_btl_sm_send+0xf1) [0x2aaaaab541d1]
> [m4b-1-8:11830] [ 2] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_send_request_start_copy+0x15e) [0x2aaaaaba0e2e]
> [m4b-1-8:11830] [ 3] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_isend+0x217) [0x2aaaaab9be37]
> [m4b-1-8:11830] [ 4] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_sendrecv_actual+0xda) [0x2aaaaab5acaa]
> [m4b-1-8:11830] [ 5] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_barrier_intra_bruck+0x9f) [0x2aaaaab5f81f]
> [m4b-1-8:11830] [ 6] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(PMPI_Barrier+0x6f) [0x2aaaaab1eadf]
> [m4b-1-8:11830] [ 7] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug(main+0x5d9) [0x413179]
> [m4b-1-8:11830] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x33e841d8a4]
> [m4b-1-8:11830] [ 9] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug [0x404109]
> [m4b-1-8:11830] *** End of error message ***
> [m4b-1-9:11772] *** Process received signal ***
> [m4b-1-9:11772] Signal: Segmentation fault (11)
> [m4b-1-9:11772] Signal code: Address not mapped (1)
> [m4b-1-9:11772] Failing at address: 0x2abcdff475b0
> [m4b-1-9:11772] [ 0] /lib64/libpthread.so.0 [0x302380de70]
> [m4b-1-9:11772] [ 1] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_btl_sm_send+0xf1) [0x2aaaaab541d1]
> [m4b-1-9:11772] [ 2] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_send_request_start_copy+0x15e) [0x2aaaaaba0e2e]
> [m4b-1-9:11772] [ 3] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_isend+0x217) [0x2aaaaab9be37]
> [m4b-1-9:11772] [ 4] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_sendrecv_actual+0xda) [0x2aaaaab5acaa]
> [m4b-1-9:11772] [ 5] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_barrier_intra_bruck+0x9f) [0x2aaaaab5f81f]
> [m4b-1-9:11772] [ 6] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(PMPI_Barrier+0x6f) [0x2aaaaab1eadf]
> [m4b-1-9:11772] [ 7] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug(main+0x5d9) [0x413179]
> [m4b-1-9:11772] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x302301d8a4]
> [m4b-1-9:11772] [ 9] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug [0x404109]
> [m4b-1-9:11772] *** End of error message ***
> [m4b-1-7i][0,1,7][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
> [m4b-1-7i][0,1,8][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
> [m4b-1-7i][0,1,9][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
> [m4b-1-7i][0,1,9][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
> [m4b-1-9:11773] *** Process received signal ***
> [m4b-1-9:11773] Signal: Segmentation fault (11)
> [m4b-1-9:11773] Signal code: Address not mapped (1)
> [m4b-1-9:11773] Failing at address: 0x2abcdff475b0
> [m4b-1-9:11773] [ 0] /lib64/libpthread.so.0 [0x302380de70]
> [m4b-1-9:11773] [ 1] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_btl_sm_send+0xf1) [0x2aaaaab541d1]
> [m4b-1-9:11773] [ 2] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_send_request_start_copy+0x15e) [0x2aaaaaba0e2e]
> [m4b-1-9:11773] [ 3] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_isend+0x217) [0x2aaaaab9be37]
> [m4b-1-9:11773] [ 4] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_sendrecv_actual+0xda) [0x2aaaaab5acaa]
> [m4b-1-9:11773] [ 5] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_barrier_intra_bruck+0x9f) [0x2aaaaab5f81f]
> [m4b-1-9:11773] [ 6] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(PMPI_Barrier+0x6f) [0x2aaaaab1eadf]
> [m4b-1-9:11773] [ 7] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug(main+0x5d9) [0x413179]
> [m4b-1-9:11773] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x302301d8a4]
> [m4b-1-9:11773] [ 9] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug [0x404109]
> [m4b-1-9:11773] *** End of error message ***
> orterun noticed that job rank 0 with PID 12338 on node m4b-1-10i exited on signal 15 (Terminated).
>
> Can someone give me some clues as to what is going wrong here or
> possibly point me in the right direction? Is there something I or
> the user can do to get more informative error messages? The user
> mentioned that this particular program ran fine before we upgraded
> to the current openmpi version, and that he can't find any bugs in
> his code.
>
> Thanks for your help,
>
> Daniel Hansen
> Systems Administrator
> BYU Fulton Supercomputing Lab
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users