
Open MPI User's Mailing List Archives


Subject: [OMPI users] Strange Segmentation Fault inside MPI_Init
From: Srikanth Raju (srikiraju_at_[hidden])
Date: 2010-09-11 03:35:40


Hello OMPI Users,
I'm using Open MPI 1.4.1 with gcc 4.4.3 on an x86_64 Linux system running
Ubuntu 10.04. I can't seem to run any Open MPI application. Here is the
simplest application I tried:

#include <mpi.h>
int main(int argc, char * argv[])
{
    MPI_Init(NULL, NULL);   /* the segfault is reported inside this call */
    MPI_Finalize();
    return 0;
}

I compile it with "mpicc -g test.c" and run it with
"mpirun -n 2 -hostfile hosts a.out". The hosts file contains
"localhost slots=2". When I run it, I get this:

[starbuck:18829] *** Process received signal ***
[starbuck:18830] *** Process received signal ***
[starbuck:18830] Signal: Segmentation fault (11)
[starbuck:18830] Signal code: Address not mapped (1)
[starbuck:18830] Failing at address: 0x3c
[starbuck:18829] Signal: Segmentation fault (11)
[starbuck:18829] Signal code: Address not mapped (1)
[starbuck:18829] Failing at address: 0x3c
[starbuck:18830] [ 0] /lib/libpthread.so.0(+0xf8f0) [0x7f3b0aae08f0]
[starbuck:18830] [ 1] /usr/local/lib/libmca_common_sm.so.1(+0x1561)
[0x7f3b082e8561]
[starbuck:18830] [ 2]
/usr/local/lib/libmca_common_sm.so.1(mca_common_sm_mmap_init+0x6c1)
[0x7f3b082e9137]
[starbuck:18830] [ 3] /usr/lib/openmpi/lib/openmpi/mca_mpool_sm.so(+0x137b)
[0x7f3b084ed37b]
[starbuck:18830] [ 4]
/usr/lib/libmpi.so.0(mca_mpool_base_module_create+0x7d) [0x7f3b0bacc38d]
[starbuck:18830] [ 5] /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so(+0x2a38)
[0x7f3b06c52a38]
[starbuck:18830] [ 6] /usr/lib/openmpi/lib/openmpi/mca_bml_r2.so(+0x18e7)
[0x7f3b076a48e7]
[starbuck:18830] [ 7] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x258c)
[0x7f3b07aae58c]
[starbuck:18830] [ 8] /usr/lib/libmpi.so.0(+0x392bf) [0x7f3b0ba8b2bf]
[starbuck:18830] [ 9] /usr/lib/libmpi.so.0(MPI_Init+0x170) [0x7f3b0baac330]
[starbuck:18830] [10] a.out(main+0x22) [0x400866]
[starbuck:18830] [11] /lib/libc.so.6(__libc_start_main+0xfd)
[0x7f3b0a76cc4d]
[starbuck:18830] [12] a.out() [0x400789]
[starbuck:18830] *** End of error message ***
[starbuck:18829] [ 0] /lib/libpthread.so.0(+0xf8f0) [0x7fb6efefe8f0]
[starbuck:18829] [ 1] /usr/local/lib/libmca_common_sm.so.1(+0x1561)
[0x7fb6ed706561]
[starbuck:18829] [ 2]
/usr/local/lib/libmca_common_sm.so.1(mca_common_sm_mmap_init+0x6c1)
[0x7fb6ed707137]
[starbuck:18829] [ 3] /usr/lib/openmpi/lib/openmpi/mca_mpool_sm.so(+0x137b)
[0x7fb6ed90b37b]
[starbuck:18829] [ 4]
/usr/lib/libmpi.so.0(mca_mpool_base_module_create+0x7d) [0x7fb6f0eea38d]
[starbuck:18829] [ 5] /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so(+0x2a38)
[0x7fb6ec070a38]
[starbuck:18829] [ 6] /usr/lib/openmpi/lib/openmpi/mca_bml_r2.so(+0x18e7)
[0x7fb6ecac28e7]
[starbuck:18829] [ 7] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x258c)
[0x7fb6ececc58c]
[starbuck:18829] [ 8] /usr/lib/libmpi.so.0(+0x392bf) [0x7fb6f0ea92bf]
[starbuck:18829] [ 9] /usr/lib/libmpi.so.0(MPI_Init+0x170) [0x7fb6f0eca330]
[starbuck:18829] [10] a.out(main+0x22) [0x400866]
[starbuck:18829] [11] /lib/libc.so.6(__libc_start_main+0xfd)
[0x7fb6efb8ac4d]
[starbuck:18829] [12] a.out() [0x400789]
[starbuck:18829] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 18830 on node starbuck exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------

My stack trace from gdb is:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff43c2561 in opal_list_get_first (list=0x7ffff45c5240)
    at ../../../../../opal/class/opal_list.h:201
201 assert(1 == item->opal_list_item_refcount);
(gdb) bt
#0 0x00007ffff43c2561 in opal_list_get_first (list=0x7ffff45c5240)
    at ../../../../../opal/class/opal_list.h:201
#1 0x00007ffff43c3137 in mca_common_sm_mmap_init (procs=0x673cb0,
    num_procs=2, size=67113040,
    file_name=0x673c40 "/tmp/openmpi-sessions-srikanth_at_starbuck_0/1510/1/shared_mem_pool.starbuck",
    size_ctl_structure=4176, data_seg_alignment=8)
    at ../../../../../ompi/mca/common/sm/common_sm_mmap.c:291
#2 0x00007ffff45c737b in mca_mpool_sm_init (resources=<value optimized out>)
    at ../../../../../../ompi/mca/mpool/sm/mpool_sm_component.c:214
#3 0x00007ffff7ba638d in mca_mpool_base_module_create ()
   from /usr/lib/libmpi.so.0
#4 0x00007ffff2d2ca38 in sm_btl_first_time_init (btl=<value optimized out>,
    nprocs=<value optimized out>, procs=<value optimized out>,
    peers=<value optimized out>, reachability=<value optimized out>)
    at ../../../../../../ompi/mca/btl/sm/btl_sm.c:228
#5 mca_btl_sm_add_procs (btl=<value optimized out>,
    nprocs=<value optimized out>, procs=<value optimized out>,
    peers=<value optimized out>, reachability=<value optimized out>)
    at ../../../../../../ompi/mca/btl/sm/btl_sm.c:500
#6 0x00007ffff377e8e7 in mca_bml_r2_add_procs (nprocs=<value optimized out>,
    procs=0x2, reachable=0x7fffffffdd00)
    at ../../../../../../ompi/mca/bml/r2/bml_r2.c:206
#7 0x00007ffff3b8858c in mca_pml_ob1_add_procs (procs=0x678ce0, nprocs=2)
    at ../../../../../../ompi/mca/pml/ob1/pml_ob1.c:315
#8 0x00007ffff7b652bf in ?? () from /usr/lib/libmpi.so.0
#9 0x00007ffff7b86330 in PMPI_Init () from /usr/lib/libmpi.so.0
#10 0x0000000000400866 in main (argc=1, argv=0x7fffffffe008)
    at test.c:4

I can't figure out what's going on here! The segfault is reported inside
MPI_Init, but I suspect it is some kind of misconfiguration.
I have tried reinstalling the openmpi package. I have an AMD Turion X2
M500 (64-bit) processor.
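
One thing that stands out in the backtrace above is that
libmca_common_sm.so.1 is loaded from /usr/local/lib, while libmpi.so.0 and
the mca_*.so components come from /usr/lib. The sketch below just makes that
check explicit on the paths copied from the trace. The helper names
(has_prefix, count_under, looks_mixed) are made up for illustration and are
not part of Open MPI. A mixed result only hints that two Open MPI installs
may be getting combined at run time; it does not prove that this is the
cause of the crash.

```c
#include <string.h>

/* Library paths copied from the backtrace frames in this post. */
const char *const frame_libs[] = {
    "/lib/libpthread.so.0",
    "/usr/local/lib/libmca_common_sm.so.1",
    "/usr/lib/openmpi/lib/openmpi/mca_mpool_sm.so",
    "/usr/lib/libmpi.so.0",
    "/usr/lib/openmpi/lib/openmpi/mca_btl_sm.so",
    "/usr/lib/openmpi/lib/openmpi/mca_bml_r2.so",
    "/usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so",
};
enum { NUM_FRAME_LIBS = sizeof frame_libs / sizeof frame_libs[0] };

/* Hypothetical helper: returns 1 if path begins with prefix, else 0. */
int has_prefix(const char *path, const char *prefix)
{
    return strncmp(path, prefix, strlen(prefix)) == 0;
}

/* Hypothetical helper: counts how many paths live under prefix. */
int count_under(const char *const *libs, int n, const char *prefix)
{
    int i;
    int count = 0;
    for (i = 0; i < n; i++) {
        if (has_prefix(libs[i], prefix)) {
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: returns 1 if libraries from both /usr/local/ and
 * /usr/lib/ appear in the same list, which would suggest a mixed install. */
int looks_mixed(const char *const *libs, int n)
{
    return count_under(libs, n, "/usr/local/") > 0 &&
           count_under(libs, n, "/usr/lib/") > 0;
}
```

For the paths above, looks_mixed(frame_libs, NUM_FRAME_LIBS) returns 1: one
library sits under /usr/local/ and five sit under /usr/lib/.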

Interestingly, the segfault occurs only when I run multiple processes.
With -n 1 it runs without problems.
Thanks for any help!

-- 
Regards,
Srikanth Raju