Open MPI User's Mailing List Archives

Subject: [OMPI users] How to create multi-thread parallel program using thread-safe send and recv?
From: guosong (guosong1079_at_[hidden])
Date: 2009-09-22 00:09:06


Hi all,

I would like to write a multi-threaded parallel program using pthreads. Basically, I want to create two background threads in addition to the main thread (process). For example, if I run with "-np 4", the program should have 4 main processes on four processors and two background threads for each main process, so there should be 8 background threads in total. I wrote a test program, but it behaves unpredictably: sometimes I get the result I want, and sometimes the program fails with a segmentation fault. I used MPI_Isend and MPI_Irecv for sending and receiving. I do not know why this happens. The error message is as follows:

 

[cheetah:29780] *** Process received signal ***
[cheetah:29780] Signal: Segmentation fault (11)
[cheetah:29780] Signal code: Address not mapped (1)
[cheetah:29780] Failing at address: 0x10
[cheetah:29779] *** Process received signal ***
[cheetah:29779] Signal: Segmentation fault (11)
[cheetah:29779] Signal code: Address not mapped (1)
[cheetah:29779] Failing at address: 0x10
[cheetah:29780] [ 0] /lib64/libpthread.so.0 [0x334b00de70]
[cheetah:29780] [ 1] /act/openmpi/gnu/lib/openmpi/mca_btl_sm.so [0x2b90e1227940]
[cheetah:29780] [ 2] /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so [0x2b90e05d61ca]
[cheetah:29780] [ 3] /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so [0x2b90e05cac86]
[cheetah:29780] [ 4] /act/openmpi/gnu/lib/libmpi.so.0(PMPI_Send+0x13d) [0x2b90dde7271d]
[cheetah:29780] [ 5] pt_muti(_Z6backIDPv+0x29b) [0x409929]
[cheetah:29780] [ 6] /lib64/libpthread.so.0 [0x334b0062f7]
[cheetah:29780] [ 7] /lib64/libc.so.6(clone+0x6d) [0x334a4d1e3d]
[cheetah:29780] *** End of error message ***
[cheetah:29779] [ 0] /lib64/libpthread.so.0 [0x334b00de70]
[cheetah:29779] [ 1] /act/openmpi/gnu/lib/openmpi/mca_btl_sm.so [0x2b39785c0940]
[cheetah:29779] [ 2] /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so [0x2b397796f1ca]
[cheetah:29779] [ 3] /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so [0x2b3977963c86]
[cheetah:29779] [ 4] /act/openmpi/gnu/lib/libmpi.so.0(PMPI_Send+0x13d) [0x2b397520b71d]
[cheetah:29779] [ 5] pt_muti(_Z6backIDPv+0x29b) [0x409929]
[cheetah:29779] [ 6] /lib64/libpthread.so.0 [0x334b0062f7]
[cheetah:29779] [ 7] /lib64/libc.so.6(clone+0x6d) [0x334a4d1e3d]
[cheetah:29779] *** End of error message ***

 

I used gdb to get a backtrace ("bt") of the crash and got:

 Program terminated with signal 11, Segmentation fault.
#0 0x00002b90e1227940 in mca_btl_sm_alloc ()
   from /act/openmpi/gnu/lib/openmpi/mca_btl_sm.so
(gdb) bt
#0 0x00002b90e1227940 in mca_btl_sm_alloc ()
   from /act/openmpi/gnu/lib/openmpi/mca_btl_sm.so
#1 0x00002b90e05d61ca in mca_pml_ob1_send_request_start_copy ()
   from /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so
#2 0x00002b90e05cac86 in mca_pml_ob1_send ()
   from /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so
#3 0x00002b90dde7271d in PMPI_Send () from /act/openmpi/gnu/lib/libmpi.so.0
#4 0x0000000000409929 in backID (arg=0x0) at pt_muti.cpp:50
#5 0x000000334b0062f7 in start_thread () from /lib64/libpthread.so.0
#6 0x000000334a4d1e3d in clone () from /lib64/libc.so.6

Can anyone give me some suggestions or advice? Thanks very much.
                                               