Hi all,
I would like to write a multi-thread parallel program. I used pthread. Basicly, I want to create two background threads besides  the main thread(process). For example, if I use "-np 4", the program should have 4 main processes on four processors and two background threads for each main process. So there should be 8 threads totally. I wrote a test program and it worked unpredictable. Sometimes I got the result I want, but sometimes the program got segmentation fault. I used MPI_Isend and MPI_Irecv for sending and recving. I do not know why? I attached the error message as follow:
 
[cheetah:29780] *** Process received signal ***
[cheetah:29780] Signal: Segmentation fault (11)
[cheetah:29780] Signal code: Address not mapped (1)
[cheetah:29780] Failing at address: 0x10
[cheetah:29779] *** Process received signal ***
[cheetah:29779] Signal: Segmentation fault (11)
[cheetah:29779] Signal code: Address not mapped (1)
[cheetah:29779] Failing at address: 0x10
[cheetah:29780] [ 0] /lib64/libpthread.so.0 [0x334b00de70]
[cheetah:29780] [ 1] /act/openmpi/gnu/lib/openmpi/mca_btl_sm.so [0x2b90e1227940]
[cheetah:29780] [ 2] /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so [0x2b90e05d61ca]
[cheetah:29780] [ 3] /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so [0x2b90e05cac86]
[cheetah:29780] [ 4] /act/openmpi/gnu/lib/libmpi.so.0(PMPI_Send+0x13d) [0x2b90dde7271d]
[cheetah:29780] [ 5] pt_muti(_Z6backIDPv+0x29b) [0x409929]
[cheetah:29780] [ 6] /lib64/libpthread.so.0 [0x334b0062f7]
[cheetah:29780] [ 7] /lib64/libc.so.6(clone+0x6d) [0x334a4d1e3d]
[cheetah:29780] *** End of error message ***
[cheetah:29779] [ 0] /lib64/libpthread.so.0 [0x334b00de70]
[cheetah:29779] [ 1] /act/openmpi/gnu/lib/openmpi/mca_btl_sm.so [0x2b39785c0940]
[cheetah:29779] [ 2] /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so [0x2b397796f1ca]
[cheetah:29779] [ 3] /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so [0x2b3977963c86]
[cheetah:29779] [ 4] /act/openmpi/gnu/lib/libmpi.so.0(PMPI_Send+0x13d) [0x2b397520b71d]
[cheetah:29779] [ 5] pt_muti(_Z6backIDPv+0x29b) [0x409929]
[cheetah:29779] [ 6] /lib64/libpthread.so.0 [0x334b0062f7]
[cheetah:29779] [ 7] /lib64/libc.so.6(clone+0x6d) [0x334a4d1e3d]
[cheetah:29779] *** End of error message ***

 
I used gdb to "bt" the error and I got :
 Program terminated with signal 11, Segmentation fault.
#0  0x00002b90e1227940 in mca_btl_sm_alloc ()
   from /act/openmpi/gnu/lib/openmpi/mca_btl_sm.so
(gdb) bt
#0  0x00002b90e1227940 in mca_btl_sm_alloc ()
   from /act/openmpi/gnu/lib/openmpi/mca_btl_sm.so
#1  0x00002b90e05d61ca in mca_pml_ob1_send_request_start_copy ()
   from /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so
#2  0x00002b90e05cac86 in mca_pml_ob1_send ()
   from /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so
#3  0x00002b90dde7271d in PMPI_Send () from /act/openmpi/gnu/lib/libmpi.so.0
#4  0x0000000000409929 in backID (arg=0x0) at pt_muti.cpp:50
#5  0x000000334b0062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000334a4d1e3d in clone () from /lib64/libc.so.6
So can anyone give me some suggestions or advice. Thanks very much.  


没有互动,哪来共识?微软地图MSN互动为你提供全新地图浏览体验! 立即试用!