
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] How to create multi-thread parallel program using thread-safe send and recv?
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-09-22 11:43:11


guosong wrote:
Hi all,
I would like to write a multi-threaded parallel program. I used pthread. Basically, I want to create two background threads besides the main thread (process). For example, if I use "-np 4", the program should have 4 main processes on four processors and two background threads for each main process. So there should be 8 threads totally.
Wouldn't there be 4 main threads and 8 "slave" threads for a total of 12 threads?  Anyhow, doesn't matter.

I'm not sure where you're starting, but you should at least have a basic understanding of the different sorts of multithreaded programming models in MPI.  One is that each process is single-threaded (MPI_THREAD_SINGLE).  Another is that the processes are multithreaded, but only the main thread makes MPI calls (MPI_THREAD_FUNNELED).  Another is multithreaded, but only one MPI call at a time (MPI_THREAD_SERIALIZED).  Finally, there can be full multithreading (MPI_THREAD_MULTIPLE).  You have to decide which of these programming models you want and which is supported by your MPI (or, if OMPI, how OMPI was built).

For more information, try the MPI_Init_thread() man page or
http://www.mpi-forum.org/docs/mpi21-report.pdf ... see Section 12.4 on "MPI and Threads".
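As a concrete illustration, here is a minimal sketch of the MPI_Init_thread() usage described above: request full multithreading and check what level the library actually grants.  (This assumes an Open MPI build with thread support; it is an illustrative fragment, not your program.)

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided;

    /* Request full multithreading; the library may grant less. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE) {
        /* Concurrent MPI calls from several threads are NOT safe
         * at this level; fall back to a more restrictive model. */
        fprintf(stderr, "MPI_THREAD_MULTIPLE unavailable (got %d)\n",
                provided);
    }

    /* ... spawn pthreads and do work appropriate to 'provided' ... */

    MPI_Finalize();
    return 0;
}
```

If 'provided' comes back lower than what you asked for, calling MPI_Send/MPI_Recv concurrently from background threads can fail in exactly the way you are seeing.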
I wrote a test program and it behaved unpredictably. Sometimes I got the result I wanted, but sometimes the program got a segmentation fault. I used MPI_Isend and MPI_Irecv for sending and receiving. I do not know why. I attached the error message as follows:
 
[cheetah:29780] *** Process received signal ***
[cheetah:29780] Signal: Segmentation fault (11)
[cheetah:29780] Signal code: Address not mapped (1)
[cheetah:29780] Failing at address: 0x10
[cheetah:29779] *** Process received signal ***
[cheetah:29779] Signal: Segmentation fault (11)
[cheetah:29779] Signal code: Address not mapped (1)
[cheetah:29779] Failing at address: 0x10
[cheetah:29780] [ 0] /lib64/libpthread.so.0 [0x334b00de70]
[cheetah:29780] [ 1] /act/openmpi/gnu/lib/openmpi/mca_btl_sm.so [0x2b90e1227940]
[cheetah:29780] [ 2] /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so [0x2b90e05d61ca]
[cheetah:29780] [ 3] /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so [0x2b90e05cac86]
[cheetah:29780] [ 4] /act/openmpi/gnu/lib/libmpi.so.0(PMPI_Send+0x13d) [0x2b90dde7271d]
[cheetah:29780] [ 5] pt_muti(_Z6backIDPv+0x29b) [0x409929]
[cheetah:29780] [ 6] /lib64/libpthread.so.0 [0x334b0062f7]
[cheetah:29780] [ 7] /lib64/libc.so.6(clone+0x6d) [0x334a4d1e3d]
[cheetah:29780] *** End of error message ***
[cheetah:29779] [ 0] /lib64/libpthread.so.0 [0x334b00de70]
[cheetah:29779] [ 1] /act/openmpi/gnu/lib/openmpi/mca_btl_sm.so [0x2b39785c0940]
[cheetah:29779] [ 2] /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so [0x2b397796f1ca]
[cheetah:29779] [ 3] /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so [0x2b3977963c86]
[cheetah:29779] [ 4] /act/openmpi/gnu/lib/libmpi.so.0(PMPI_Send+0x13d) [0x2b397520b71d]
[cheetah:29779] [ 5] pt_muti(_Z6backIDPv+0x29b) [0x409929]
[cheetah:29779] [ 6] /lib64/libpthread.so.0 [0x334b0062f7]
[cheetah:29779] [ 7] /lib64/libc.so.6(clone+0x6d) [0x334a4d1e3d]
[cheetah:29779] *** End of error message ***

 
I used gdb to "bt" the error and I got :
 Program terminated with signal 11, Segmentation fault.
#0  0x00002b90e1227940 in mca_btl_sm_alloc ()
   from /act/openmpi/gnu/lib/openmpi/mca_btl_sm.so
(gdb) bt
#0  0x00002b90e1227940 in mca_btl_sm_alloc ()
   from /act/openmpi/gnu/lib/openmpi/mca_btl_sm.so
#1  0x00002b90e05d61ca in mca_pml_ob1_send_request_start_copy ()
   from /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so
#2  0x00002b90e05cac86 in mca_pml_ob1_send ()
   from /act/openmpi/gnu/lib/openmpi/mca_pml_ob1.so
#3  0x00002b90dde7271d in PMPI_Send () from /act/openmpi/gnu/lib/libmpi.so.0
#4  0x0000000000409929 in backID (arg=0x0) at pt_muti.cpp:50
#5  0x000000334b0062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000334a4d1e3d in clone () from /lib64/libc.so.6
So can anyone give me some suggestions or advice? Thanks very much.