
Subject: [OMPI users] Open MPI 1.4 tuning for sending large messages
From: Pooja Varshneya (pooja.varshneya_at_[hidden])
Date: 2010-04-26 15:28:05


Hi All,

I am using Open MPI 1.4 on a cluster of Intel quad-core processors
running Linux, connected by Ethernet.

In one application I am trying to send and receive large messages,
with sizes ranging from 1 KB up to 500 MB.
The application works fine as long as the message sizes stay within
the 1 MB range; when I try to send larger messages, it crashes with a
segmentation fault. I have tried increasing the btl_tcp send and
receive buffer sizes, but that does not seem to help.

Are there any other settings I need to change to enable sending large
messages? I am using the Boost.Serialization and Boost.MPI libraries
to simplify message packing and unpacking; a reduced version of the
communication pattern is below.
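
To make the pattern concrete, here is a reduced sketch of what the
application does (illustrative only; the names and sizes below are
simplified, not the exact code):

#include <boost/mpi.hpp>
#include <boost/serialization/vector.hpp>
#include <vector>

namespace mpi = boost::mpi;

int main(int argc, char* argv[]) {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    // ~500 MB of doubles; payloads up to ~1 MB work, larger ones crash.
    const std::size_t n = 64 * 1024 * 1024;

    if (world.rank() == 0) {
        std::vector<double> payload(n, 1.0);
        // Non-blocking send completed with wait_all, as in the backtrace.
        mpi::request reqs[1];
        reqs[0] = world.isend(1, 0, payload);
        mpi::wait_all(reqs, reqs + 1);
    } else if (world.rank() == 1) {
        std::vector<double> payload;
        mpi::request reqs[1];
        reqs[0] = world.irecv(0, 0, payload);
        mpi::wait_all(reqs, reqs + 1);
    }
    return 0;
}

This is how I am launching the application: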

mpirun -np 3 \
    --mca btl_tcp_eager_limit 536870912 \
    --mca btl_tcp_max_send_size 536870912 \
    --mca btl_tcp_rdma_pipeline_send_length 524288 \
    --mca btl_tcp_sndbuf 536870912 \
    --mca btl_tcp_rcvbuf 536870912 \
    --hostfile hostfile2 --rankfile rankfile2 \
    ./boost_binomial_no_LB
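
For readability, here are the same MCA parameters as they would appear
in an Open MPI parameter file (e.g. $HOME/.openmpi/mca-params.conf); I
am currently passing them on the command line as shown above:

# same values as the --mca arguments above
btl_tcp_eager_limit = 536870912
btl_tcp_max_send_size = 536870912
btl_tcp_rdma_pipeline_send_length = 524288
btl_tcp_sndbuf = 536870912
btl_tcp_rcvbuf = 536870912

This is the error I get when the message size goes above the 1 MB range: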

[rh5x64-u16:25446] *** Process received signal ***
[rh5x64-u16:25446] Signal: Segmentation fault (11)
[rh5x64-u16:25446] Signal code: Address not mapped (1)
[rh5x64-u16:25446] Failing at address: 0x2b12d14aafdc
[rh5x64-u16:25446] [ 0] /lib64/libpthread.so.0 [0x3ba680e7c0]
[rh5x64-u16:25446] [ 1] /lib64/libc.so.6(memcpy+0xa0) [0x3ba5c7be50]
[rh5x64-u16:25446] [ 2] /usr/local/lib/libmpi.so.0 [0x2b11ccbe0c02]
[rh5x64-u16:25446] [ 3] /usr/local/lib/libmpi.so.0(ompi_convertor_pack+0x160) [0x2b11ccbe4930]
[rh5x64-u16:25446] [ 4] /usr/local/lib/openmpi/mca_btl_tcp.so [0x2b11cffcaf67]
[rh5x64-u16:25446] [ 5] /usr/local/lib/openmpi/mca_pml_ob1.so [0x2b11cf5af97a]
[rh5x64-u16:25446] [ 6] /usr/local/lib/openmpi/mca_pml_ob1.so [0x2b11cf5a9b0d]
[rh5x64-u16:25446] [ 7] /usr/local/lib/openmpi/mca_btl_tcp.so [0x2b11cffcd693]
[rh5x64-u16:25446] [ 8] /usr/local/lib/libopen-pal.so.0 [0x2b11cd0ab95b]
[rh5x64-u16:25446] [ 9] /usr/local/lib/libopen-pal.so.0(opal_progress+0x9e) [0x2b11cd0a0b3e]
[rh5x64-u16:25446] [10] /usr/local/lib/libmpi.so.0 [0x2b11ccbd62c9]
[rh5x64-u16:25446] [11] /usr/local/lib/libmpi.so.0(PMPI_Test+0x73) [0x2b11ccbfc863]
[rh5x64-u16:25446] [12] /usr/local/lib/libboost_mpi.so.1.42.0(_ZN5boost3mpi7request4testEv+0x13d) [0x2b11cc50451d]
[rh5x64-u16:25446] [13] ./boost_binomial_no_LB(_ZN5boost3mpi8wait_allIPNS0_7requestEEEvT_S4_+0x19d) [0x42206d]
[rh5x64-u16:25446] [14] ./boost_binomial_no_LB [0x41c82a]
[rh5x64-u16:25446] [15] ./boost_binomial_no_LB(main+0x169) [0x41d4a9]
[rh5x64-u16:25446] [16] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ba5c1d994]
[rh5x64-u16:25446] [17] ./boost_binomial_no_LB(__gxx_personality_v0+0x371) [0x41a799]
[rh5x64-u16:25446] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 25446 on node 172.10.0.116
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------