
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] [OMP users]: OpenMP1.4 tuning for sending large messages
From: Pooja Varshneya (pooja.varshneya_at_[hidden])
Date: 2010-04-28 13:31:57


Hi Timur,

Thanks for your response!
I have applied this patch, but I am in a homogeneous environment, and the
patch did not help :(

Thanks,
Pooja

On Apr 27, 2010, at 7:29 AM, Timur Magomedov wrote:

> Hello,
> Are you using a heterogeneous environment? There was a similar issue
> recently with a segfault on a mixed x86 and x86_64 environment. Here is
> the corresponding thread on ompi-devel:
> http://www.open-mpi.org/community/lists/devel/2010/04/7787.php
> This was fixed in the trunk and will likely be fixed in the next 1.4 release.
> You can download the latest trunk snapshot from
> http://www.open-mpi.org/nightly/trunk/
> and test it.
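A mixed x86 and x86_64 setup combines 32-bit and 64-bit nodes, where some basic C/C++ types have different widths. One hedged way to confirm that a cluster really is homogeneous is to have each node report its data model and compare the results. The helper name `data_model` below is hypothetical, and treating type widths as the relevant difference is an assumption of this sketch; the linked ompi-devel thread has the actual details of that bug.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: classify this build's data model from type widths.
// x86 Linux builds are typically ILP32; x86_64 Linux builds are typically LP64.
std::string data_model() {
    const bool int32 = sizeof(int) == 4;
    if (int32 && sizeof(long) == 4 && sizeof(void*) == 4) return "ILP32";
    if (int32 && sizeof(long) == 8 && sizeof(void*) == 8) return "LP64";
    if (int32 && sizeof(long) == 4 && sizeof(void*) == 8) return "LLP64";
    return "other";
}
```

Built with the same compiler flags as the application and run on every host in the hostfile, identical output on all nodes suggests the nodes agree on these type widths.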
>
> On Mon, 26/04/2010 at 15:28 -0400, Pooja Varshneya wrote:
>> Hi All,
>>
>> I am using Open MPI 1.4 on a cluster of Intel quad-core processors
>> running Linux and connected by Ethernet.
>>
>> In an application, I'm trying to send and receive large messages
>> ranging in size from 1 KB up to 500 MB.
>> The application works fine if the message sizes are within the 1 MB
>> range. When I try to send larger messages, the application crashes
>> with a segmentation fault. I have tried increasing the size of the
>> btl_tcp send and receive buffers, but it does not seem to help.
>>
>> Are there any other settings I need to change to enable large
>> messages to be sent?
>> I am using the Boost serialization and Boost MPI libraries to simplify
>> message packing and unpacking.
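One workaround that is not discussed in this thread, offered here only as a hedged sketch: split a very large payload into fixed-size pieces and send them one at a time, so that no single message approaches the sizes that crash. The helper `chunk_ranges` is hypothetical and only computes the byte ranges; the per-range send and receive calls are omitted, and the 1 MB chunk size used in the usage note is just an example based on the size reported to work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split [0, total) into consecutive (offset, length)
// ranges of at most `chunk` bytes each. Each range would then be sent as its
// own message (the actual send calls are omitted in this sketch).
std::vector<std::pair<std::size_t, std::size_t>>
chunk_ranges(std::size_t total, std::size_t chunk) {
    assert(chunk > 0);  // a zero chunk size would never make progress
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t off = 0; off < total; off += chunk) {
        ranges.emplace_back(off, std::min(chunk, total - off));
    }
    return ranges;
}
```

For example, `chunk_ranges(500 * 1024 * 1024, 1024 * 1024)` yields 500 ranges of 1 MiB each. The receiver would need the total size first (sent as a small separate message) and would then receive each range into the matching offset of its own buffer.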
>>
>> mpirun -np 3 --mca btl_tcp_eager_limit 536870912 \
>>     --mca btl_tcp_max_send_size 536870912 \
>>     --mca btl_tcp_rdma_pipeline_send_length 524288 \
>>     --mca btl_tcp_sndbuf 536870912 \
>>     --mca btl_tcp_rcvbuf 536870912 \
>>     --hostfile hostfile2 --rankfile rankfile2 ./boost_binomial_no_LB
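For reference, the byte values in this command work out to 536870912 = 2^29 bytes = 512 MiB and 524288 = 2^19 bytes = 512 KiB. The small check below (the helper `to_mib` and the constant names are hypothetical; the values are copied from the command) confirms that arithmetic and that the largest payload mentioned, 500 MB, is below the configured btl_tcp_eager_limit value whether "MB" is read as 10^6 or 2^20 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Convert a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
constexpr std::uint64_t to_mib(std::uint64_t bytes) { return bytes / (1024u * 1024u); }

// Values copied from the mpirun command line (names are hypothetical).
constexpr std::uint64_t kEagerLimit = 536870912;      // btl_tcp_eager_limit
constexpr std::uint64_t kPipelineSendLength = 524288; // btl_tcp_rdma_pipeline_send_length

static_assert(kEagerLimit == (std::uint64_t{1} << 29), "eager limit is 2^29 bytes");
static_assert(to_mib(kEagerLimit) == 512, "eager limit is 512 MiB");
static_assert(kPipelineSendLength == 512 * 1024, "pipeline send length is 512 KiB");
// Both readings of "500 MB" are below the eager limit.
static_assert(500ull * 1000 * 1000 < kEagerLimit, "500 MB (decimal) < 512 MiB");
static_assert(500ull * 1024 * 1024 < kEagerLimit, "500 MiB < 512 MiB");
```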
>>
>>
>> [rh5x64-u16:25446] *** Process received signal ***
>> [rh5x64-u16:25446] Signal: Segmentation fault (11)
>> [rh5x64-u16:25446] Signal code: Address not mapped (1)
>> [rh5x64-u16:25446] Failing at address: 0x2b12d14aafdc
>> [rh5x64-u16:25446] [ 0] /lib64/libpthread.so.0 [0x3ba680e7c0]
>> [rh5x64-u16:25446] [ 1] /lib64/libc.so.6(memcpy+0xa0) [0x3ba5c7be50]
>> [rh5x64-u16:25446] [ 2] /usr/local/lib/libmpi.so.0 [0x2b11ccbe0c02]
>> [rh5x64-u16:25446] [ 3] /usr/local/lib/libmpi.so.0(ompi_convertor_pack+0x160) [0x2b11ccbe4930]
>> [rh5x64-u16:25446] [ 4] /usr/local/lib/openmpi/mca_btl_tcp.so [0x2b11cffcaf67]
>> [rh5x64-u16:25446] [ 5] /usr/local/lib/openmpi/mca_pml_ob1.so [0x2b11cf5af97a]
>> [rh5x64-u16:25446] [ 6] /usr/local/lib/openmpi/mca_pml_ob1.so [0x2b11cf5a9b0d]
>> [rh5x64-u16:25446] [ 7] /usr/local/lib/openmpi/mca_btl_tcp.so [0x2b11cffcd693]
>> [rh5x64-u16:25446] [ 8] /usr/local/lib/libopen-pal.so.0 [0x2b11cd0ab95b]
>> [rh5x64-u16:25446] [ 9] /usr/local/lib/libopen-pal.so.0(opal_progress+0x9e) [0x2b11cd0a0b3e]
>> [rh5x64-u16:25446] [10] /usr/local/lib/libmpi.so.0 [0x2b11ccbd62c9]
>> [rh5x64-u16:25446] [11] /usr/local/lib/libmpi.so.0(PMPI_Test+0x73) [0x2b11ccbfc863]
>> [rh5x64-u16:25446] [12] /usr/local/lib/libboost_mpi.so.1.42.0(_ZN5boost3mpi7request4testEv+0x13d) [0x2b11cc50451d]
>> [rh5x64-u16:25446] [13] ./boost_binomial_no_LB(_ZN5boost3mpi8wait_allIPNS0_7requestEEEvT_S4_+0x19d) [0x42206d]
>> [rh5x64-u16:25446] [14] ./boost_binomial_no_LB [0x41c82a]
>> [rh5x64-u16:25446] [15] ./boost_binomial_no_LB(main+0x169) [0x41d4a9]
>> [rh5x64-u16:25446] [16] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ba5c1d994]
>> [rh5x64-u16:25446] [17] ./boost_binomial_no_LB(__gxx_personality_v0+0x371) [0x41a799]
>> [rh5x64-u16:25446] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 25446 on node 172.10.0.116
>> exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
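Frames [12] and [13] of the backtrace contain mangled C++ symbol names. They can be decoded with `c++filt`, or programmatically with the GCC/Clang ABI function `abi::__cxa_demangle`, as in this sketch (the wrapper name `demangle` is hypothetical). Decoded, frame [12] is `boost::mpi::request::test()` and frame [13] is `boost::mpi::wait_all` instantiated for `boost::mpi::request*`. Read together with frames [1] through [11], the trace shows the crash occurring in `memcpy` inside `ompi_convertor_pack` while Boost.MPI polls outstanding nonblocking requests through `PMPI_Test`.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Hypothetical wrapper around the Itanium C++ ABI demangler (GCC/Clang).
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle allocates its result with malloc, so release it with free.
    std::unique_ptr<char, void (*)(void*)> out(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        std::free);
    return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```

For example, `demangle("_ZN5boost3mpi7request4testEv")` returns `boost::mpi::request::test()`.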
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Kind regards,
> Timur Magomedov
> Senior C++ Developer
> DevelopOnBox LLC / Zodiac Interactive
> http://www.zodiac.tv/