Open MPI User's Mailing List Archives

Subject: [OMPI users] Buffer size limit and memory consumption problem on heterogeneous (32 bit / 64 bit) machines
From: Olivier Riff (oliriff_at_[hidden])
Date: 2010-05-20 04:32:23


Hello,

I assume this question has already been discussed many times, but I cannot
find a solution to my problem on the Internet.
It concerns the buffer size limit of MPI_Send and MPI_Recv on a heterogeneous
system (32-bit laptop / 64-bit cluster).
My configuration is:
Open MPI 1.4, configured with: --without-openib --enable-heterogeneous
--enable-mpi-threads
The program is launched from a laptop (32-bit Mandriva 2008), which
distributes tasks to a cluster of 70 processors (64-bit Red Hat Enterprise
distribution).
I have to send buffers of various sizes, from a few bytes up to 30 MB.

I tested the following commands:
1) mpirun -v -machinefile machinefile.txt MyMPIProgram
-> crashes on the client side (64-bit Red Hat Enterprise) when the sent
buffer size is > 65536.
2) mpirun --mca btl_tcp_eager_limit 30000000 -v -machinefile machinefile.txt
MyMPIProgram
-> works, but causes enormous memory consumption on the 32-bit machine
after MPI_Recv. Memory consumption goes from 800 MB to 2.1 GB after
receiving about 20 KB from each of the 70 clients (a total of about 1.4 MB).
This makes my program crash later, because no memory is left to allocate
new structures. I read in an Open MPI forum thread that setting
btl_tcp_eager_limit to a huge value explains this memory consumption when a
sent message does not have a matching receive already posted. Also, after
all messages have been received and there is no more traffic, the memory
consumed remains at 2.1 GB, and I do not understand why.
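
To put these figures side by side, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1 MB = 10^6 bytes) and treats "about 20 KB" as exactly 20 KB. The last value is only a hypothetical upper bound, assuming one eager-limit-sized buffer were held per client; nothing above confirms that this is how the memory is actually used.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in this message.
# Assumption: decimal units (1 KB = 10**3 bytes, 1 MB = 10**6 bytes).
KB, MB = 10**3, 10**6

clients = 70                      # processors in the cluster
payload_per_client = 20 * KB      # "about 20 KB" received from each client
mem_before = 800 * MB             # memory use before the receives
mem_after = 2100 * MB             # memory use after the receives (2.1 GB)
eager_limit = 30_000_000          # value passed to btl_tcp_eager_limit

total_payload = clients * payload_per_client   # data actually received
growth = mem_after - mem_before                # observed memory growth
ratio = growth / total_payload                 # growth per byte of payload

# Hypothetical upper bound (an assumption, not a confirmed mechanism):
# one buffer of eager-limit size held for each client.
hypothetical_bound = clients * eager_limit

print(f"total payload:      {total_payload / MB:.1f} MB")
print(f"memory growth:      {growth / MB:.0f} MB")
print(f"growth / payload:   {ratio:.0f}x")
print(f"70 x eager limit:   {hypothetical_bound / MB:.0f} MB")
```

The point of the comparison is the mismatch: the growth in memory is several hundred times larger than the data received.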

What is the best way to get a working program that also has low memory
consumption (lower speed is acceptable)?
I tried playing with the MCA parameters btl_tcp_sndbuf and btl_tcp_rcvbuf,
but without success.

Thanks in advance for your help.

Best regards,

Olivier