I have done the test with v1.4.2 and indeed it fixes the problem.
Thank you also, Terry, for your help. With the fix I no longer need to use a huge value of btl_tcp_eager_limit (I keep the default value), which considerably reduces the memory consumption I had before. Everything works fine now.
2010/5/20 Nysal Jan <firstname.lastname@example.org>
This probably got fixed in https://svn.open-mpi.org/trac/ompi/ticket/2386
Can you try 1.4.2? The fix should be in there.
I will test it soon (it takes some time to install the new version on each node). It would be perfect if it fixes it.
I will tell you the result asap.
On Thu, May 20, 2010 at 2:02 PM, Olivier Riff <email@example.com> wrote:
I assume this question has already been discussed many times, but I cannot find a solution to my problem on the Internet.
It is about the buffer size limit of MPI_Send and MPI_Recv on a heterogeneous system (32-bit laptop / 64-bit cluster).
My configuration is:
Open MPI 1.4, configured with: --without-openib --enable-heterogeneous --enable-mpi-threads
The program is launched from a laptop (32-bit Mandriva 2008), which distributes tasks to a cluster of 70 processors (64-bit Red Hat Enterprise distribution).
I have to send buffers of various sizes, from a few bytes up to 30 MB.
I tested the following commands:
1) mpirun -v -machinefile machinefile.txt MyMPIProgram
-> crash on the client side (64-bit Red Hat Enterprise) when the sent buffer size is > 65536.
2) mpirun --mca btl_tcp_eager_limit 30000000 -v -machinefile machinefile.txt MyMPIProgram
-> works, but causes gigantic memory consumption on the 32-bit machine after MPI_Recv. Memory consumption goes from 800 MB to 2.1 GB after receiving about 20 KB from each of the 70 clients (a total of about 1.4 MB). This makes my program crash later because there is no memory left to allocate new structures. I read in an Open MPI forum thread that setting btl_tcp_eager_limit to a huge value explains this memory consumption when a sent message does not have a matching preposted receive. Also, after all messages have been received and there is no more traffic, the memory consumed remains at 2.1 GB, and I do not understand why.
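As a rough check on the figures quoted in the two runs above, the sketch below redoes the arithmetic and classifies a few message sizes against the two eager limits. It assumes a simplified rule in which a message is sent eagerly when its size is at most the limit; the real boundary inside Open MPI may differ by a few header bytes. The helper name protocol_for is hypothetical, and 1 GB is taken as 1000 MB.

```shell
#!/bin/sh
# Figures quoted in the post; the eager/rendezvous split is a simplified model.
EAGER_DEFAULT=65536      # size above which run 1 crashed
EAGER_RAISED=30000000    # value passed to btl_tcp_eager_limit in run 2
CLIENTS=70
PER_CLIENT_KB=20
MEM_BEFORE_MB=800
MEM_AFTER_MB=2100        # 2.1 GB, taking 1 GB = 1000 MB (assumption)

# protocol_for SIZE LIMIT: hypothetical helper, prints which path a message
# of SIZE bytes would take under the simplified "size <= limit" rule.
protocol_for() {
    if [ "$1" -le "$2" ]; then
        echo eager
    else
        echo rendezvous
    fi
}

payload_kb=$((CLIENTS * PER_CLIENT_KB))
growth_mb=$((MEM_AFTER_MB - MEM_BEFORE_MB))
growth_per_client_mb=$((growth_mb / CLIENTS))   # integer division

echo "20480 B message, default limit:  $(protocol_for 20480 "$EAGER_DEFAULT")"
echo "100000 B message, default limit: $(protocol_for 100000 "$EAGER_DEFAULT")"
echo "100000 B message, raised limit:  $(protocol_for 100000 "$EAGER_RAISED")"
echo "payload received: ${payload_kb} KB"
echo "memory growth: ${growth_mb} MB (about ${growth_per_client_mb} MB per client)"
```

Under this model the 20 KB replies are eager in both runs, while the growth in memory is roughly a thousand times larger than the payload actually received.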
What is the best way to get a working program that also has low memory consumption (lower speed is acceptable)?
I tried to play with the MCA parameters btl_tcp_sndbuf and btl_tcp_rcvbuf, but without success.
Thanks in advance for your help.
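For reference, MCA parameters such as btl_tcp_sndbuf and btl_tcp_rcvbuf do not have to be passed with --mca on every mpirun command line. A minimal sketch of two other ways Open MPI reads them follows: a per-user mca-params.conf file and OMPI_MCA_-prefixed environment variables. The value 131072 is purely illustrative and not a recommendation from this thread, and the file is written under a scratch directory rather than the real home directory so that the sketch has no side effects.

```shell
#!/bin/sh
# Scratch directory standing in for $HOME (assumption, to avoid touching real config).
SCRATCH_HOME=$(mktemp -d)
mkdir -p "$SCRATCH_HOME/.openmpi"
PARAMS_FILE="$SCRATCH_HOME/.openmpi/mca-params.conf"

# Illustrative buffer size in bytes; not a value suggested in the thread.
BUF_BYTES=131072

# Per-user parameter file: one "name = value" pair per line.
cat > "$PARAMS_FILE" <<EOF
btl_tcp_sndbuf = $BUF_BYTES
btl_tcp_rcvbuf = $BUF_BYTES
EOF

# Same settings through the environment, picked up by a later mpirun in this shell.
export OMPI_MCA_btl_tcp_sndbuf="$BUF_BYTES"
export OMPI_MCA_btl_tcp_rcvbuf="$BUF_BYTES"

cat "$PARAMS_FILE"
```

In a real setup the file would live at $HOME/.openmpi/mca-params.conf, and a value given on the mpirun command line takes precedence over both.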
users mailing list