
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Buffer size limit and memory consumption problem on heterogeneous (32 bit / 64 bit) machines
From: Olivier Riff (oliriff_at_[hidden])
Date: 2010-05-20 12:23:55


I have done the test with v1.4.2 and indeed it fixes the problem.
Thanks, Nysal.
Thank you also, Terry, for your help. With the fix I no longer need to use
a huge value for btl_tcp_eager_limit (I keep the default value), which
considerably reduces the memory consumption I had before. Everything works
fine now.
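
For the record, the default value can be checked with ompi_info, e.g.:

    ompi_info --param btl tcp | grep eager_limit

which should list btl_tcp_eager_limit (the 65536-byte threshold where my
runs originally crashed matches the usual TCP BTL default, if I am not
mistaken).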

Regards,

Olivier

2010/5/20 Olivier Riff <oliriff_at_[hidden]>

>
> 2010/5/20 Nysal Jan <jnysal_at_[hidden]>
>
>> This probably got fixed in https://svn.open-mpi.org/trac/ompi/ticket/2386
>> Can you try 1.4.2? The fix should be in there.
>>
>>
>
> I will test it soon (it takes some time to install the new version on each
> node). It would be perfect if it fixes the problem.
> I will tell you the result ASAP.
>
> Thanks.
>
> Olivier
>
>> Regards
>> --Nysal
>>
>>
>> On Thu, May 20, 2010 at 2:02 PM, Olivier Riff <oliriff_at_[hidden]> wrote:
>>
>>> Hello,
>>>
>>> I assume this question has already been discussed many times, but I
>>> cannot find a solution to my problem on the Internet.
>>> It is about the buffer size limit of MPI_Send and MPI_Recv on a
>>> heterogeneous system (32-bit laptop / 64-bit cluster).
>>> My configuration is:
>>> Open MPI 1.4, configured with: --without-openib --enable-heterogeneous
>>> --enable-mpi-threads
>>> The program is launched from a laptop (32-bit Mandriva 2008), which
>>> distributes tasks to a cluster of 70 processors (64-bit Red Hat
>>> Enterprise distribution).
>>> I have to send buffers of various sizes, from a few bytes up to 30 MB.
>>>
>>> I tested the following commands:
>>> 1) mpirun -v -machinefile machinefile.txt MyMPIProgram
>>> -> crashes on the client side (64-bit Red Hat Enterprise) when the sent
>>> buffer size is > 65536 bytes.
>>> 2) mpirun --mca btl_tcp_eager_limit 30000000 -v -machinefile
>>> machinefile.txt MyMPIProgram
>>> -> works, but generates gigantic memory consumption on the 32-bit
>>> machine side after MPI_Recv. Memory consumption goes from 800 MB to
>>> 2.1 GB after receiving about 20 kB from each of the 70 clients (a total
>>> of about 1.4 MB). This makes my program crash later because I have no
>>> more memory left to allocate new structures. I read in an Open MPI forum
>>> thread that setting btl_tcp_eager_limit to a huge value explains this
>>> huge memory consumption when a sent message does not have a preposted
>>> ready recv. Also, after all messages have been received and there is no
>>> more traffic activity, the memory consumed remains at 2.1 GB... and I do
>>> not understand why.
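>>>
>>> (To make the "preposted recv" point concrete, here is a stripped-down
>>> sketch of the size-then-payload pattern I am considering. It is not my
>>> real program; the tags and the 20 kB size are placeholders:
>>>
>>>   #include <mpi.h>
>>>   #include <stdlib.h>
>>>
>>>   /* sketch: run with one master (rank 0) plus the worker ranks */
>>>   int main(int argc, char **argv)
>>>   {
>>>       int rank, nprocs;
>>>       MPI_Init(&argc, &argv);
>>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>       MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>>       if (rank == 0) {
>>>           int i;
>>>           for (i = 1; i < nprocs; i++) {
>>>               int len;
>>>               MPI_Status st;
>>>               /* the tiny size message stays under the eager limit */
>>>               MPI_Recv(&len, 1, MPI_INT, MPI_ANY_SOURCE, 1,
>>>                        MPI_COMM_WORLD, &st);
>>>               /* allocate exactly what that client will send,
>>>                  instead of a worst-case 30 MB buffer per client */
>>>               char *buf = malloc(len);
>>>               MPI_Recv(buf, len, MPI_BYTE, st.MPI_SOURCE, 2,
>>>                        MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>               /* ... process buf ... */
>>>               free(buf);
>>>           }
>>>       } else {
>>>           int len = 20 * 1024;          /* e.g. a ~20 kB result */
>>>           char *data = calloc(1, len);
>>>           MPI_Send(&len, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
>>>           MPI_Send(data, len, MPI_BYTE, 0, 2, MPI_COMM_WORLD);
>>>           free(data);
>>>       }
>>>       MPI_Finalize();
>>>       return 0;
>>>   }
>>>
>>> The size message fits under the default eager limit, and the master only
>>> ever allocates what each client actually sends.)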
>>>
>>> What is the best way to get a working program that also has small
>>> memory consumption (lower speed performance is acceptable)?
>>> I tried to play with the MCA parameters btl_tcp_sndbuf and
>>> btl_tcp_rcvbuf, but without success.
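>>>
>>> (For reference, I set them on the command line like this, with values
>>> picked more or less arbitrarily:
>>>
>>>   mpirun --mca btl_tcp_sndbuf 131072 --mca btl_tcp_rcvbuf 131072 \
>>>          -v -machinefile machinefile.txt MyMPIProgram
>>> )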
>>>
>>> Thanks in advance for your help.
>>>
>>> Best regards,
>>>
>>> Olivier
>>>