Subject: Re: [OMPI users] Buffer size limit and memory consumption problem on heterogeneous (32 bit / 64 bit) machines
From: Terry Dontje (terry.dontje_at_[hidden])
Date: 2010-05-20 06:30:29


Olivier Riff wrote:
> Hello,
>
> I assume this question has already been discussed many times, but I
> cannot find a solution to my problem on the Internet.
> It is about the buffer size limit of MPI_Send and MPI_Recv on a
> heterogeneous system (32-bit laptop / 64-bit cluster).
> My configuration is:
> Open MPI 1.4, configured with: --without-openib --enable-heterogeneous
> --enable-mpi-threads
> The program is launched on a laptop (32-bit Mandriva 2008) which
> distributes tasks to a cluster of 70 processors (64-bit Red Hat
> Enterprise distribution).
> I have to send buffers of various sizes, from a few bytes up to 30 MB.
>
You really want to get your program running without tcp_eager_limit
set if you want better memory usage. I believe the crash has
something to do with the rendezvous protocol in OMPI. Have you narrowed
this failure down to a simple MPI program? Also, I noticed that you're
configuring with --enable-mpi-threads; have you tried configuring
without that option?
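
To be concrete, something along these lines is the kind of simple
reproducer I mean (an untested sketch; the 1 MB buffer size is just a
value I picked to sit well above the 65536-byte default eager limit):

/* repro.c: rank 0 sends one large message to rank 1, crossing the
 * default TCP eager limit so the rendezvous protocol is exercised. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE (1 << 20)   /* 1 MB, well above 65536 bytes */

int main(int argc, char **argv)
{
    int rank;
    char *buf = malloc(BUF_SIZE);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, rank, BUF_SIZE);

    if (rank == 0) {
        MPI_Send(buf, BUF_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, BUF_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %d bytes\n", BUF_SIZE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Run it with one rank on the 32-bit laptop and one on a 64-bit node
(e.g. mpirun -np 2 -machinefile machinefile.txt ./repro) and see
whether it fails the same way your application does.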
> I tested the following commands:
> 1) mpirun -v -machinefile machinefile.txt MyMPIProgram
> -> crashes on the client side (64-bit Red Hat Enterprise) when the
> sent buffer size is > 65536 bytes.
> 2) mpirun --mca btl_tcp_eager_limit 30000000 -v -machinefile
> machinefile.txt MyMPIProgram
> -> works, but generates gigantic memory consumption on the 32-bit
> machine after MPI_Recv. Memory consumption goes from 800 MB to 2.1 GB
> after receiving about 20 kB from each of the 70 clients (a total of
> about 1.4 MB). This makes my program crash later because I have no
> more memory left to allocate new structures. I read in an Open MPI
> forum thread that setting btl_tcp_eager_limit to a huge value explains
> this huge memory consumption when a sent message does not have a
> preposted ready recv. Also, after all messages have been received and
> there is no more traffic activity, the memory consumed remains at
> 2.1 GB... and I do not understand why.
Are the 70 clients all on different nodes? I am curious if the 2.1GB is
due to the SM BTL or possibly a leak in the TCP BTL.
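
One way to separate those two possibilities (a suggestion only; I have
not tried it against your setup) is to take the shared-memory BTL out
of the picture and force all traffic through TCP:

mpirun --mca btl tcp,self -v -machinefile machinefile.txt MyMPIProgram

If the memory still climbs to 2.1 GB with only the tcp and self BTLs
loaded, that would point at the TCP side rather than SM.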
>
> What is the best way to get a working program which also has small
> memory consumption (the speed performance can be lower)?
> I tried to play with the MCA parameters btl_tcp_sndbuf and
> btl_tcp_rcvbuf, but without success.
>
> Thanks in advance for your help.
>
> Best regards,
>
> Olivier
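
On your question about keeping memory consumption small: with the
default eager limit left alone, any message larger than 64 KB goes
through the rendezvous protocol, so the bulk of the data is not
shipped until the matching receive has been posted, and it is then
delivered directly into your receive buffer rather than being queued
inside the library. So instead of raising btl_tcp_eager_limit, try
pre-posting your receives before the clients start sending. A rough
sketch (NCLIENTS and MAX_MSG are placeholders; size MAX_MSG to the
largest message you actually expect):

/* The master pre-posts one nonblocking receive per client before any
 * traffic arrives.  Messages above the eager limit then wait on the
 * rendezvous handshake and land straight in bufs[i]; nothing piles
 * up as "unexpected" data inside the library. */
#include <mpi.h>
#include <stdlib.h>

#define NCLIENTS 70
#define MAX_MSG  65536   /* placeholder: largest expected message */

static void collect_results(void)
{
    MPI_Request reqs[NCLIENTS];
    char *bufs[NCLIENTS];
    int i;

    for (i = 0; i < NCLIENTS; i++) {
        bufs[i] = malloc(MAX_MSG);
        MPI_Irecv(bufs[i], MAX_MSG, MPI_CHAR, i + 1, 0,
                  MPI_COMM_WORLD, &reqs[i]);
    }

    /* ... signal the clients to send, do other work ... */

    MPI_Waitall(NCLIENTS, reqs, MPI_STATUSES_IGNORE);

    for (i = 0; i < NCLIENTS; i++)
        free(bufs[i]);
}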

-- 
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]


