Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Buffer size limit and memory consumption problem on heterogeneous (32 bit / 64 bit) machines
From: Olivier Riff (oliriff_at_[hidden])
Date: 2010-05-20 07:26:03


Hello Terry,

Thanks for your answer.

2010/5/20 Terry Dontje <terry.dontje_at_[hidden]>

> Olivier Riff wrote:
>
> Hello,
>
> I assume this question has already been discussed many times, but I cannot
> find a solution to my problem on the Internet.
> It is about the buffer size limit of MPI_Send and MPI_Recv on a
> heterogeneous system (32-bit laptop / 64-bit cluster).
> My configuration is:
> Open MPI 1.4, configured with: --without-openib --enable-heterogeneous
> --enable-mpi-threads
> The program is launched from a laptop (32-bit Mandriva 2008) which
> distributes tasks to a cluster of 70 processors (64-bit RedHat Enterprise
> distribution).
> I have to send buffers of various sizes, from a few bytes up to 30 MB.
>
> You really want to get your program running without the tcp_eager_limit
> set if you want better memory usage. I believe the crash has something to
> do with the rendezvous protocol in OMPI. Have you narrowed this failure
> down to a simple MPI program? Also, I noticed that you're configuring with
> --enable-mpi-threads; have you tried configuring without that option?
>
>
-> No, unfortunately I have not narrowed this behaviour down to a simple MPI
program. I think I will have to do it if I do not find a solution in the
next few days.
I will also run the test without the --enable-mpi-threads configure option.
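
For reference, here is a minimal sketch of the kind of simple reproducer I
have in mind (the 128 KB buffer size is only an example chosen to be above
the 65536-byte threshold where the crash appears; the names and counts are
placeholders, not the real application):

/* Minimal reproducer sketch: rank 0 sends a large buffer to every other
 * rank and waits for a short acknowledgement. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int count = 128 * 1024;            /* bytes, > 65536 */
    char *buf = malloc(count);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        int ack;
        for (int dst = 1; dst < size; dst++) {
            MPI_Send(buf, count, MPI_BYTE, dst, 0, MPI_COMM_WORLD);
            MPI_Recv(&ack, 1, MPI_INT, dst, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    } else {
        int ack = rank;
        MPI_Recv(buf, count, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&ack, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}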

> I tested the following commands:
> 1) mpirun -v -machinefile machinefile.txt MyMPIProgram
> -> crashes on the client side (64-bit RedHat Enterprise) when the sent
> buffer size is > 65536.
> 2) mpirun --mca btl_tcp_eager_limit 30000000 -v -machinefile
> machinefile.txt MyMPIProgram
> -> works, but generates gigantic memory consumption on the 32-bit machine
> side after MPI_Recv. Memory consumption goes from 800 MB to 2.1 GB after
> receiving about 20 KB from each of the 70 clients (a total of about 1.4 MB).
> This makes my program crash later because I have no more memory left to
> allocate new structures. I read in an Open MPI forum thread that setting
> btl_tcp_eager_limit to a huge value explains this huge memory consumption
> when a sent message does not have a pre-posted receive. Also, after all
> messages have been received and there is no more traffic activity, the
> memory consumed remains at 2.1 GB... and I do not understand why.
>
> Are the 70 clients all on different nodes? I am curious if the 2.1GB is
> due to the SM BTL or possibly a leak in the TCP BTL.
>

No, the 70 clients are spread over only 9 nodes. In fact there are 72
clients: nine 8-processor machines.
The 2.1 GB memory consumption appears when I sequentially read the result
from each of the 72 clients (a for loop from 1 to 72 calling MPI_Recv). I
assume that many clients have already sent their result while the server has
not yet called MPI_Recv for the corresponding rank.
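
A minimal sketch of what pre-posting the receives could look like on the
server side (assuming the client ranks are 1..72 and each result is roughly
20 KB; both values and the helper name are placeholders), so that early
messages land directly in my buffers instead of being held by the library as
unexpected eager messages:

#include <mpi.h>

#define NCLIENTS     72
#define RESULT_BYTES (20 * 1024)   /* ~20 KB per client, as described above */

/* Pre-post one non-blocking receive per client before any result arrives. */
void collect_results(char results[NCLIENTS][RESULT_BYTES])
{
    MPI_Request reqs[NCLIENTS];

    for (int i = 0; i < NCLIENTS; i++)
        MPI_Irecv(results[i], RESULT_BYTES, MPI_BYTE,
                  i + 1 /* assumed client rank */, 0,
                  MPI_COMM_WORLD, &reqs[i]);

    /* Wait for all results; MPI_Waitany could be used instead to process
     * them as they arrive. */
    MPI_Waitall(NCLIENTS, reqs, MPI_STATUSES_IGNORE);
}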

>
> What is the best way to get a working program that also has low memory
> consumption (the speed can be lower)?
> I tried to play with the MCA parameters btl_tcp_sndbuf and btl_tcp_rcvbuf,
> but without success.
>
> Thanks in advance for your help.
>
> Best regards,
>
> Olivier
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.650.633.7054
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.dontje_at_[hidden]
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>