Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Could following situations caused by RDMA mcaparameters?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-04-22 11:00:12


On Apr 21, 2009, at 11:01 AM, Tsung Han Shie wrote:

> I tried to speed up a program with openmpi-1.1.3

Did you mean 1.1.3 or 1.3.1?

> by adding the following 4 parameters to the openmpi-mca-params.conf file.
>
> mpi_leave_pinned=1
> btl_openib_eager_rdma_num=128
> btl_openib_max_eager_rdma=128
> btl_openib_eager_limit=1024

If you meant 1.3.1 above, please see the following message about an
important bug in 1.3 and 1.3.1 with the use of mpi_leave_pinned:

     http://www.open-mpi.org/community/lists/announce/2009/03/0029.php
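
As a rough sketch only, and assuming the announcement above applies to
your installation, one conservative option is to turn leave-pinned off
in the same openmpi-mca-params.conf file while leaving your other three
settings as you quoted them. The values are copied from your message;
nothing here is a tuning recommendation:

```
# openmpi-mca-params.conf (sketch; values taken from the quoted message)
# Assumption: disabling leave-pinned sidesteps the 1.3 / 1.3.1 issue.
mpi_leave_pinned = 0

# Remaining openib settings, unchanged from the original post.
btl_openib_eager_rdma_num = 128
btl_openib_max_eager_rdma = 128
btl_openib_eager_limit = 1024
```

Running "ompi_info --param btl openib" should show which openib
parameters your build recognizes and their current values.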

> Then I ran my program twice (124 processes on 31 nodes), once
> with "mpi_leave_pinned=1" and once with "mpi_leave_pinned=0".
> Both runs were stopped abnormally with "ctrl+c" and "killall -9
> <program>".

Why -- did they hang?

> After that, I couldn't start that program again.

What exactly was the error?

> I checked every node with "free -m" and found that a huge amount of
> cached memory was in use on each node.
> Could this situation be caused by those 4 parameters? Is there
> any way to free it?

Probably not.

Can you send all the information listed here:

     http://www.open-mpi.org/community/help/

-- 
Jeff Squyres
Cisco Systems