Open MPI User's Mailing List Archives

From: Brian Barrett (bbarrett_at_[hidden])
Date: 2007-08-02 19:07:23


On Aug 2, 2007, at 4:22 PM, Glenn Carver wrote:

> Hopefully an easy question to answer... is it possible to get at the
> values of mca parameters whilst a program is running? What I had in
> mind was either an open-mpi function to call which would print the
> current values of mca parameters or a function to call for specific
> mca parameters. I don't want to interrupt the running of the
> application.
>
> Bit of background. I have a large F90 application running with
> OpenMPI (as Sun Clustertools 7) on Opteron CPUs with an IB network.
> We're seeing swap thrashing occurring on some of the nodes at times
> and having searched the archives and read the FAQ believe we may be
> seeing the problem described in:
> http://www.open-mpi.org/community/lists/users/2007/01/2511.php
> where the udapl free list is growing to a point where lockable
> memory runs out.
>
> Problem is, I have no feel for the kinds of numbers that
> "btl_udapl_free_list_max" might safely get up to? Hence the request
> to print mca parameter values whilst the program is running to see if
> we can tie in high values of this parameter to when we're seeing swap
> thrashing.

Good news, the answer is easy. Bad news is, it's not the one you
want. btl_udapl_free_list_max is the *greatest* size the list will
ever be allowed to grow to, not its current size. So if you don't
specify a value and use the default of -1, querying the parameter
will return -1 for the life of the application, regardless of how
big those free lists actually get. If you specify value X, it'll
return X for the life of the application, as well.
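
To make the distinction concrete, here is a toy sketch in Python. It is
not Open MPI code: the FreeList class, its grow_by batch size, and the
reading of -1 as "no cap" are all assumptions made for illustration.
The point is only that the configured maximum is a fixed setting, while
the live size is a separate counter that the parameter never reports.

```python
class FreeList:
    """Toy free list (hypothetical, not Open MPI's implementation).

    max_items is the configured ceiling and never changes at run time;
    size is the live item count, which is what actually uses memory.
    """

    def __init__(self, max_items=-1, grow_by=8):
        self.max_items = max_items  # assumption: -1 means "no cap"
        self.grow_by = grow_by      # assumption: items added per growth step
        self.size = 0               # items currently allocated

    def grow(self):
        """Add one batch of items, respecting the cap. Returns items added."""
        if self.max_items < 0:
            added = self.grow_by
        else:
            added = max(0, min(self.grow_by, self.max_items - self.size))
        self.size += added
        return added


unlimited = FreeList()           # default ceiling of -1
capped = FreeList(max_items=20)  # explicit ceiling
for _ in range(5):
    unlimited.grow()
    capped.grow()

print(unlimited.max_items, unlimited.size)  # -1 40
print(capped.max_items, capped.size)        # 20 20
```

In both cases max_items reads the same at every point in the run; only
size changes, and that is the number you cannot currently get at.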

There is not a good way for a user to find out the current size of a
free list or the largest it got for the life of an application
(currently those two will always be the same, but that's another
story). Your best bet is to set the parameter to some value (say,
128 or 256) and see if that helps with the swapping.
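
If it helps, trying those values could look roughly like the sketch
below, which just builds the mpirun command lines using the usual
"--mca <name> <value>" option. The application name ./my_app and the
process count of 4 are placeholders, not anything from your setup, and
the script only prints the commands rather than running them.

```python
def mpirun_command(free_list_max, app="./my_app", nprocs=4):
    """Build an mpirun argv that caps the udapl free list.

    app and nprocs are hypothetical placeholders; substitute your own
    executable and process count.
    """
    return [
        "mpirun", "-np", str(nprocs),
        "--mca", "btl_udapl_free_list_max", str(free_list_max),
        app,
    ]


# Candidate ceilings suggested above; compare swapping behaviour per run.
for value in (128, 256):
    print(" ".join(mpirun_command(value)))
```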

Brian

-- 
   Brian W. Barrett
   Networking Team, CCS-1
   Los Alamos National Laboratory