
Open MPI User's Mailing List Archives


From: Ralph Castain (rhc_at_[hidden])
Date: 2007-08-06 20:39:19


Guess I don't see how stale shared memory files would cause swapping to
occur. Besides, the user provided no indication that the applications were
abnormally terminating, which makes it likely we cleaned up the session
directories as we should.

However, we definitely leak memory (i.e., we don't free all memory we malloc
while supporting execution of an application), so if the OS isn't cleaning
up after us, it is quite possible that we are causing the problem. It would
appear exactly as you describe: a slow leak that gradually builds up until
the "dead" area is so big that it forces applications to swap to find
enough room to work.

So I guess we should ask for clarification:

1. Are the Open MPI applications exiting cleanly? Do you see any stale
"orted" executables still in the process table?

2. Can you check the temp directory where we would be operating? This is
usually your /tmp directory, unless you specified some other location. Look
for our session directories; their names include "openmpi". Are they being
cleaned up (i.e., removed) when the applications exit?

Thanks
Ralph

On 8/6/07 5:53 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:

> Unless there's something weird going on in the Solaris kernel, the
> only memory that we should be leaking after MPI processes exit would
> be shared memory files that are [somehow] not getting removed properly.
>
> Right?
>
>
> On Aug 6, 2007, at 8:15 AM, Ralph H Castain wrote:
>
>> Hmmm...just to clarify as I think there may be some confusion here.
>>
>> Orte-clean will kill any outstanding Open MPI daemons (which should kill
>> their local apps) and will clean up their associated temporary file
>> systems. If you are having problems with zombied processes or stale
>> daemons, then this will hopefully help (it isn't perfect, but it helps).
>>
>> However, orte-clean will not do anything about releasing memory that
>> has been "leaked" by Open MPI. We don't have any tools for doing that,
>> I'm afraid.
>>
>>
>> On 8/6/07 8:08 AM, "Don Kerr" <Don.Kerr_at_[hidden]> wrote:
>>
>>> Glenn,
>>>
>>> With CT7 there is a utility that can be used to clean up leftover
>>> cruft from stale MPI processes.
>>>
>>> % man -M /opt/SUNWhpc/man -s 1 orte-clean
>>>
>>> Caution: this will also remove currently running jobs. Using "-v"
>>> (verbose) is recommended.
>>>
>>> I would be curious if this helps.
>>>
>>> -DON
>>> P.S. orte-clean does not exist in the ompi v1.2 branch; it is in the
>>> trunk, but I think there is currently an issue with it.
>>>
>>> Ralph H Castain wrote:
>>>
>>>>
>>>> On 8/5/07 6:35 PM, "Glenn Carver" <Glenn.Carver_at_[hidden]>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>> I'd appreciate some advice and help on this one. We're having
>>>>> serious problems running parallel applications on our cluster.
>>>>> After each batch job finishes, we lose a certain amount of
>>>>> available memory. Additional jobs cause free memory to gradually
>>>>> go down until the machine starts swapping and becomes unusable or
>>>>> hangs. Taking the machine to single-user mode doesn't restore the
>>>>> memory; only a reboot returns all available memory. This happens
>>>>> on all our nodes.
>>>>>
>>>>> We've been doing some testing to try to pin the problem down,
>>>>> although we still don't fully know where it is coming from. We
>>>>> have ruled out our applications (Fortran codes); we see the same
>>>>> behaviour with Intel's IMB. We know it's not a network issue, as a
>>>>> parallel job running solely on the 4 cores within a node produces
>>>>> the same effect. All nodes have been brought up to the very latest
>>>>> OS patches and we still see the same problem.
>>>>>
>>>>> Details: we're running Solaris 10/06, Sun Studio 12, ClusterTools 7
>>>>> (Open MPI 1.2.1) and Sun Grid Engine 6.1. Hardware is Sun
>>>>> X4100/X4200. Kernel version: SunOS 5.10 Generic_125101-10 on all
>>>>> nodes.
>>>>>
>>>>> I read in the release notes that a number of memory leaks were
>>>>> fixed for the 1.2.1 release, but none have been noticed since, so
>>>>> I'm not sure where the problem might be.
>>>>>
>>>>>
>>>>
>>>> I'm not sure where that claim came from, but it is certainly not
>>>> true that we haven't noticed any leaks since 1.2.1. We know we have
>>>> quite a few memory leaks in the code base, many of which are small
>>>> in themselves but can add up depending upon exactly what the
>>>> application does (i.e., which code paths it travels). Running a
>>>> simple hello_world app under valgrind will show significant
>>>> unreleased memory.
>>>>
>>>> I doubt you will see much, if any, improvement in 1.2.4. There have
>>>> probably been a few patches applied, but no comprehensive effort to
>>>> eradicate the problem has been made. It is something we are trying
>>>> to clean up over time, but it hasn't been a crash priority, as most
>>>> OSes do a fairly good job of cleaning up when the app completes.
>>>>
>>>>
>>>>
>>>>> My next move is to try the very latest release (probably the
>>>>> 1.2.4 pre-release). As CT7 is built with Sun Studio 11 rather than
>>>>> 12, which we're using, I might also try downgrading. At the moment
>>>>> we're rebooting our cluster nodes every day to keep things going,
>>>>> so any suggestions are appreciated.
>>>>>
>>>>> Thanks, Glenn
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> $ ompi_info
>>>>> Open MPI: 1.2.1r14096-ct7b030r1838
>>>>> Open MPI SVN revision: 0
>>>>> Open RTE: 1.2.1r14096-ct7b030r1838
>>>>> Open RTE SVN revision: 0
>>>>> OPAL: 1.2.1r14096-ct7b030r1838
>>>>> OPAL SVN revision: 0
>>>>> Prefix: /opt/SUNWhpc/HPC7.0
>>>>> Configured architecture: i386-pc-solaris2.10
>>>>> Configured by: root
>>>>> Configured on: Fri Mar 30 13:40:12 EDT 2007
>>>>> Configure host: burpen-csx10-0
>>>>> Built by: root
>>>>> Built on: Fri Mar 30 13:57:25 EDT 2007
>>>>> Built host: burpen-csx10-0
>>>>> C bindings: yes
>>>>> C++ bindings: yes
>>>>> Fortran77 bindings: yes (all)
>>>>> Fortran90 bindings: yes
>>>>> Fortran90 bindings size: trivial
>>>>> C compiler: cc
>>>>> C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
>>>>> C++ compiler: CC
>>>>> C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
>>>>> Fortran77 compiler: f77
>>>>> Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
>>>>> Fortran90 compiler: f95
>>>>> Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
>>>>> C profiling: yes
>>>>> C++ profiling: yes
>>>>> Fortran77 profiling: yes
>>>>> Fortran90 profiling: yes
>>>>> C++ exceptions: yes
>>>>> Thread support: no
>>>>> Internal debug support: no
>>>>> MPI parameter check: runtime
>>>>> Memory profiling support: no
>>>>> Memory debugging support: no
>>>>> libltdl support: yes
>>>>> Heterogeneous support: yes
>>>>> mpirun default --prefix: yes
>>>>> MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>>>>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>>>>> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
>>>>> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
>>>>> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>>>>> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
>>>>> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
>>>>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>>>>> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
>>>>> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
>>>>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
>>>>> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users