Open MPI User's Mailing List Archives

From: Don Kerr (Don.Kerr_at_[hidden])
Date: 2007-08-07 08:59:30


I will run some tests to check out this possibility.

-DON

Jeff Squyres wrote:

>I guess this is a question for Sun: what happens if registered memory
>is not freed after a process exits? Does the kernel leave it allocated?
>
>
>On Aug 6, 2007, at 7:00 PM, Glenn Carver wrote:
>
>
>
>>Just to clarify, the MPI applications exit cleanly. We have our own
>>f90 code (in various configurations) and I'm also testing using
>>Intel's IMB. If I watch the applications whilst they run, there is a
>>drop in free memory as our code begins, and free memory then steadily
>>drops as the code runs. When it exits normally, free memory increases
>>but falls short of where it was before the code started. The longer
>>we run the code, the bigger the final drop in memory. Taking the
>>machine down to single-user mode doesn't help, so it's not an issue
>>of processes still running. Neither can I find any files still open
>>with lsof.
>>
>>We installed Sun's Clustertools 6 (not based on Open MPI) and we
>>don't see the same problem. I'm currently testing whether setting
>>btl_udapl_flags=1 makes a difference. I'm guessing that registered
>>memory is leaking? We're also trying some MCA parameters to turn off
>>features we don't need, to see if that makes a difference. I'll
>>report back on point 2 below and on further tests later. If there
>>are specific MCA parameters you'd like me to try, let me know.
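>>
>>(For anyone following along: MCA parameters like this are normally
>>passed on the mpirun command line, along the lines of
>>
>>   mpirun --mca btl_udapl_flags 1 -np 4 ./a.out
>>
>>with ./a.out standing in for whatever binary is being tested, or put
>>in an mca-params.conf file; "ompi_info --param btl udapl" lists the
>>udapl parameters and their defaults.)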
>>
>>Thanks, Glenn
>>
>>
>>
>>
>>>Guess I don't see how stale shared memory files would cause swapping
>>>to occur. Besides, the user provided no indication that the
>>>applications were abnormally terminating, which makes it likely we
>>>cleaned up the session directories as we should.
>>>
>>>However, we definitely leak memory (i.e., we don't free all memory
>>>we malloc while supporting execution of an application), so if the
>>>OS isn't cleaning up after us, it is quite possible we could be
>>>causing the problem as described. It would appear exactly as
>>>described - a slow leak that gradually builds up until the "dead"
>>>area is so big that it forces applications to swap to find enough
>>>room to work.
>>>
>>>So I guess we should ask for clarification:
>>>
>>>1. Are the Open MPI applications exiting cleanly? Do you see any
>>>stale "orted" executables still in the process table?
>>>
>>>2. Can you check the temp directory where we would be operating?
>>>This is usually your /tmp directory, unless you specified some other
>>>location. Look for our session directories - they have a name that
>>>includes "openmpi" in them. Are they being cleaned up (i.e., removed)
>>>when the applications exit?
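>>>
>>>(Something along these lines, run on each compute node, would show
>>>both, assuming the default /tmp location:
>>>
>>>   ps -ef | grep orted
>>>   ls -ld /tmp/openmpi-sessions-* 2>/dev/null
>>>
>>>The exact session directory names vary, but they all contain
>>>"openmpi".)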
>>>
>>>Thanks
>>>Ralph
>>>
>>>
>>>On 8/6/07 5:53 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>>>
>>>
>>>
>>>> Unless there's something weird going on in the Solaris kernel, the
>>>> only memory that we should be leaking after MPI processes exit
>>>> would be shared memory files that are [somehow] not getting
>>>> removed properly.
>>>>
>>>> Right?
>>>>
>>>>
>>>> On Aug 6, 2007, at 8:15 AM, Ralph H Castain wrote:
>>>>
>>>>
>>>>
>>>>> Hmmm...just to clarify, as I think there may be some confusion
>>>>> here.
>>>>>
>>>>> Orte-clean will kill any outstanding Open MPI daemons (which
>>>>> should kill their local apps) and will clean up their associated
>>>>> temporary file systems. If you are having problems with zombied
>>>>> processes or stale daemons, then this will hopefully help (it
>>>>> isn't perfect, but it helps).
>>>>>
>>>>> However, orte-clean will not do anything about releasing memory
>>>>> that has been "leaked" by Open MPI. We don't have any tools for
>>>>> doing that, I'm afraid.
>>>>>
>>>>>
>>>>> On 8/6/07 8:08 AM, "Don Kerr" <Don.Kerr_at_[hidden]> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Glenn,
>>>>>>
>>>>>> With CT7 there is a utility which can be used to clean up
>>>>>> leftover cruft from stale MPI processes:
>>>>>>
>>>>>> % man -M /opt/SUNWhpc/man -s 1 orte-clean
>>>>>>
>>>>>> Achtung: this will remove currently running jobs as well. Use of
>>>>>> "-v" for verbose output is recommended.
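>>>>>>
>>>>>> (A typical invocation would be something along the lines of
>>>>>>
>>>>>> % /opt/SUNWhpc/HPC7.0/bin/orte-clean -v
>>>>>>
>>>>>> run on each node, assuming the default CT7 install prefix.)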
>>>>>>
>>>>>> I would be curious if this helps.
>>>>>>
>>>>>> -DON
>>>>>> P.S. orte-clean does not exist in the OMPI v1.2 branch; it is in
>>>>>> the trunk, but I think there is currently an issue with it.
>>>>>>
>>>>>> Ralph H Castain wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On 8/5/07 6:35 PM, "Glenn Carver"
>>>>>>><Glenn.Carver_at_[hidden]>
>>>>>>>wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I'd appreciate some advice and help on this one. We're having
>>>>>>>> serious problems running parallel applications on our cluster.
>>>>>>>> After each batch job finishes, we lose a certain amount of
>>>>>>>> available memory. Additional jobs cause free memory to
>>>>>>>> gradually go down until the machine starts swapping and
>>>>>>>> becomes unusable or hangs. Taking the machine to single-user
>>>>>>>> mode doesn't restore the memory; only a reboot returns all
>>>>>>>> available memory. This happens on all our nodes.
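>>>>>>>>
>>>>>>>> (By "free memory" I mean what vmstat reports - e.g. watching
>>>>>>>> the free column of "vmstat 5" on each node between job runs -
>>>>>>>> though any similar tool shows the same trend.)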
>>>>>>>>
>>>>>>>> We've been doing some testing to try to pin the problem down,
>>>>>>>> although we still don't fully know where it is coming from. We
>>>>>>>> have ruled out our applications (Fortran codes); we see the
>>>>>>>> same behaviour with Intel's IMB. We know it's not a network
>>>>>>>> issue, as a parallel job running solely on the 4 cores on each
>>>>>>>> node produces the same effect. All nodes have been brought up
>>>>>>>> to the very latest OS patches and we still see the same
>>>>>>>> problem.
>>>>>>>>
>>>>>>>> Details: we're running Solaris 10/06, Sun Studio 12,
>>>>>>>> Clustertools 7 (Open MPI 1.2.1) and Sun Gridengine 6.1.
>>>>>>>> Hardware is Sun X4100/X4200. Kernel version: SunOS 5.10
>>>>>>>> Generic_125101-10 on all nodes.
>>>>>>>>
>>>>>>>> I read in the release notes that a number of memory leaks were
>>>>>>>> fixed for the 1.2.1 release, but none have been noticed since,
>>>>>>>> so I'm not sure where the problem might be.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> I'm not sure where that claim came from, but it is certainly
>>>>>>> not true that we haven't noticed any leaks since 1.2.1. We know
>>>>>>> we have quite a few memory leaks in the code base, many of
>>>>>>> which are small in themselves but can add up depending upon
>>>>>>> exactly what the application does (i.e., which code paths it
>>>>>>> travels). Running a simple hello_world app under valgrind will
>>>>>>> show significant unreleased memory.
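>>>>>>>
>>>>>>> (i.e., something along the lines of
>>>>>>>
>>>>>>>   mpirun -np 2 valgrind --leak-check=full ./hello_world
>>>>>>>
>>>>>>> on a platform where valgrind runs; the exact figures will vary,
>>>>>>> but the unreleased blocks are plainly visible.)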
>>>>>>>
>>>>>>> I doubt you will see much, if any, improvement in 1.2.4. There
>>>>>>> have probably been a few patches applied, but a comprehensive
>>>>>>> effort to eradicate the problem has not been made. It is
>>>>>>> something we are trying to clean up over time, but it hasn't
>>>>>>> been a crash priority, as most OSes do a fairly good job of
>>>>>>> cleaning up when the app completes.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> My next move is to try the very latest release (probably the
>>>>>>>> 1.2.4 pre-release). As CT7 is built with Sun Studio 11 rather
>>>>>>>> than 12, which we're using, I might also try downgrading. At
>>>>>>>> the moment we're rebooting our cluster nodes every day to keep
>>>>>>>> things going, so any suggestions are appreciated.
>>>>>>>>
>>>>>>>> Thanks, Glenn
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> $ ompi_info
>>>>>>>> Open MPI: 1.2.1r14096-ct7b030r1838
>>>>>>>> Open MPI SVN revision: 0
>>>>>>>> Open RTE: 1.2.1r14096-ct7b030r1838
>>>>>>>> Open RTE SVN revision: 0
>>>>>>>> OPAL: 1.2.1r14096-ct7b030r1838
>>>>>>>> OPAL SVN revision: 0
>>>>>>>> Prefix: /opt/SUNWhpc/HPC7.0
>>>>>>>> Configured architecture: i386-pc-solaris2.10
>>>>>>>> Configured by: root
>>>>>>>> Configured on: Fri Mar 30 13:40:12 EDT 2007
>>>>>>>> Configure host: burpen-csx10-0
>>>>>>>> Built by: root
>>>>>>>> Built on: Fri Mar 30 13:57:25 EDT 2007
>>>>>>>> Built host: burpen-csx10-0
>>>>>>>> C bindings: yes
>>>>>>>> C++ bindings: yes
>>>>>>>> Fortran77 bindings: yes (all)
>>>>>>>> Fortran90 bindings: yes
>>>>>>>> Fortran90 bindings size: trivial
>>>>>>>> C compiler: cc
>>>>>>>> C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
>>>>>>>> C++ compiler: CC
>>>>>>>> C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
>>>>>>>> Fortran77 compiler: f77
>>>>>>>> Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
>>>>>>>> Fortran90 compiler: f95
>>>>>>>> Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
>>>>>>>> C profiling: yes
>>>>>>>> C++ profiling: yes
>>>>>>>> Fortran77 profiling: yes
>>>>>>>> Fortran90 profiling: yes
>>>>>>>> C++ exceptions: yes
>>>>>>>> Thread support: no
>>>>>>>> Internal debug support: no
>>>>>>>> MPI parameter check: runtime
>>>>>>>> Memory profiling support: no
>>>>>>>> Memory debugging support: no
>>>>>>>> libltdl support: yes
>>>>>>>> Heterogeneous support: yes
>>>>>>>> mpirun default --prefix: yes
>>>>>>>> MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>>>>>>>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>>>>>>>> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
>>>>>>>> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
>>>>>>>> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>>>>>>>> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
>>>>>>>> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
>>>>>>>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>>>>>>>> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
>>>>>>>> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
>>>>>>>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>>> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>>> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)