Open MPI User's Mailing List Archives


From: Don Kerr (Don.Kerr_at_[hidden])
Date: 2007-08-07 09:04:50


Glenn,

While I look into the possibility of registered memory not being freed,
could you run your same tests but without shared memory or udapl:

"--mca btl self,tcp"

If this is successful, i.e. it frees memory as expected, the next step
would be to run including shared memory, "--mca btl self,sm,tcp". If
that is also successful, the last step would be to add in udapl, "--mca btl
self,sm,udapl".

-DON

Glenn Carver wrote:

>Just to clarify, the MPI applications exit cleanly. We have our own
>f90 code (in various configurations) and I'm also testing using
>Intel's IMB. If I watch the applications whilst they run, there is a
>drop in free memory as our code begins, and free memory then steadily
>drops as the code runs. When it exits normally, free memory increases
>but falls short of where it was before the code started. The longer
>we run the code, the bigger the final drop in memory. Taking the
>machine down to single user mode doesn't help so it's not an issue of
>processes still running. Neither can I find any files still open with
>lsof.
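As an aside, one simple way to watch that drop between runs on Solaris is
with standard system tools; the two commands below are only illustrative
and not specific to Open MPI:

   vmstat 5                    # watch the "free" column before and after runs
   echo "::memstat" | mdb -k   # breakdown of how physical memory is being used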
>
>We installed Sun's clustertools 6 (not based on openmpi) and we don't
>see the same problem. I'm currently testing whether setting
>btl_udapl_flags=1 makes a difference. I'm guessing that registered
>memory is leaking. We're also trying some mca parameters to turn off
>features we don't need to see if that makes a difference. I'll
>report back on point 2 below and on further tests later. If there are
>specific mca parameters you'd like me to try, let me know.
>
>Thanks, Glenn
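For reference, an MCA parameter such as the btl_udapl_flags setting Glenn
mentions can be given either on the mpirun command line or in a per-user
parameter file; the process count, executable name, and value below are
only illustrative:

   mpirun -np 4 --mca btl_udapl_flags 1 ./your_mpi_app

   echo "btl_udapl_flags = 1" >> $HOME/.openmpi/mca-params.conf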
>
>
>
>
>>Guess I don't see how stale shared memory files would cause swapping to
>>occur. Besides, the user provided no indication that the applications were
>>abnormally terminating, which makes it likely we cleaned up the session
>>directories as we should.
>>
>>However, we definitely leak memory (i.e., we don't free all memory we malloc
>>while supporting execution of an application), so if the OS isn't cleaning
>>up after us, it is quite possible we could be causing the problem as
>>described. It would appear exactly as described - a slow leak that gradually
>>builds up until the "dead" area is so big that it forces applications to
>>swap to find enough room to work.
>>
>>So I guess we should ask for clarification:
>>
>>1. Are the Open MPI applications exiting cleanly? Do you see any stale
>>"orted" executables still in the process table?
>>
>>2. Can you check the temp directory where we would be operating? This is
>>usually your /tmp directory, unless you specified some other location. Look
>>for our session directories - they have a name that includes "openmpi" in
>>them. Are they being cleaned up (i.e., removed) when the applications exit?
>>
>>Thanks
>>Ralph
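A quick way to do the check in point 2 is simply to list anything Open MPI
left behind under the temp directory after a job exits; the command below
assumes the default /tmp location Ralph describes:

   ls -ld /tmp/openmpi* 2>/dev/null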
>>
>>
>>On 8/6/07 5:53 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>>
>>
>>
>>> Unless there's something weird going on in the Solaris kernel, the
>>> only memory that we should be leaking after MPI processes exit would
>>> be shared memory files that are [somehow] not getting removed properly.
>>>
>>> Right?
>>>
>>>
>>> On Aug 6, 2007, at 8:15 AM, Ralph H Castain wrote:
>>>
>>>
>>>
>>>> Hmmm...just to clarify as I think there may be some confusion here.
>>>>
>>>> Orte-clean will kill any outstanding Open MPI daemons (which should kill
>>>> their local apps) and will clean up their associated temporary file
>>>> systems. If you are having problems with zombied processes or stale
>>>> daemons, then this will hopefully help (it isn't perfect, but it helps).
>>>>
>>>> However, orte-clean will not do anything about releasing memory that has
>>>> been "leaked" by Open MPI. We don't have any tools for doing that, I'm
>>>> afraid.
>>>>
>>>>
>>>> On 8/6/07 8:08 AM, "Don Kerr" <Don.Kerr_at_[hidden]> wrote:
>>>>
>>>>
>>>>
>>>>> Glenn,
>>>>>
>>>>> With CT7 there is a utility which can be used to clean up leftover
>>>>> cruft from stale MPI processes.
>>>>>
>>>>> % man -M /opt/SUNWhpc/man -s 1 orte-clean
>>>>>
>>>>> Warning: this will remove currently running jobs as well. Use of "-v"
>>>>> for verbose output is recommended.
>>>>>
>>>>> I would be curious if this helps.
>>>>>
>>>>> -DON
>>>>> p.s. orte-clean does not exist in the ompi v1.2 branch; it is in the
>>>>> trunk, but I think there is an issue with it currently.
>>>>>
>>>>> Ralph H Castain wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On 8/5/07 6:35 PM, "Glenn Carver" <Glenn.Carver_at_[hidden]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I'd appreciate some advice and help on this one. We're having
>>>>>>> serious problems running parallel applications on our cluster. After
>>>>>>> each batch job finishes, we lose a certain amount of available
>>>>>>> memory. Additional jobs cause free memory to gradually go down until
>>>>>>> the machine starts swapping and becomes unusable or hangs. Taking the
>>>>>>> machine to single user mode doesn't restore the memory; only a reboot
>>>>>>> returns all available memory. This happens on all our nodes.
>>>>>>>
>>>>>>> We've been doing some testing to try to pin the problems down,
>>>>>>> although we still don't fully know where the problem is coming
>>>>>>> from.
>>>>>>> We have ruled out our applications (fortran codes); we see the same
>>>>>>> behaviour with Intel's IMB. We know it's not a network issue, as a
>>>>>>> parallel job running solely on the 4 cores of each node produces the
>>>>>>> same effect. All nodes have been brought up to the very latest OS
>>>>>>> patches and we still see the same problem.
>>>>>>>
>>>>>>> Details: we're running Solaris 10/06, Sun Studio 12, Clustertools 7
>>>>>>> (open-mpi 1.2.1) and Sun Gridengine 6.1. Hardware is Sun X4100/X4200.
>>>>>>> Kernel version: SunOS 5.10 Generic_125101-10 on all nodes.
>>>>>>>
>>>>>>> I read in the release notes that a number of memory leaks were fixed
>>>>>>> for the 1.2.1 release, but none have been noticed since, so I'm not
>>>>>>> sure where the problem might be.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> I'm not sure where that claim came from, but it is certainly not true
>>>>>> that we haven't noticed any leaks since 1.2.1. We know we have quite a
>>>>>> few memory leaks in the code base, many of which are small in
>>>>>> themselves but can add up depending upon exactly what the application
>>>>>> does (i.e., which code paths it travels). Running a simple hello_world
>>>>>> app under valgrind will show significant unreleased memory.
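For anyone wanting to reproduce that observation on a platform where
valgrind is available, a typical invocation is something along these lines;
hello_world here is just whatever trivial MPI program you have at hand:

   mpirun -np 2 valgrind --leak-check=full ./hello_world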
>>>>>>
>>>>>> I doubt you will see much, if any, improvement in 1.2.4. There have
>>>>>> probably been a few patches applied, but a comprehensive effort to
>>>>>> eradicate the problem has not been made. It is something we are trying
>>>>>> to clean up over time, but it hasn't been a top priority as most OSes
>>>>>> do a fairly good job of cleaning up when the app completes.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> My next move is to try the very latest release (probably the 1.2.4
>>>>>>> pre-release). As CT7 is built with Sun Studio 11 rather than 12,
>>>>>>> which we're using, I might also try downgrading. At the moment we're
>>>>>>> rebooting our cluster nodes every day to keep things going, so any
>>>>>>> suggestions are appreciated.
>>>>>>>
>>>>>>> Thanks, Glenn
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> $ ompi_info
>>>>>>> Open MPI: 1.2.1r14096-ct7b030r1838
>>>>>>> Open MPI SVN revision: 0
>>>>>>> Open RTE: 1.2.1r14096-ct7b030r1838
>>>>>>> Open RTE SVN revision: 0
>>>>>>> OPAL: 1.2.1r14096-ct7b030r1838
>>>>>>> OPAL SVN revision: 0
>>>>>>> Prefix: /opt/SUNWhpc/HPC7.0
>>>>>>> Configured architecture: i386-pc-solaris2.10
>>>>>>> Configured by: root
>>>>>>> Configured on: Fri Mar 30 13:40:12 EDT 2007
>>>>>>> Configure host: burpen-csx10-0
>>>>>>> Built by: root
>>>>>>> Built on: Fri Mar 30 13:57:25 EDT 2007
>>>>>>> Built host: burpen-csx10-0
>>>>>>> C bindings: yes
>>>>>>> C++ bindings: yes
>>>>>>> Fortran77 bindings: yes (all)
>>>>>>> Fortran90 bindings: yes
>>>>>>> Fortran90 bindings size: trivial
>>>>>>> C compiler: cc
>>>>>>> C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
>>>>>>> C++ compiler: CC
>>>>>>> C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
>>>>>>> Fortran77 compiler: f77
>>>>>>> Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
>>>>>>> Fortran90 compiler: f95
>>>>>>> Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
>>>>>>> C profiling: yes
>>>>>>>
>>>>>>> C++ profiling: yes
>>>>>>> Fortran77 profiling: yes
>>>>>>> Fortran90 profiling: yes
>>>>>>> C++ exceptions: yes
>>>>>>> Thread support: no
>>>>>>> Internal debug support: no
>>>>>>> MPI parameter check: runtime
>>>>>>> Memory profiling support: no
>>>>>>> Memory debugging support: no
>>>>>>> libltdl support: yes
>>>>>>> Heterogeneous support: yes
>>>>>>> mpirun default --prefix: yes
>>>>>>> MCA backtrace: printstack (MCA v1.0, API v1.0,
>>>>>>> Component v1.2.1)
>>>>>>> MCA paffinity: solaris (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA maffinity: first_use (MCA v1.0, API v1.0,
>>>>>>> Component v1.2.1)
>>>>>>> MCA timer: solaris (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA allocator: basic (MCA v1.0, API v1.0, Component
>>>>>>> v1.0)
>>>>>>> MCA allocator: bucket (MCA v1.0, API v1.0, Component
>>>>>>> v1.0)
>>>>>>> MCA coll: basic (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA coll: self (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>> MCA coll: tuned (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA io: romio (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>> MCA mpool: udapl (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>> MCA pml: ob1 (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
>>>>>>> MCA rcache: vma (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA btl: self (MCA v1.0, API v1.0.1, Component
>>>>>>> v1.2.1)
>>>>>>> MCA btl: sm (MCA v1.0, API v1.0.1, Component
>>>>>>> v1.2.1)
>>>>>>> MCA btl: tcp (MCA v1.0, API v1.0.1, Component
>>>>>>> v1.0)
>>>>>>> MCA btl: udapl (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA topo: unity (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA errmgr: hnp (MCA v1.0, API v1.3, Component
>>>>>>> v1.2.1)
>>>>>>> MCA errmgr: orted (MCA v1.0, API v1.3, Component
>>>>>>> v1.2.1)
>>>>>>> MCA errmgr: proxy (MCA v1.0, API v1.3, Component
>>>>>>> v1.2.1)
>>>>>>> MCA gpr: null (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA gpr: proxy (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA gpr: replica (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA iof: proxy (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA iof: svc (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA ns: proxy (MCA v1.0, API v2.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA ns: replica (MCA v1.0, API v2.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>>>>>>> MCA ras: dash_host (MCA v1.0, API v1.3,
>>>>>>> Component v1.2.1)
>>>>>>> MCA ras: gridengine (MCA v1.0, API v1.3,
>>>>>>> Component v1.2.1)
>>>>>>> MCA ras: localhost (MCA v1.0, API v1.3,
>>>>>>> Component v1.2.1)
>>>>>>> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>> MCA rds: hostfile (MCA v1.0, API v1.3,
>>>>>>> Component v1.2.1)
>>>>>>> MCA rds: proxy (MCA v1.0, API v1.3, Component
>>>>>>> v1.2.1)
>>>>>>> MCA rds: resfile (MCA v1.0, API v1.3, Component
>>>>>>> v1.2.1)
>>>>>>> MCA rmaps: round_robin (MCA v1.0, API v1.3,
>>>>>>> Component v1.2.1)
>>>>>>> MCA rmgr: proxy (MCA v1.0, API v2.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
>>>>>>> MCA rml: oob (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA pls: gridengine (MCA v1.0, API v1.3,
>>>>>>> Component v1.2.1)
>>>>>>> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>> MCA pls: rsh (MCA v1.0, API v1.3, Component
>>>>>>> v1.2.1)
>>>>>>> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
>>>>>>> MCA sds: env (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA sds: pipe (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA sds: seed (MCA v1.0, API v1.0, Component
>>>>>>> v1.2.1)
>>>>>>> MCA sds: singleton (MCA v1.0, API v1.0,
>>>>>>> Component v1.2.1)