Open MPI Development Mailing List Archives

From: Pak Lui (Pak.Lui_at_[hidden])
Date: 2007-06-11 09:47:40


Markus Daene wrote:
> then it becomes even worse:
> openmpi nicely reports the (max./actual) used virtual memory to the
> grid engine as the sum over all processes. This value is then compared
> with the one the user has specified via the h_vmem option, but the
> gridengine takes that value per process when allocating the job (this
> works) and does not multiply it by the number of processes. Maybe one
> should report this to the gridengine mailing list, but it could just
> as well be related to the openmpi interface.

Hi Markus,

From the SGE 6.1 man page, the hard virtual memory limit (h_vmem)
applies to the virtual memory consumed by all processes in a job. I
don't think SGE enforces a fine-grained resource limit for each
individual process anyway. You may want to verify this with the grid
engine mailing list just to confirm.

From the N1 Grid Engine 6 QUEUE_CONF(5) man page:

      The resource limit parameters s_vmem and h_vmem are imple-
      mented by N1 Grid Engine as a job limit. They impose a limit
      on the amount of combined virtual memory consumed by all the
      processes in the job. If h_vmem is exceeded by a job running
      in the queue, it is aborted via a SIGKILL signal (see
      kill(1)). If s_vmem is exceeded, the job is sent a SIGXCPU
      signal which can be caught by the job. If you wish to allow
      a job to be "warned" so it can exit gracefully before it is
      killed then you should set the s_vmem limit to a lower value
      than h_vmem. For parallel processes, the limit is applied
      per slot which means that the limit is multiplied by the
      number of slots being used by the job before being applied.

...
      h_cpu The per-job CPU time limit in seconds.

      h_data The per-job maximum memory limit in bytes.

      h_vmem The same as h_data (if both are set the minimum is
                used).
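
In other words, the per-slot limit you request with -l h_vmem is
multiplied by the number of slots of a parallel job before it is
enforced. A minimal sketch of such a request (the parallel environment
name "orte" and the job script are placeholders, not taken from your
setup):

      # request 4 slots; with per-slot application, the job as a whole
      # may use up to 4 x 2000M of virtual memory before SGE sends SIGKILL
      qsub -pe orte 4 -l h_vmem=2000M ./myjob.sh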

>
> The last thing I noticed:
> It seems that if the h_vmem option for gridengine jobs is specified as
> '2.0G', my test job was immediately killed; but when I specify '2000M'
> (which is obviously less) it works. The gridengine always puts the job
> on the correct node as requested, but I think there might be a problem
> in the openmpi interface.

You should email the grid engine alias; this sounds like an SGE bug to me.
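
For reference, the two requests you describe would look roughly like
this (the parallel environment name and job script are placeholders):

      # reportedly killed immediately:
      qsub -pe orte 4 -l h_vmem=2.0G ./myjob.sh
      # reportedly runs, even though it requests slightly less memory:
      qsub -pe orte 4 -l h_vmem=2000M ./myjob.sh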

>
>
> It would be nice if someone could give some hints on how to avoid the
> quadratic scaling, or to consider whether it is really necessary in
> openmpi.
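
I don't know offhand where the extra per-process memory comes from, but
one way to start investigating is to look at the BTL and mpool tunables
and to narrow down which transport is responsible. This is a rough
sketch only; the parameter and component choices below are illustrative
and would need to be checked against your own ompi_info output:

      # list the tunable parameters of the shared-memory BTL
      ompi_info --param btl sm
      # restrict a test run to specific BTLs while experimenting
      mpirun --mca btl self,sm,mvapi -np 64 ./my_app
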
>
>
> Thanks.
> Markus Daene
>
>
>
>
> my compiling options:
> ./configure --prefix=/not_important --enable-static
> --with-f90-size=medium --with-f90-max-array-dim=7
> --with-mpi-param-check=always --enable-cxx-exceptions --with-mvapi
> --enable-mca-no-build=btl-tcp
>
> ompi_info output:
> Open MPI: 1.2.2
> Open MPI SVN revision: r14613
> Open RTE: 1.2.2
> Open RTE SVN revision: r14613
> OPAL: 1.2.2
> OPAL SVN revision: r14613
> Prefix: /usrurz/openmpi/1.2.2/pathscale_3.0
> Configured architecture: x86_64-unknown-linux-gnu
> Configured by: root
> Configured on: Mon Jun 4 16:04:38 CEST 2007
> Configure host: GE1N01
> Built by: root
> Built on: Mon Jun 4 16:09:37 CEST 2007
> Built host: GE1N01
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: pathcc
> C compiler absolute: /usrurz/pathscale/bin/pathcc
> C++ compiler: pathCC
> C++ compiler absolute: /usrurz/pathscale/bin/pathCC
> Fortran77 compiler: pathf90
> Fortran77 compiler abs: /usrurz/pathscale/bin/pathf90
> Fortran90 compiler: pathf90
> Fortran90 compiler abs: /usrurz/pathscale/bin/pathf90
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: yes
> Thread support: posix (mpi: no, progress: no)
> Internal debug support: no
> MPI parameter check: always
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: yes
> mpirun default --prefix: no
> MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.2)
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.2)
> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.2)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.2)
> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.2)
> MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.2)
> MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.2)
> MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.2)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.2)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.2)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.2)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.2)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.2)
> MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.2)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.2)
> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.2)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.2)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.2)
> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.2)
> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.2)
> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.2)
> MCA btl: mvapi (MCA v1.0, API v1.0.1, Component v1.2.2)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.2)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.2)
> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.2)
> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.2)
> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.2)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.2)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.2)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.2)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.2)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.2)
> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.2)
> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.2)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.2)
> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.2)
> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.2)
> MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.2)
> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.2)
> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.2)
> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.2)
> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.2)
> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.2)
> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.2)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.2)
> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.2)
> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.2)
> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.2)
> MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.2)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.2)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.2)
> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.2)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.2)
> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.2)
>
> ----------------------------------------------------------
> Markus Daene
> Martin Luther University Halle-Wittenberg
> Naturwissenschaftliche Fakultaet II
> Institute of Physics
> Von Seckendorff-Platz 1 (room 1.28)
> 06120 Halle
> Germany
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
- Pak Lui
pak.lui_at_[hidden]