Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] OpenMPI and SGE integration made more stable
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-07-27 00:11:01


I've been chatting off-list with the SGE folks. Can you tell us which version of SGE you are using?

On Jul 26, 2012, at 9:02 AM, Christoph van Wüllen wrote:

> It is a long-standing problem that, due to a bug in Sun GridEngine
> (it sets the stack size limit equal to the address space limit),
> using qrsh from within OpenMPI fails if a large amount of memory is
> requested (so the stack size limit becomes equally large) but the
> stack size is not explicitly set to a reasonably small value.
>
> The best solution would be for SGE simply not to touch the stack
> size limit and to leave it at RLIM_INFINITY (unlimited).
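A minimal sketch for checking whether a process shows the pattern described above: a finite soft stack limit equal to the soft address-space limit. It is not part of Open MPI or SGE, and the function names are hypothetical. It only inspects the calling process, so it would have to run inside a job (for example as the command qrsh starts) to show what SGE actually set. The post does not say why a huge stack limit breaks qrsh; one likely factor, assumed here and not confirmed in the thread, is that glibc sizes the default stack of new threads from the soft RLIMIT_STACK value.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: true when the soft stack limit is finite and equal
 * to the soft address-space limit (the SGE behaviour described above). */
static bool stack_limit_matches_as_limit(const struct rlimit *stack,
                                         const struct rlimit *as)
{
    return stack->rlim_cur != RLIM_INFINITY &&
           stack->rlim_cur == as->rlim_cur;
}

/* Hypothetical helper: reads this process's limits, prints the stack limit,
 * and returns 1 if the pattern is present, 0 if not, -1 on error. */
static int check_current_limits(void)
{
    struct rlimit stack, as;

    if (getrlimit(RLIMIT_STACK, &stack) != 0 ||
        getrlimit(RLIMIT_AS, &as) != 0) {
        perror("getrlimit");
        return -1;
    }

    if (stack.rlim_cur == RLIM_INFINITY)
        printf("stack limit: unlimited\n");
    else
        printf("stack limit: %llu bytes\n",
               (unsigned long long)stack.rlim_cur);

    return stack_limit_matches_as_limit(&stack, &as) ? 1 : 0;
}
```

An unlimited stack limit is never reported as the problem case, which matches the suggestion that SGE should leave the limit at RLIM_INFINITY.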
>
> However, I have verified that simply reducing the stack size limit in
> file orte/mca/plm/rsh/plm_rsh_module.c, function ssh_child(), before
> execv'ing qrsh circumvents the problem. Right after exec_path is set
> by strdup(...), I inserted the following lines:
>
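> {
>     /* Needs <string.h> (strlen, strcmp) and <sys/resource.h>
>      * (struct rlimit, getrlimit, setrlimit). */
>     struct rlimit rlim;
>     size_t l = strlen(exec_path);
>
>     /* Clamp the stack limit only when the launcher is qrsh, and only
>      * if the current limit could actually be read. */
>     if (l > 5 && strcmp("/qrsh", exec_path + (l - 5)) == 0 &&
>         getrlimit(RLIMIT_STACK, &rlim) == 0) {
>         /* Cap soft and hard limits at 10,000,000 bytes (about 10 MB). */
>         if (rlim.rlim_max > 10000000L) rlim.rlim_max = 10000000L;
>         if (rlim.rlim_cur > 10000000L) rlim.rlim_cur = 10000000L;
>         setrlimit(RLIMIT_STACK, &rlim);
>     }
> }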
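The same workaround can be sketched as a standalone, compilable helper. This is a restructuring under assumptions, not the code that went into Open MPI: is_qrsh_path(), clamp_rlimit(), limit_stack_for_qrsh() and QRSH_STACK_CAP are hypothetical names, the 10,000,000-byte cap is copied from the patch above, and the helper is meant to be called in the forked child right before execv(exec_path, ...).

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical name; the cap value is taken from the patch above. */
#define QRSH_STACK_CAP ((rlim_t)10000000)

/* True if path ends in "/qrsh" and is longer than "/qrsh" itself. */
static bool is_qrsh_path(const char *path)
{
    static const char suffix[] = "/qrsh";
    size_t len = strlen(path);
    size_t slen = sizeof suffix - 1;

    return len > slen && strcmp(path + (len - slen), suffix) == 0;
}

/* Lowers both limits to at most cap; RLIM_INFINITY is treated as above cap. */
static void clamp_rlimit(struct rlimit *rlim, rlim_t cap)
{
    if (rlim->rlim_max == RLIM_INFINITY || rlim->rlim_max > cap)
        rlim->rlim_max = cap;
    if (rlim->rlim_cur == RLIM_INFINITY || rlim->rlim_cur > cap)
        rlim->rlim_cur = cap;
}

/* Hypothetical hook for the forked child, called just before execv().
 * Returns 0 on success or when exec_path is not qrsh, -1 on error. */
static int limit_stack_for_qrsh(const char *exec_path)
{
    struct rlimit rlim;

    if (!is_qrsh_path(exec_path))
        return 0;
    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;
    clamp_rlimit(&rlim, QRSH_STACK_CAP);
    return setrlimit(RLIMIT_STACK, &rlim);
}
```

Like the original patch, this also lowers the hard limit. An unprivileged process cannot raise a hard limit again, which is acceptable here only because the change stays in the child that is about to become qrsh.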
>
>
> It looks quick-and-dirty, and it certainly is, but it solves a severe
> problem many users have with OpenMPI and SGE. Feel free to use this
> information as you like. Note that MPI worker processes that may later
> be spawned on remote nodes are not affected by the reduced stack size
> limit; it applies only to the qrsh command itself.
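Part of why the change stays confined is that resource limits are per-process: a child inherits them across fork() and execv(), but a change made in the child never propagates back to its parent (here, the launcher that forked ssh_child()). The sketch below, using the hypothetical function name child_change_is_local(), lowers the soft stack limit in a forked child and then checks that the parent's limit is unchanged. That processes on other nodes get their limits from the SGE side rather than from the local qrsh is taken from the post and is not demonstrated here.

```c
#define _XOPEN_SOURCE 700
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that lowers its own soft stack limit to new_cur.
 * Returns 1 if the child's change took effect and the parent's limit is
 * unchanged afterwards, 0 otherwise, -1 on a system-call error. */
static int child_change_is_local(rlim_t new_cur)
{
    struct rlimit before, after;
    int status;
    pid_t pid;

    if (getrlimit(RLIMIT_STACK, &before) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: lower only the soft limit; the hard limit is untouched. */
        struct rlimit rl = before, check;
        if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > new_cur)
            rl.rlim_cur = new_cur;
        if (setrlimit(RLIMIT_STACK, &rl) != 0 ||
            getrlimit(RLIMIT_STACK, &check) != 0 ||
            check.rlim_cur != rl.rlim_cur)
            _exit(1);
        _exit(0); /* a real launcher would execv() the remote shell here */
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (getrlimit(RLIMIT_STACK, &after) != 0)
        return -1;

    /* Local if the child succeeded and the parent still sees its old limit. */
    return WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
           after.rlim_cur == before.rlim_cur;
}
```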
>
> Is this (still) of interest?
>
> +---------------------------------+----------------------------------+
> | Prof. Christoph van Wüllen      | Tele-Phone (+49) (0)631 205 2749 |
> | TU Kaiserslautern, FB Chemie    | Tele-Fax (+49) (0)631 205 2750   |
> | Erwin-Schrödinger-Str.          |                                  |
> | D-67663 Kaiserslautern, Germany | vanWullen_at_[hidden]            |
> |                                                                    |
> | HomePage: http://www.chemie.uni-kl.de/vanwullen                    |
> +---------------------------------+----------------------------------+