Subject: [OMPI devel] OpenMPI and SGE integration made more stable
From: Christoph van Wüllen (vanwullen_at_[hidden])
Date: 2012-07-26 12:02:22

It is a long-standing problem that, due to a bug in Sun Grid Engine
(it sets the stack size limit equal to the address space limit),
using qrsh from within Open MPI fails if a large amount of memory is
requested but the stack size is not explicitly set to a reasonably
small value.

The best solution would be for SGE simply not to touch the stack
size limit and to leave it at INFINITY.

However, I have verified that simply reducing the stack size limit in
file orte/mca/plm/rsh/plm_rsh_module.c, function ssh_child(), before
execv'ing qrsh circumvents the problem. Just after exec_path is set
by strdup(...), I inserted the lines

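   /* needs #include <sys/resource.h> for getrlimit/setrlimit */
   struct rlimit rlim;
   int l = (int) strlen(exec_path);   /* length of the command path */

   /* only touch the limit if the command to exec is .../qrsh */
   if (l > 5 && !strcmp("/qrsh", exec_path + (l-5))) {
     getrlimit(RLIMIT_STACK, &rlim);
     /* cap both hard and soft stack limit at about 10 MB */
     if (rlim.rlim_max > 10000000L) rlim.rlim_max=10000000L;
     if (rlim.rlim_cur > 10000000L) rlim.rlim_cur=10000000L;
     setrlimit(RLIMIT_STACK, &rlim);
   }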

It looks quick-and-dirty and it certainly is, but it solves a severe
problem many users have with OpenMPI and SGE. Feel free to use this
information as you like. Note that the MPI worker jobs that are
ultimately spawned on the remote nodes do not suffer from the reduced
stack size limit; only the qrsh command itself is affected.
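
For trying the idea outside the Open MPI tree, the same logic can be
written as two small standalone helpers. This is only a sketch under
assumptions: the names path_ends_with, cap_stack_limit and
QRSH_STACK_CAP are made up for illustration and are not part of
Open MPI, and the 10000000-byte cap is simply the value used in the
patch above. Unlike the patch, the sketch checks the return values of
getrlimit and setrlimit.

```c
#include <string.h>
#include <sys/resource.h>

/* Hypothetical constant: same cap as in the patch (about 10 MB). */
#define QRSH_STACK_CAP 10000000L

/* Return 1 if path ends with suffix, 0 otherwise. */
static int path_ends_with(const char *path, const char *suffix)
{
    size_t plen = strlen(path);
    size_t slen = strlen(suffix);
    return plen >= slen && strcmp(path + (plen - slen), suffix) == 0;
}

/*
 * Lower the soft and hard RLIMIT_STACK of the calling process to at
 * most cap bytes. Limits that are already smaller are left alone.
 * Returns 0 on success, -1 if getrlimit or setrlimit fails.
 */
static int cap_stack_limit(rlim_t cap)
{
    struct rlimit rlim;

    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;
    if (rlim.rlim_max > cap) rlim.rlim_max = cap;
    if (rlim.rlim_cur > cap) rlim.rlim_cur = cap;
    return setrlimit(RLIMIT_STACK, &rlim);
}
```

In a forked child, one would call cap_stack_limit(QRSH_STACK_CAP)
right before execv() whenever path_ends_with(exec_path, "/qrsh")
returns 1. Since the hard limit is lowered too, an unprivileged
process cannot raise it again afterwards, so this should only be done
in the child that is about to exec qrsh.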

Is this (still) of interest?

