Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] OpenMPI and SGE integration made more stable
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-07-27 00:11:01


Been chatting off-list with the SGE folks - can you tell us what version of SGE you are using?

On Jul 26, 2012, at 9:02 AM, Christoph van Wüllen wrote:

> It is a long-standing problem that, due to a bug in Sun GridEngine
> (it sets the stack size limit equal to the address space limit),
> using qrsh from within OpenMPI fails if a large amount of memory is
> requested but the stack size is not explicitly set to a reasonably small value.
>
> The best solution would be for SGE simply not to touch the stack
> size limit and to leave it at INFINITY.
>
> However, I have tested that simply reducing the stack size limit in
> file orte/mca/plm/rsh/plm_rsh_module.c, function ssh_child(), before
> execv'ing qrsh circumvents the problem. Just after exec_path is set
> by strdup(...), I inserted the lines
>
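> {
>     /* Work around the SGE stack-limit bug: cap the stack size limit,
>      * but only when the program about to be exec'ed is qrsh.
>      * Needs <string.h> and <sys/resource.h>. */
>     struct rlimit rlim;
>     size_t len = strlen(exec_path);
>
>     if (len > 5 && 0 == strcmp("/qrsh", exec_path + (len - 5))) {
>         /* Only modify the limits if they could actually be read. */
>         if (0 == getrlimit(RLIMIT_STACK, &rlim)) {
>             if (rlim.rlim_max > 10000000L) rlim.rlim_max = 10000000L;
>             if (rlim.rlim_cur > 10000000L) rlim.rlim_cur = 10000000L;
>             setrlimit(RLIMIT_STACK, &rlim);
>         }
>     }
> }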
>
>
> It looks quick-and-dirty, and it certainly is, but it solves a severe
> problem many users have with OpenMPI and SGE. Feel free to use this
> information as you like. Note that the MPI worker jobs eventually
> spawned off on "distant" nodes do not suffer from the reduced stack
> size limit; only the qrsh command itself is affected.
>
> Is this (still) of interest?
>
> +---------------------------------+----------------------------------+
> | Prof. Christoph van Wüllen | Tele-Phone (+49) (0)631 205 2749 |
> | TU Kaiserslautern, FB Chemie | Tele-Fax (+49) (0)631 205 2750 |
> | Erwin-Schrödinger-Str. | |
> | D-67663 Kaiserslautern, Germany | vanWullen_at_[hidden] |
> | |
> | HomePage: http://www.chemie.uni-kl.de/vanwullen |
> +---------------------------------+----------------------------------+
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
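
For readers who want to try the workaround quoted above outside of Open MPI, here is a minimal standalone sketch of the same idea. It is not the code that went into any Open MPI release. The names launches_qrsh(), cap_stack_limit() and STACK_CAP_BYTES are made up for illustration, and the check compares the final path component with "qrsh" rather than testing for a "/qrsh" suffix as the patch does, so a bare "qrsh" also matches.

```c
#include <assert.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical name; 10000000 bytes is the cap used in the quoted patch. */
#define STACK_CAP_BYTES ((rlim_t)10000000)

/* Return 1 if the final component of `path` is exactly "qrsh", else 0.
 * Assumption: unlike the quoted patch, a bare "qrsh" also matches. */
static int launches_qrsh(const char *path)
{
    assert(path != NULL);
    const char *slash = strrchr(path, '/');
    const char *base = (slash != NULL) ? slash + 1 : path;
    return 0 == strcmp(base, "qrsh");
}

/* Lower the soft and hard RLIMIT_STACK of the calling process to at
 * most `cap` bytes. Limits already at or below `cap` are left alone.
 * Returns 0 on success, -1 if getrlimit() or setrlimit() fails. */
static int cap_stack_limit(rlim_t cap)
{
    struct rlimit rlim;

    if (getrlimit(RLIMIT_STACK, &rlim) != 0) {
        return -1;
    }
    /* Test RLIM_INFINITY explicitly instead of relying on it being
     * the largest rlim_t value on every platform. */
    if (rlim.rlim_max == RLIM_INFINITY || rlim.rlim_max > cap) {
        rlim.rlim_max = cap;
    }
    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur > cap) {
        rlim.rlim_cur = cap;
    }
    return setrlimit(RLIMIT_STACK, &rlim);
}
```

As in the patch, the intended place to call cap_stack_limit() is in the child process, after fork() and just before execv(), and only when launches_qrsh(exec_path) is true. That way the parent keeps its original limits. An unprivileged process cannot raise the hard limit again once it has been lowered.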