
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] OMPI_MCA_opal_set_max_sys_limits
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-08-31 07:48:20

Perhaps it would help if you had clearly stated your concern. From this description, I gather your concern is -not- that remote processes don't see the setting, but that the remote -orteds- don't see it.

Yes, I'm aware of that issue for rsh-based launches. It stems from rsh not allowing us to extend the environment. If you place the param on the cmd line, it gets propagated because we collect and extend the cmd line params. If you place it in the environment, we don't propagate it - because (as we have repeatedly explained to people) we cannot pass all relevant envars on the cmd line due to length restrictions. We don't have this issue with cmd line params because (the thinking goes) they already fit on the cmd line.

So for rsh-like launches, there is an unavoidable discrepancy. It's one reason why we have both system-level and personal-level MCA param files.


On Aug 31, 2011, at 12:22 AM, Eugene Loh wrote:

> On 8/30/2011 7:34 PM, Ralph Castain wrote:
>> On Aug 29, 2011, at 11:18 PM, Eugene Loh wrote:
>>> Maybe someone can help me from having to think too hard.
>>> Let's say I want to max my system limits. I can say this:
>>> % mpirun --mca opal_set_max_sys_limits 1 ...
>>> Cool.
>>> Meanwhile, if I do this:
>>> % setenv OMPI_MCA_opal_set_max_sys_limits 1
>>> % mpirun ...
>>> remote processes don't see the setting. (Local processes and ompi_info are fine.)
>> I looked at the 1.5 code, and mpirun is reaping all OMPI_ params from the environ and adding them to the app. So it should be getting set.
>> I then ran "mpirun -n 1 printenv" on a slurm machine, and verified that indeed that param was in the environment. Ditto when I told it to use the rsh launcher.
>>> Bug? Naively, this looks "wrong." At least disturbing, in any case.
>>> This is with v1.5.
> Okay, so one answer is implicit in your reply: you are expecting the same result I am. So, if the behavior is not as I expect but as I describe, it's a bug candidate. (As opposed to, "The problem you're describing is how it's supposed to work; it's no problem at all.")
> Now, regarding "mpirun -n 1 printenv", I agree that the environment variable is getting set. Even on a remote node. That suggests that things are fine, but it turns out they are not. The problem is -- and I'm afraid I don't understand the details -- it's set "too late." I imagine a time line like this:
> A) orted starts
> B) orted calls opal_util_init_sys_limits()
> C) daemonize a child process
> D) child process execs target process
> E) target process starts up
> Looking at the environment, I don't see the variable set in B, which is the only place the variable does any good. Like you, I do see it in E, which is interesting but doesn't help the user.
> Your experiment was reasonable, but the problem is odd. I suggest the following to see the problem. Set the variable in your environment. Then use mpirun to launch a remote process. Then:
> 1) In the remote orted, inside opal_util_init_sys_limits(), check for the variable in your environment.
> And/or:
> 2) Make the remotely launched process something like this:
> #!/bin/csh
> limit descriptors
> and see if the descriptor limit got bumped up from what it otherwise should be.
> In contrast, if you set the MCA parameter on your mpirun command line, the environment variable *does* get set, even in the environment of the orted when it calls opal_util_init_sys_limits().
> I can poke at this more tomorrow, but I suspect with one "aha!" you'll figure it out a lot faster than I can. :^(
> _______________________________________________
> devel mailing list
> devel_at_[hidden]