Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] orte_odls_base_default_launch_local()
From: Eugene Loh (eugene.loh_at_[hidden])
Date: 2011-07-12 20:44:17


Thanks for the quick and helpful response.

I'm willing to make the changes if only I were more confident I
understood the issues.

What you call the first usage of num_procs_alive seems to count "local
procs alive or about to be launched". It tests on "child->alive ||
opal_dss.compare(&job, &(child->name->jobid), ORTE_JOBID))". This is
only in the !oversubscribed case.

What you call "the later usage" ("0 < opal_sys_limits.num_procs" or
"check the system limits") seems to want only the number of alive procs
(not including any that are about to be launched). So, as I guess you
were driving at, a fresh computation of num_procs_alive seems to be
needed at this point, both in v1.5 and in the trunk.

There is actually a third usage of num_procs_alive ("available file
descriptors" or "0 < opal_sys_limits.num_files"). I think there are two
problems here. One is that the number of procs we're about to launch is
not accounted for. Secondly, and perhaps most importantly, if we "wait
a little time" and recompute num_procs_alive, we loop over item/child.
This is bad since we are already inside an item/child loop! (This is
actually the origin of my foray into this mess...
MPI_Comm_spawn_multiple gets screwed up by this reuse of variable names.)

I'm attaching an original (trunk) file, a proposed version, and the diff
file. Let me know if it looks sane. If so, I'll try it out and put it
back and then file a CMR for 1.5.

Some more below...

On 07/12/11 18:19, Ralph Castain wrote:

> Well, yes and no.
>
> Yes - the value definitely needs to be computed in all cases since it is used later on.
>
> No - the way it is computed is only correct for that one usage. The later usage (in the block that starts with 0< opal_sys_limits.num_procs)) needs a completely different value.
>
> The current computation only comes up with the number of procs for the specific job being launched. The latter code requires the total number of local procs that are alive - a rather different value.

The first usage, to compare with num_processors, includes not only procs
for the job being launched but also already-alive procs.

The second usage, to be compared to opal_sys_limits.num_procs, should
only have alive procs since app->num_procs will be added later. (Plus,
we try to keep a running total of this number of alive procs.)

The third usage, to be compared to opal_sys_limits.num_files, should
also only have alive procs for the same reasons.

> So v1.5 is incorrect in either case. Would you like me to create a patch to correct it, or would you like to do it (covering both cases)?

Trunk also appears to be incorrect.

> On Jul 12, 2011, at 3:35 PM, Eugene Loh wrote:
>> The function orte_odls_base_default_launch_local() has a variable num_procs_alive that is basically initialized like this:
>>
>> if ( oversubscribed ) {
>> ...
>> } else {
>> num_procs_alive = ...;
>> }
>>
>> Specifically, if the "oversubscribed" test passes, the variable is not initialized.
>>
>> (Strictly speaking, this is true only in v1.5. In the trunk, the variable is set to 0 when it is declared, but I'm not sure that's very helpful.)
>>
>> I'm inclined to move the num_procs_alive computation ahead of the "if" block so that this computation is always performed.