On Mar 14, 2012, at 5:44 PM, Ralph Castain wrote:
> Hi Reuti
> I appreciate your help on this thread - I confess I'm puzzled by it. As you know, OMPI doesn't use SGE to launch the individual processes, nor does SGE even know they exist. All SGE is used for is to launch the OMPI daemons (orteds). This is done as a single qrsh call, so won't all the daemons wind up being executed against the same queue regardless of how many queues exist in the system?
Yes, per machine they will then start in one queue (the one the first and only `qrsh -inherit ...` call is assigned to). But between machines they can get different queues. I would also assume that this is not relevant to Open MPI itself. You could call it a cosmetic flaw, but it's worth noting, as some applications expect the same $TMPDIR to be present on all machines under exactly the same name, and this can't be guaranteed in case different queues were used for a job.
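A quick way to see this (a sketch only; the host names, job id, and queue names below are invented for illustration) is to print the SGE-created $TMPDIR on each allocated node from inside a running job:

```
# One process per node is enough to show the per-node $TMPDIR;
# -npernode is Open MPI's option to limit the processes per node.
mpirun -npernode 1 sh -c 'echo "$(hostname): $TMPDIR"'
# node01: /tmp/4711.1.all.q      <- the queue name is part of the path
# node02: /tmp/4711.1.extra.q    <- a different queue on the other node
```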
> Given that the daemons then fork/exec the MPI processes (outside of qrsh), I would think they would inherit that nice setting as well, and so all the procs will be running at the same nice level too.
> As for TMPDIR, we don't forward that unless specifically directed to do so, which I didn't see on their cmd line.
The SGE integration of Open MPI forwards all environment variables from the master task to all nodes, via the -V option supplied in the Open MPI source. But for TMPDIR this does no harm, as SGE will override it with the real $TMPDIR according to the queue selected on each particular slave machine.
If this is then overridden again by the application distributing the variable to the slaves itself, it can fail because the expected $TMPDIR isn't there. As said: maybe it's unrelated to the issue.
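To illustrate the failure mode (the paths are hypothetical; -x is Open MPI's mpirun option to export an environment variable to the launched processes):

```
# If the application or user explicitly pushes the master's $TMPDIR
# to all ranks, every rank sees the master's queue-specific path:
mpirun -x TMPDIR ./app
# A rank on the second node then looks for /tmp/4711.1.all.q, while
# SGE only created /tmp/4711.1.extra.q there, and the access fails.
```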
I just tested with two different queues on two machines and a small mpihello, and it works as expected.
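For reference, such a test needs no more than a submission script along these lines (the PE name and slot count are examples, not from the original mails):

```
#!/bin/sh
# Minimal SGE job script for a tightly integrated Open MPI job.
#$ -pe orte 4
#$ -cwd
# With SGE support compiled in, mpirun takes the slot count and the
# host list from the PE allocation automatically.
mpirun ./mpihello
```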
Joshua: is the CentOS 6 installation the same on all nodes, and did you recompile the application against the current version of the library? And by "threads" do you mean "processes"?
> On Mar 14, 2012, at 2:33 AM, Reuti wrote:
>> On Mar 14, 2012, at 4:02 AM, Joshua Baker-LePain wrote:
>>> On Tue, 13 Mar 2012 at 5:31pm, Ralph Castain wrote
>>>> FWIW: I have a Centos6 system myself, and I have no problems running OMPI on it (1.4 or 1.5). I can try building it the same way you do and see what happens.
>>> I can run as many threads as I like on a single system with no problems, even if those threads are running at different nice levels.
>> How do they get different nice levels - do you renice them? I would assume that they all start at the same nice level as the parent. In the test program you posted there are no threads.
>>> The problem seems to arise when I'm both a) running across multiple machines and b) running threads at differing nice levels (which often happens as a result of our queueing setup).
>> This sounds like you are getting slots from different queues assigned to one and the same job. My experience: don't do it unless you need it. The problem is that SGE can't decide, in its `qrsh -inherit ...` call, which queue is the correct one for that particular call. As a result, all calls to a slave machine can end up in one and the same queue. Although this is not correct, it won't oversubscribe the node, as the overall slot count is usually limited already; it's more a matter of the names SGE sets in the job's environment:
>> As a result, the SGE-set $TMPDIR can differ between the master of the parallel job and a slave, as the name of the queue is part of $TMPDIR. When a wrong $TMPDIR is set on a node (by Open MPI's forwarding?), strange things can happen depending on the application.
>> Do you face the same problem if you stay in one and the same queue across the machines? If you want to limit the number of PEs available to the user in your setup, you can request a PE by a wildcard; once a PE is selected, SGE will stay within this PE. Attaching each PE to only one queue then avoids mixing slots from different queues (orte1 PE => all.q, orte2 PE => extra.q, and you request orte*), as sketched below.
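(For reference, the setup described above would look roughly like this; the qconf invocations and queue attachments are assumptions, not taken from the original mails.)

```
# One PE per queue; the binding is configured on the queue side.
qconf -mattr queue pe_list orte1 all.q     # all.q offers only PE orte1
qconf -mattr queue pe_list orte2 extra.q   # extra.q offers only PE orte2

# The job requests any matching PE; once SGE picks e.g. orte1, all
# slots of the job come from all.q on every machine:
qsub -pe "orte*" 8 job.sh
```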
>> -- Reuti
>>> I can't guarantee that the problem *never* happens when I run across multiple machines with all the threads un-niced, but I haven't been able to reproduce that at will like I can for the other case.
>>> Joshua Baker-LePain
>>> QB3 Shared Cluster Sysadmin