
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine
From: Reuti (reuti_at_[hidden])
Date: 2012-02-03 18:55:46


On 04.02.2012 at 00:15, Tom Bryan wrote:

A more detailed answer later, as it's late here. But one short note:

-pe orte 5 => give me exactly 5 slots

-pe orte 5-5 => the same

-pe orte 5- => give me at least 5 slots, up to the maximum you can get right now in the cluster

The master/slave output in `qstat -g t` only tells you what was granted, not what is necessarily being used by you right now. It's up to the application to use the granted slots.
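For illustration, a minimal job script using such a range request might look like the sketch below (the PE name `orte` matches this thread; the script contents and binary name are assumptions, not Tom's actual `mpi.sh`):

```sh
#!/bin/sh
# Hypothetical SGE job script: request at least 5 slots, up to
# whatever the cluster can grant right now.
#$ -pe orte 5-
#$ -cwd
#$ -N mpitest

# Under tight integration, Open MPI picks up the granted slot
# count and host list from SGE, so no -np or hostfile is needed.
mpiexec ./mpitest
```

To request exactly 5 slots instead, the `-pe` line would read `#$ -pe orte 5` (or equivalently `5-5`).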

==

Requesting exactly 5 slots will show you either "one master and four slaves" or "one master and five slaves", depending on the setting of "job_is_first_task" in the definition of the PE.

The rationale behind this is that it adjusts the number of `qrsh -inherit` calls which are allowed (just imagine single-core machines to understand the idea behind it). In a plain MPI application, "job_is_first_task" is usually set to yes, as the executable started on the machine where `mpiexec` is issued in the job script is also doing some work (usually rank 0). This results in 4 `qrsh -inherit` calls being allowed, for a total of 5 processes.
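For reference, the PE definition can be inspected with `qconf -sp orte`. A typical definition for this plain-MPI case might look like the following sketch (all values other than `control_slaves` and `job_is_first_task` are illustrative, not taken from Tom's setup):

```
pe_name            orte
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  TRUE
```

`control_slaves TRUE` is what enables tight integration (slave tasks started via `qrsh -inherit` under SGE's control); `job_is_first_task` is the setting discussed above.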

If your rank 0 is for some reason only collecting results and not doing any work (i.e. a master/slave application like in PVM), you would set "job_is_first_task no". This has the effect that one additional `qrsh -inherit` is allowed - in detail: a local one plus 4 to other nodes, to start 5 slaves.
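The accounting described above can be sketched in a few lines (a simplified model for illustration, not actual SGE code): with N granted slots, "job_is_first_task yes" permits N-1 `qrsh -inherit` calls, while "job_is_first_task no" permits N.

```python
def allowed_qrsh_inherit(granted_slots: int, job_is_first_task: bool) -> int:
    """Model how SGE limits `qrsh -inherit` calls for a tightly
    integrated parallel job (simplified; not SGE source code)."""
    if job_is_first_task:
        # The job script's own process counts as the first task,
        # so one slot is consumed without any qrsh call.
        return granted_slots - 1
    # Master only collects results: every slot may be started via
    # qrsh -inherit (one local, the rest on other nodes).
    return granted_slots

# 5 slots, plain MPI: rank 0 works locally, 4 remote starts allowed.
print(allowed_qrsh_inherit(5, True))   # 4
# 5 slots, PVM-style master/slave: 5 slave starts allowed.
print(allowed_qrsh_inherit(5, False))  # 5
```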

Nowadays, where you have many cores per node and often use only one `qrsh -inherit` per slave machine and then fork or spawn threads for the additional processes, this setting is less meaningful and would need some new options in the PE:

https://arc.liv.ac.uk/trac/SGE/ticket/197

-- Reuti

> 1. I'm still surprised that the SGE behavior is so different when I
> configure my SGE queue differently. See test "a" in the .tgz. When I just
> run mpitest in mpi.sh and ask for exactly 5 slots (-pe orte 5-5), it works
> if the queue is configured to use a single host. I see 1 MASTER and 4
> SLAVES in qstat -g t, and I get the correct output. If the queue is set to
> use multiple hosts, the jobs hang in spawn/init, and I get errors
> [grid-03.cisco.com][[19159,2],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.122.1 failed: Connection refused (111)
> [grid-10.cisco.com:05327] [[19159,0],3] routed:binomial: Connection to
> lifeline [[19159,0],0] lost
> [grid-16.cisco.com:25196] [[19159,0],1] routed:binomial: Connection to
> lifeline [[19159,0],0] lost
> [grid-11.cisco.com:63890] [[19159,0],2] routed:binomial: Connection to
> lifeline [[19159,0],0] lost
> So, I'll just assume that mpiexec does some magic that is needed in the
> multi-machine scenario but not in the single machine scenario.
>
> 2. I guess I'm not sure how SGE is supposed to behave. Experiment "a" and
> "b" were identical except that I changed -pe orte 5-5 to -pe orte 5-. The
> single case works like before, and the multiple exec host case fails as
> before. The difference is that qstat -g t shows additional SLAVEs that
> don't seem to correspond to any jobs on the exec hosts. Are these SLAVEs
> just slots that are reserved for my job but that I'm not using? If my job
> will only use 5 slots, then I should set the SGE qsub job to ask for exactly
> 5 with "-pe orte 5-5", right?
>
> 3. Experiment "d" was similar to "b", but mpi.sh uses "mpiexec -np 1
> mpitest" instead of running mpitest directly. Now both the single machine
> queue and multiple machine queue work. So, mpiexec seems to make my
> multi-machine configuration happier. In this case, I'm still using "-pe
> orte 5-", and I'm still seeing the extra SLAVE slots granted in qstat -g t.
>
> 4. Based on "d", I thought that I could follow the approach in "a". That
> is, for experiment "e", I used mpiexec -np 1, but I also used -pe orte 5-5.
> I thought that this would make the multi-machine queue reserve only the 5
> slots that I needed. The single machine queue works correctly, but now the
> multi-machine case hangs with no errors. The output from qstat and pstree
> are what I'd expect, but it seems to hang in Spawn_multiple and Init_thread.
> I really expected this to work.
>
> I'm really confused by experiment "e" with multiple machines in the queue.
> Based on "a" and "d", I thought that a combination of mpiexec -np 1 would
> permit the multi-machine scheduling to work with MPI while the "-pe orte
> 5-5" would limit the slots to exactly the number that it needed to run.
>
> ---Tom
>
> <mpiExperiments.tgz>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users