Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] newbie: Submitting Open MPI jobs to SGE ( `qsh, -pe orte 4` fails)
From: Reuti (reuti_at_[hidden])
Date: 2013-02-11 10:50:54


On 11.02.2013 at 12:26, Pierre Lindenbaum wrote:

> <snip>
> and I've changed `shell_start_mode posix_compliant` to `unix_behavior` using `qconf -mconf`. (However, shell_start_mode is still listed as posix_compliant.)

AFAIK this setting is deprecated at the global configuration level, as it moved to the queue definition (`qconf -mq all.q`).
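
One way to see which value is in effect for a queue is to read it from the queue definition. A minimal sketch: the helper name `shell_start_mode_of` is made up for illustration, and it assumes the usual `attribute value` layout of `qconf -sq` output.

```shell
# Hypothetical helper: print the shell_start_mode value found in
# `qconf -sq <queue>` output read from stdin.
shell_start_mode_of() {
    awk '$1 == "shell_start_mode" { print $2 }'
}

# Usage on a host with SGE installed (queue name taken from this thread):
#   qconf -sq all.q | shell_start_mode_of
```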

> Now, qsh -pe orte 4 works
>
> qsh -pe orte 4

A plain `qsh` is working for you? This is an old startup method; due to its insecure X11 startup, it shouldn't be used any longer IMO.

> Your job 84581 ("INTERACTIVE") has been submitted
> waiting for interactive job to be scheduled ...
> Your interactive job 84581 has been successfully scheduled.
>
>
> (should I run that command before running any new mpirun command?)
>
> when invoking:
>
> qsub -cwd -pe orte 7 with-a-shell.sh
> or
> qrsh -cwd -pe orte 100 /commun/data/packages/openmpi/bin/mpirun /path/to/a.out arg1 arg2 arg3 ....
>
> that works too ! Thank you ! :-)
>
>
> queuename qtype resv/used/tot. load_avg arch states
> ---------------------------------------------------------------------------------
> all.q@node01 BIP 0/15/64 2.76 lx24-amd64
> 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 15
> ---------------------------------------------------------------------------------
> all.q@node02 BIP 0/14/64 3.89 lx24-amd64
> 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 14
> ---------------------------------------------------------------------------------
> all.q@node03 BIP 0/14/64 3.23 lx24-amd64
> 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 14
> ---------------------------------------------------------------------------------
> all.q@node04 BIP 0/14/64 3.68 lx24-amd64
> 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 14
> ---------------------------------------------------------------------------------
> all.q@node05 BIP 0/15/64 2.91 lx24-amd64
> 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 15
> ---------------------------------------------------------------------------------
> all.q@node06 BIP 0/14/64 3.91 lx24-amd64
> 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 14
> ---------------------------------------------------------------------------------
> all.q@node07 BIP 0/14/64 3.79 lx24-amd64
> 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 14
>
>
>
> OK, my first openmpi program works. But as far as I can see: it is faster when invoked on the master node (~3.22min) than when invoked by means of SGE (~7H45):

That's 7:45 versus 3:32, both in minutes:seconds, right?
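
For reference, the two elapsed times from the `time` output quoted further down can be converted to seconds and compared. A small sketch, assuming both values are in `minutes:seconds` format; the function name `to_seconds` is only illustrative.

```shell
# Convert a "minutes:seconds" elapsed time (as printed by `time`) to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 60 + $2 }'
}

to_seconds 7:45.05   # via qrsh/SGE: 465.05
to_seconds 3:32.36   # direct mpirun: 212.36

# Ratio of the two wall-clock times (SGE run / direct run): 2.19
awk 'BEGIN { printf "%.2f\n", 465.05 / 212.36 }'
```

So the SGE run took a bit more than twice as long, not hours.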

Are all machines the same regarding speed and core count? BTW: when running interactively in SGE, the environment variables might not be set if you use `qrsh` without a command or `qlogin`, and some default hostfile will be used instead (unless you provide one). With the command supplied on the `qrsh` line, as in your example below, it should be fine.
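
To make that case visible, a script can check for the variables SGE exports to a parallel job before calling `mpirun`. A minimal sketch, assuming the standard `JOB_ID`, `NSLOTS` and `PE_HOSTFILE` variables; the function name `sge_pe_env_ok` is hypothetical.

```shell
# Hypothetical check: succeed only if the variables that SGE sets for a
# parallel-environment job are present and the PE hostfile is readable.
sge_pe_env_ok() {
    for var in JOB_ID NSLOTS PE_HOSTFILE; do
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "missing SGE variable: $var" >&2
            return 1
        fi
    done
    [ -r "$PE_HOSTFILE" ]
}

# Usage inside an interactive session or job script:
#   sge_pe_env_ok || echo "not inside an SGE parallel job" >&2
```

If the check fails in an interactive session, `mpirun` has no SGE allocation to read and falls back to whatever default it finds.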

-- Reuti

> time /commun/data/packages/openmpi/bin/mpirun -np 7 /path/to/a.out arg1 arg2 arg3 ....
> 670.985u 64.929s 3:32.36 346.5% 0+0k 16322112+6560io 32pf+0w
>
> time qrsh -cwd -pe orte 7 /commun/data/packages/openmpi/bin/mpirun /path/to/a.out arg1 arg2 arg3 ....
> 0.023u 0.036s 7:45.05 0.0% 0+0k 1496+0io 1pf+0w
>
>
>
> I'm going to investigate this... :-)
>
> Thank you again
>
> Pierre