
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] run openMPI jobs with SGE,
From: Reuti (reuti_at_[hidden])
Date: 2010-04-09 13:34:45


On 09.04.2010 at 18:57, Cristobal Navarro wrote:

> Sorry, the command was missing a number.
>
> As you said, it should be:
>
> qrsh -verbose -pe pempi 6 mpirun -np 6 hostname
> waiting for interactive job to be scheduled ...
>
> Your "qrsh" request could not be scheduled, try again later.
> ---
> this is my parallel environment:
> qconf -sp pempi
> pe_name pempi
> slots 210
> user_lists NONE
> xuser_lists NONE
> start_proc_args /usr/bin/true
> stop_proc_args /usr/bin/true
> allocation_rule $pe_slots

$pe_slots means that all slots must come from one and the same machine (e.g. for SMP jobs). You can try $round_robin instead.
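Switching the allocation rule is done by editing the PE definition; a minimal sketch, assuming the PE name pempi from the output above (qconf -mp opens the definition interactively in $EDITOR, and qconf -Mp replaces it from a file):

```shell
# Interactive: open the PE definition and change the line
#   allocation_rule    $pe_slots
# to
#   allocation_rule    $round_robin
qconf -mp pempi

# Non-interactive alternative: rewrite the definition from a file
qconf -sp pempi | sed 's/^allocation_rule.*/allocation_rule    $round_robin/' > /tmp/pempi.txt
qconf -Mp /tmp/pempi.txt
```

The change takes effect for jobs scheduled after the modification; running jobs keep their original allocation.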

-- Reuti

> control_slaves TRUE
> job_is_first_task FALSE
> urgency_slots min
> accounting_summary TRUE
>
> this is the queue
> qconf -sq cola.q
> qname cola.q
> hostlist @allhosts
> seq_no 0
> load_thresholds np_load_avg=1.75
> suspend_thresholds NONE
> nsuspend 1
> suspend_interval 00:05:00
> priority 0
> min_cpu_interval 00:05:00
> processors UNDEFINED
> qtype BATCH INTERACTIVE
> ckpt_list NONE
> pe_list make pempi
> rerun FALSE
> slots 2
> tmpdir /tmp
> shell /bin/csh
>
> I noticed that if I put 2 slots (since the queue has 2 slots) in the -pe pempi N argument and also use the full path to mpirun, as you guys pointed out, it works!!!
> cristobal_at_neoideo:~$ qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
> Your job 125 ("mpirun") has been submitted
> waiting for interactive job to be scheduled ...
> Your interactive job 125 has been successfully scheduled.
> Establishing builtin session to host ijorge.local ...
> ijorge.local
> ijorge.local
> ijorge.local
> ijorge.local
> ijorge.local
> ijorge.local
> cristobal_at_neoideo:~$ qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
> Your job 126 ("mpirun") has been submitted
> waiting for interactive job to be scheduled ...
> Your interactive job 126 has been successfully scheduled.
> Establishing builtin session to host neoideo ...
> neoideo
> neoideo
> neoideo
> neoideo
> neoideo
> neoideo
> cristobal_at_neoideo:~$
>
> I just wonder why I didn't get mixed hostnames, like:
> neoideo
> neoideo
> ijorge.local
> ijorge.local
> neoideo
> ijorge.local
>
> ??
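On the question above: with allocation_rule $pe_slots, SGE grants all requested slots on a single host, so mpirun can only ever start processes there and mixed hostnames cannot appear (note also that -np 6 with only 2 granted slots oversubscribes those slots). With $round_robin, a request spanning both hosts could in principle produce mixed output; a hypothetical command reusing the path from above:

```shell
# With allocation_rule $round_robin and slots granted on both hosts,
# the 6 processes could be spread across neoideo and ijorge.local:
qrsh -verbose -pe pempi 6 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
```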
>
> thanks for the help already!!!
>
> Cristobal
>
>
>
>
> On Fri, Apr 9, 2010 at 8:58 AM, Huynh Thuc Cuoc <htcuoc_at_[hidden]> wrote:
> Dear friend,
> 1.
> I prefer to use the SGE qsub command, for example:
>
> [huong_at_ioitg2 MyPhylo]$ qsub -pe orte 3 myphylo.qsub
> Your job 35 ("myphylo.qsub") has been submitted
> [huong_at_ioitg2 MyPhylo]$ qstat
> job-ID prior name user state submit/start at queue slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
> 35 0.55500 myphylo.qs huong r 04/09/2010 19:28:59 all.q_at_[hidden] 3
> [huong_at_ioitg2 MyPhylo]$ qstat
> [huong_at_ioitg2 MyPhylo]$
>
> This job is running on node2 of my cluster.
> My software is as follows:
> head node: 4 CPUs, 4 GB RAM, CentOS 5.4 + SGE 6.2u4 (qmaster and also execd host) + Open MPI 1.4.1
> nodes: 4 CPUs, 1 GB RAM, CentOS 5.4 + sgeexecd + Open MPI 1.4.1
> PE = orte, set to 4 slots.
> The job script myphylo.qsub contains the long command:
> /opt/openmpi/bin/mpirun -np 10 $HOME/MyPhylo/bin/par-phylo-builder --data . . . .
> Try setting the PE to orte, or use the default PE "make" instead.
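For reference, a script like myphylo.qsub can be sketched roughly as follows (a minimal sketch under tight SGE/Open MPI integration; the PE name orte and the application path are carried over from above, and $NSLOTS is filled in by SGE with the number of slots actually granted):

```shell
#!/bin/sh
#$ -cwd            # run from the submission directory
#$ -N myphylo      # job name
#$ -pe orte 4      # request 4 slots from the orte PE

# With tight integration, mpirun obtains the granted host list
# from SGE, so -np should match the granted slot count.
/opt/openmpi/bin/mpirun -np "$NSLOTS" "$HOME/MyPhylo/bin/par-phylo-builder" --data ...
```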
>
> 2. I tested your command on my system:
> a.
> [huong_at_ioitg2 MyPhylo]$ qrsh -verbose -pe make mpirun -np 6 hostname
> error: Numerical value invalid!
> The initial portion of string "mpirun" contains no decimal number
> [huong_at_ioitg2 MyPhylo]$ qrsh -verbose -pe orte 2 mpirun -np 6 hostname
> Your job 36 ("mpirun") has been submitted
>
> waiting for interactive job to be scheduled ...
> Your interactive job 36 has been successfully scheduled.
> Establishing builtin session to host ioitg2.ioit-grid.ac.vn ...
> bash: mpirun: command not found
> [huong_at_ioitg2 MyPhylo]$
>
> ERROR! So I tried:
> [huong_at_ioitg2 MyPhylo]$ qrsh -verbose -pe orte 2 /opt/openmpi/bin/mpirun -np 6 hostname
> Your job 38 ("mpirun") has been submitted
>
> waiting for interactive job to be scheduled ...
> Your interactive job 38 has been successfully scheduled.
> Establishing builtin session to host ioitg2.ioit-grid.ac.vn ...
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> [huong_at_ioitg2 MyPhylo]$
>
> This works.
> The point is: the PATH must include the directory where mpirun is located.
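Two common ways to make mpirun resolvable on the execution host (a sketch; the installation path is the one used above, and -V asks qrsh to export the submitting shell's environment to the job):

```shell
# Option 1: put the Open MPI bin directory on PATH in a shell
# startup file that is sourced on the execution hosts:
export PATH=/opt/openmpi/bin:$PATH

# Option 2: forward the current environment with -V:
qrsh -verbose -V -pe orte 2 mpirun -np 6 hostname
```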
>
> Try it.
>
> Good luck,
> HT Cuoc
>
>
> On Fri, Apr 9, 2010 at 11:02 AM, Cristobal Navarro <axischire_at_[hidden]> wrote:
> Hello,
>
> After some days of work and testing, I managed to install SGE on two machines, and also installed Open MPI 1.4.1 on each one.
>
> SGE is working: I can submit jobs and it schedules them onto the available cores (6 in total).
>
> My problem is that I'm trying to run an Open MPI job and I can't.
>
> This is an example of what I am trying:
>
>
> $qrsh -verbose -pe pempi mpirun -np 6 hostname
> Your job 105 ("mpirun") has been submitted
> waiting for interactive job to be scheduled ...
>
> Your "qrsh" request could not be scheduled, try again later.
>
> I'm not sure what this can be;
> in ompi_info I have gridengine support.
>
> Where do you recommend I look?
> Thanks in advance.
>
> Cristobal
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>