Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] run openMPI jobs with SGE,
From: Reuti (reuti_at_[hidden])
Date: 2010-04-12 05:06:07


Hi,

On 09.04.2010, at 23:48, Cristobal Navarro wrote:

> Thanks,
> now I get mixed output and everything seems to be working OK with mixed MPI execution
>
> is it normal that after receiving the results, the hosts remain busy for about 15 seconds?
> example

Yes. This is the time SGE needs for housekeeping; it can even take some minutes (especially if you kill a parallel job).
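
If you want to see when the slots are actually released, you can simply poll qstat, e.g. (a rough one-liner; 65 is just the job id from your example):

   while qstat | grep -q ' 65 '; do sleep 5; done; echo "job 65 cleaned up"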

-- Reuti

> master:common master$ qrsh -verbose -pe orte 10 /opt/openmpi-1.4.1/bin/mpirun -np 10 hostname
> Your job 65 ("mpirun") has been submitted
> waiting for interactive job to be scheduled ...
> Your interactive job 65 has been successfully scheduled.
> Establishing builtin session to host worker00.local ...
> worker00.local
> worker00.local
> worker00.local
> worker00.local
> worker00.local
> master.local
> master.local
> master.local
> master.local
> master.local
> # after some seconds, I query the hosts' status and the slots are still in use
> master:common master$ qstat -f
> queuename qtype resv/used/tot. load_avg arch states
> ---------------------------------------------------------------------------------
> all.q_at_master.local BIP 0/5/16 0.02 darwin-x86
> 65 0.55500 mpirun master r 04/09/2010 17:44:36 5
> ---------------------------------------------------------------------------------
> all.q_at_worker00.local BIP 0/5/16 0.01 darwin-x86
> 65 0.55500 mpirun master r 04/09/2010 17:44:36 5
> master:common master$
>
> but after waiting a bit longer, they become free again
> master:common master$ qstat -f
> queuename qtype resv/used/tot. load_avg arch states
> ---------------------------------------------------------------------------------
> all.q_at_master.local BIP 0/0/16 0.01 darwin-x86
> ---------------------------------------------------------------------------------
> all.q_at_worker00.local BIP 0/0/16 0.01 darwin-x86
>
> Anyway, these are just details; thanks to your help, the important parts are working.
> Cristobal
>
>
>
>
> On Fri, Apr 9, 2010 at 1:34 PM, Reuti <reuti_at_[hidden]> wrote:
> On 09.04.2010, at 18:57, Cristobal Navarro wrote:
>
> > sorry, the command was missing a number
> >
> > as you said, it should be:
> >
> > qrsh -verbose -pe pempi 6 mpirun -np 6 hostname
> > waiting for interactive job to be scheduled ...
> >
> > Your "qrsh" request could not be scheduled, try again later.
> > ---
> > this is my parallel environment:
> > qconf -sp pempi
> > pe_name pempi
> > slots 210
> > user_lists NONE
> > xuser_lists NONE
> > start_proc_args /usr/bin/true
> > stop_proc_args /usr/bin/true
> > allocation_rule $pe_slots
>
> $pe_slots means that all slots must come from one and the same machine (e.g. for SMP jobs). You can try $round_robin instead.
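>
> For example (a minimal sketch, assuming the PE keeps the name pempi):
>
>   qconf -mp pempi        # opens the PE definition in your $EDITOR
>
> and change the line "allocation_rule $pe_slots" to "allocation_rule $round_robin".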
>
> -- Reuti
>
>
> > control_slaves TRUE
> > job_is_first_task FALSE
> > urgency_slots min
> > accounting_summary TRUE
> >
> > this is the queue
> > qconf -sq cola.q
> > qname cola.q
> > hostlist @allhosts
> > seq_no 0
> > load_thresholds np_load_avg=1.75
> > suspend_thresholds NONE
> > nsuspend 1
> > suspend_interval 00:05:00
> > priority 0
> > min_cpu_interval 00:05:00
> > processors UNDEFINED
> > qtype BATCH INTERACTIVE
> > ckpt_list NONE
> > pe_list make pempi
> > rerun FALSE
> > slots 2
> > tmpdir /tmp
> > shell /bin/csh
> >
> > I noticed that if I put 2 slots (since the queue has 2 slots) in the -pe pempi N argument and also use the full path to mpirun as you pointed out, it works!
> > cristobal_at_neoideo:~$ qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
> > Your job 125 ("mpirun") has been submitted
> > waiting for interactive job to be scheduled ...
> > Your interactive job 125 has been successfully scheduled.
> > Establishing builtin session to host ijorge.local ...
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > cristobal_at_neoideo:~$ qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
> > Your job 126 ("mpirun") has been submitted
> > waiting for interactive job to be scheduled ...
> > Your interactive job 126 has been successfully scheduled.
> > Establishing builtin session to host neoideo ...
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > cristobal_at_neoideo:~$
> >
> > I just wonder why I didn't get mixed hostnames, like:
> > neoideo
> > neoideo
> > ijorge.local
> > ijorge.local
> > neoideo
> > ijorge.local
> >
> > ??
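
This follows from the $pe_slots allocation rule mentioned above: SGE granted both slots on a single host, and mpirun starts processes only on the hosts in the granted allocation, so all six ranks ran on one machine. With allocation_rule $round_robin and a slot count that matches -np, the hostnames should mix, e.g. (a sketch, assuming enough free slots across the hosts):

   qrsh -verbose -pe pempi 6 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname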
> >
> > thanks for the help so far!
> >
> > Cristobal
> >
> >
> >
> >
> > On Fri, Apr 9, 2010 at 8:58 AM, Huynh Thuc Cuoc <htcuoc_at_[hidden]> wrote:
> > Dear friend,
> > 1.
> > I prefer to use the SGE qsub command, for example:
> >
> > [huong_at_ioitg2 MyPhylo]$ qsub -pe orte 3 myphylo.qsub
> > Your job 35 ("myphylo.qsub") has been submitted
> > [huong_at_ioitg2 MyPhylo]$ qstat
> > job-ID prior name user state submit/start at queue slots ja-task-ID
> > -----------------------------------------------------------------------------------------------------------------
> > 35 0.55500 myphylo.qs huong r 04/09/2010 19:28:59 all.q_at_[hidden] 3
> > [huong_at_ioitg2 MyPhylo]$ qstat
> > [huong_at_ioitg2 MyPhylo]$
> >
> > This job is running on node2 of my cluster.
> > My setup is as follows:
> > head node: 4 CPUs, 4 GB RAM, CentOS 5.4 + SGE 6.2u4 (qmaster and also execd host) + Open MPI 1.4.1
> > nodes: 4 CPUs, 1 GB RAM, CentOS 5.4 + sge_execd + Open MPI 1.4.1
> > PE = orte, set to 4 slots.
> > The job script myphylo.qsub contains this long command:
> > /opt/openmpi/bin/mpirun -np 10 $HOME/MyPhylo/bin/par-phylo-builder --data . . . .
> > Try setting the PE to orte, or use the default PE make instead.
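> >
> > To see which PEs are configured, something like:
> >
> >   qconf -spl          # list all parallel environment names
> >   qconf -sp orte      # show the orte PE definition, if it exists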
> >
> > 2. I tested your command on my system:
> > a.
> > [huong_at_ioitg2 MyPhylo]$ qrsh -verbose -pe make mpirun -np 6 hostname
> > error: Numerical value invalid!
> > The initial portion of string "mpirun" contains no decimal number
> > [huong_at_ioitg2 MyPhylo]$ qrsh -verbose -pe orte 2 mpirun -np 6 hostname
> > Your job 36 ("mpirun") has been submitted
> >
> > waiting for interactive job to be scheduled ...
> > Your interactive job 36 has been successfully scheduled.
> > Establishing builtin session to host ioitg2.ioit-grid.ac.vn ...
> > bash: mpirun: command not found
> > [huong_at_ioitg2 MyPhylo]$
> >
> > Error! So I tried:
> > [huong_at_ioitg2 MyPhylo]$ qrsh -verbose -pe orte 2 /opt/openmpi/bin/mpirun -np 6 hostname
> > Your job 38 ("mpirun") has been submitted
> >
> > waiting for interactive job to be scheduled ...
> > Your interactive job 38 has been successfully scheduled.
> > Establishing builtin session to host ioitg2.ioit-grid.ac.vn ...
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > [huong_at_ioitg2 MyPhylo]$
> >
> > This is OK.
> > The point is: the PATH must point to where mpirun is located.
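> >
> > Instead of typing the full path every time, you can add the Open MPI directories to the environment on every execution host (a sketch; adjust the prefix to your installation, and note that whether qrsh sources your shell startup files depends on the shell in use):
> >
> >   export PATH=/opt/openmpi/bin:$PATH
> >   export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH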
> >
> > Try it.
> >
> > Good luck,
> > HT Cuoc
> >
> >
> > On Fri, Apr 9, 2010 at 11:02 AM, Cristobal Navarro <axischire_at_[hidden]> wrote:
> > Hello,
> >
> > after some days of work and testing, I managed to install SGE on two machines, and also installed Open MPI 1.4.1 on each one.
> >
> > SGE is working; I can submit jobs and it schedules them onto the available cores, 6 in total.
> >
> > My problem is that I'm trying to run an Open MPI job and I can't.
> >
> > This is an example of what I am trying:
> >
> >
> > $ qrsh -verbose -pe pempi mpirun -np 6 hostname
> > Your job 105 ("mpirun") has been submitted
> > waiting for interactive job to be scheduled ...
> >
> > Your "qrsh" request could not be scheduled, try again later.
> >
> > I'm not sure what the cause could be;
> > in ompi_info I see gridengine support.
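> > A quick way to verify that is something like:
> >
> >   ompi_info | grep gridengine
> >
> > which should list the gridengine MCA components if support was built in.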
> >
> > Where do you recommend I look?
> > thanks in advance
> >
> > Cristobal
> >
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users