Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] run openMPI jobs with SGE,
From: Reuti (reuti_at_[hidden])
Date: 2010-04-12 05:06:07


Hi,

On 09.04.2010 at 23:48, Cristobal Navarro wrote:

> Thanks,
> now I get mixed results and everything seems to be working OK with mixed MPI execution
>
> is it normal that after receiving the results, the hosts remain busy for about 15 seconds?
> Example:

Yes. This is the time SGE needs for housekeeping; it can even take some minutes (especially if you kill a parallel job).
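
If you want to watch the cleanup from the shell, a trivial polling loop is enough (just a rough sketch; qstat prints nothing once no jobs are left, and any polling interval will do):

    # wait until SGE no longer lists any of my jobs, i.e. housekeeping is done
    while [ -n "$(qstat)" ]; do
        sleep 5
    done
    echo "queue is empty, slots are free again"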

-- Reuti

> master:common master$ qrsh -verbose -pe orte 10 /opt/openmpi-1.4.1/bin/mpirun -np 10 hostname
> Your job 65 ("mpirun") has been submitted
> waiting for interactive job to be scheduled ...
> Your interactive job 65 has been successfully scheduled.
> Establishing builtin session to host worker00.local ...
> worker00.local
> worker00.local
> worker00.local
> worker00.local
> worker00.local
> master.local
> master.local
> master.local
> master.local
> master.local
> # after some seconds, I query the host status and the slots are still in use
> master:common master$ qstat -f
> queuename qtype resv/used/tot. load_avg arch states
> ---------------------------------------------------------------------------------
> all.q_at_master.local BIP 0/5/16 0.02 darwin-x86
> 65 0.55500 mpirun master r 04/09/2010 17:44:36 5
> ---------------------------------------------------------------------------------
> all.q_at_worker00.local BIP 0/5/16 0.01 darwin-x86
> 65 0.55500 mpirun master r 04/09/2010 17:44:36 5
> master:common master$
>
> but after waiting a bit longer, they become free again
> master:common master$ qstat -f
> queuename qtype resv/used/tot. load_avg arch states
> ---------------------------------------------------------------------------------
> all.q_at_master.local BIP 0/0/16 0.01 darwin-x86
> ---------------------------------------------------------------------------------
> all.q_at_worker00.local BIP 0/0/16 0.01 darwin-x86
>
> Anyway, these are just details; thanks to your help, the important parts are working.
> Cristobal
>
>
>
>
> On Fri, Apr 9, 2010 at 1:34 PM, Reuti <reuti_at_[hidden]> wrote:
> On 09.04.2010 at 18:57, Cristobal Navarro wrote:
>
> > Sorry, the command was missing a number.
> >
> > As you said, it should be:
> >
> > qrsh -verbose -pe pempi 6 mpirun -np 6 hostname
> > waiting for interactive job to be scheduled ...
> >
> > Your "qrsh" request could not be scheduled, try again later.
> > ---
> > This is my parallel environment:
> > qconf -sp pempi
> > pe_name pempi
> > slots 210
> > user_lists NONE
> > xuser_lists NONE
> > start_proc_args /usr/bin/true
> > stop_proc_args /usr/bin/true
> > allocation_rule $pe_slots
>
> $pe_slots means that all slots must come from one and the same machine (e.g. for SMP jobs). You can try $round_robin instead (a minimal sketch follows the quoted configuration below).
>
> -- Reuti
>
>
> > control_slaves TRUE
> > job_is_first_task FALSE
> > urgency_slots min
> > accounting_summary TRUE
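>
> A minimal sketch of that change (qconf -mp opens the PE definition in an editor; the PE name pempi is the one from your setup):
>
>     qconf -mp pempi
>     # in the editor, change the line
>     #     allocation_rule    $pe_slots
>     # to
>     #     allocation_rule    $round_robin
>     # save and quit; new jobs in this PE then get their slots spread
>     # across the hosts instead of all from a single machine.
>     # Inside a running job, the granted hosts/slots can be checked with:
>     #     cat $PE_HOSTFILE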
> >
> > This is the queue:
> > qconf -sq cola.q
> > qname cola.q
> > hostlist @allhosts
> > seq_no 0
> > load_thresholds np_load_avg=1.75
> > suspend_thresholds NONE
> > nsuspend 1
> > suspend_interval 00:05:00
> > priority 0
> > min_cpu_interval 00:05:00
> > processors UNDEFINED
> > qtype BATCH INTERACTIVE
> > ckpt_list NONE
> > pe_list make pempi
> > rerun FALSE
> > slots 2
> > tmpdir /tmp
> > shell /bin/csh
> >
> > I noticed that if I put 2 slots (since the queue has 2 slots) in the -pe pempi N argument and also give the full path to mpirun, as you guys pointed out, it works!
> > cristobal_at_neoideo:~$ qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
> > Your job 125 ("mpirun") has been submitted
> > waiting for interactive job to be scheduled ...
> > Your interactive job 125 has been successfully scheduled.
> > Establishing builtin session to host ijorge.local ...
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > cristobal_at_neoideo:~$ qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
> > Your job 126 ("mpirun") has been submitted
> > waiting for interactive job to be scheduled ...
> > Your interactive job 126 has been successfully scheduled.
> > Establishing builtin session to host neoideo ...
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > cristobal_at_neoideo:~$
> >
> > I just wonder why I didn't get mixed hostnames, like:
> > neoideo
> > neoideo
> > ijorge.local
> > ijorge.local
> > neoideo
> > ijorge.local
> >
> > ??
> >
> > Thanks for the help so far!
> >
> > Cristobal
> >
> >
> >
> >
> > On Fri, Apr 9, 2010 at 8:58 AM, Huynh Thuc Cuoc <htcuoc_at_[hidden]> wrote:
> > Dear friend,
> > 1. I prefer to use the SGE qsub command, for example:
> >
> > [huong_at_ioitg2 MyPhylo]$ qsub -pe orte 3 myphylo.qsub
> > Your job 35 ("myphylo.qsub") has been submitted
> > [huong_at_ioitg2 MyPhylo]$ qstat
> > job-ID prior name user state submit/start at queue slots ja-task-ID
> > -----------------------------------------------------------------------------------------------------------------
> > 35 0.55500 myphylo.qs huong r 04/09/2010 19:28:59 all.q_at_[hidden] 3
> > [huong_at_ioitg2 MyPhylo]$ qstat
> > [huong_at_ioitg2 MyPhylo]$
> >
> > This job is running on node2 of my cluster.
> > My setup is as follows:
> > headnode: 4 CPUs, 4 GB RAM, CentOS 5.4 + SGE 6.2u4 (qmaster and also execd host) + Open MPI 1.4.1
> > nodes: 4 CPUs, 1 GB RAM, CentOS 5.4 + sgeexecd + Open MPI 1.4.1
> > The PE is orte, set to 4 slots.
> > The app myphylo.qsub contains this long command in the shell script:
> > /opt/openmpi/bin/mpirun -np 10 $HOME/MyPhylo/bin/par-phylo-builder --data ...
> > Try setting the PE to orte, or use the default PE (make) instead.
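> >
> > For reference, a minimal job script of that kind could look like this (just a sketch; the PE name, slot count and program path are only the values from my setup, and $NSLOTS is filled in by SGE with the granted slot count):
> >
> >     #!/bin/sh
> >     # myphylo.qsub -- minimal SGE submission script (sketch)
> >     #$ -S /bin/sh        # interpret the job script with /bin/sh
> >     #$ -cwd              # start the job in the submission directory
> >     #$ -N myphylo        # job name shown by qstat
> >     #$ -pe orte 4        # request 4 slots from the orte PE
> >     # $NSLOTS is set by SGE to the number of slots actually granted
> >     /opt/openmpi/bin/mpirun -np $NSLOTS $HOME/MyPhylo/bin/par-phylo-builder --data ...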
> >
> > 2. I tested your command on my system:
> > a.
> > [huong_at_ioitg2 MyPhylo]$ qrsh -verbose -pe make mpirun -np 6 hostname
> > error: Numerical value invalid!
> > The initial portion of string "mpirun" contains no decimal number
> > [huong_at_ioitg2 MyPhylo]$ qrsh -verbose -pe orte 2 mpirun -np 6 hostname
> > Your job 36 ("mpirun") has been submitted
> >
> > waiting for interactive job to be scheduled ...
> > Your interactive job 36 has been successfully scheduled.
> > Establishing builtin session to host ioitg2.ioit-grid.ac.vn ...
> > bash: mpirun: command not found
> > [huong_at_ioitg2 MyPhylo]$
> >
> > Error! So I tried:
> > [huong_at_ioitg2 MyPhylo]$ qrsh -verbose -pe orte 2 /opt/openmpi/bin/mpirun -np 6 hostname
> > Your job 38 ("mpirun") has been submitted
> >
> > waiting for interactive job to be scheduled ...
> > Your interactive job 38 has been successfully scheduled.
> > Establishing builtin session to host ioitg2.ioit-grid.ac.vn ...
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > [huong_at_ioitg2 MyPhylo]$
> >
> > This is OK.
> > The point is: the PATH must point to where mpirun is located.
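> >
> > One way to avoid typing the full path every time is to put the Open MPI directories into the environment on every node (a sketch; adjust the prefix to your installation, and note that qrsh starts a non-interactive shell, so the file you edit must be one such shells actually read, e.g. ~/.bashrc for bash):
> >
> >     # in ~/.bashrc on the headnode and all execution hosts
> >     export PATH=/opt/openmpi/bin:$PATH
> >     export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH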
> >
> > Try it.
> >
> > Good luck,
> > HT Cuoc
> >
> >
> > On Fri, Apr 9, 2010 at 11:02 AM, Cristobal Navarro <axischire_at_[hidden]> wrote:
> > Hello,
> >
> > After some days of work and testing, I managed to install SGE on two machines, and also installed Open MPI 1.4.1 on each one.
> >
> > SGE is working; I can submit jobs and it schedules them to the available cores (6 in total).
> >
> > My problem is that I'm trying to run an Open MPI job and I can't.
> >
> > This is an example of what I am trying:
> >
> >
> > $qrsh -verbose -pe pempi mpirun -np 6 hostname
> > Your job 105 ("mpirun") has been submitted
> > waiting for interactive job to be scheduled ...
> >
> > Your "qrsh" request could not be scheduled, try again later.
> >
> > I'm not sure what the cause can be;
> > in the ompi_info output I have gridengine support.
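> >
> > I checked the gridengine support roughly like this (the exact component and version strings may differ on other installations):
> >
> >     ompi_info | grep gridengine
> >     # should list something like:
> >     #   MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.4.1)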
> >
> > Where do you recommend I look?
> > Thanks in advance.
> >
> > Cristobal
> >
> >
> >
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users