Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] run openMPI jobs with SGE,
From: Cristobal Navarro (axischire_at_[hidden])
Date: 2010-04-09 12:57:30


Sorry, the command was missing a number.

As you said, it should be:

qrsh -verbose -pe pempi 6 mpirun -np 6 hostname
waiting for interactive job to be scheduled ...

Your "qrsh" request could not be scheduled, try again later.

---
This is my parallel environment:

qconf -sp pempi
pe_name            pempi
slots              210
user_lists         NONE
xuser_lists        NONE
start_proc_args    /usr/bin/true
stop_proc_args     /usr/bin/true
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE

This is the queue:

qconf -sq cola.q
qname                 cola.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make pempi
rerun                 FALSE
slots                 2
tmpdir                /tmp
shell                 /bin/csh

I noticed that if I request 2 slots in the -pe pempi N argument (since the
queue has 2 slots) and also use the full path to mpirun, as you guys pointed
out, it works!
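
A possible reading of why 2 works and 6 does not, assuming the usual SGE behaviour that allocation_rule $pe_slots takes all slots of a job from one host and that "slots 2" in cola.q is a per-host limit: the request can only be scheduled when N is at most 2. The helper name fits_on_one_host is made up for illustration; it is not an SGE command.

```shell
# Hypothetical helper (not part of SGE): report whether a "-pe <name> N"
# request can fit when the PE uses allocation_rule $pe_slots, which is
# assumed here to place all N slots on a single host.
#   $1 = slots requested with -pe
#   $2 = slots the queue offers on one host ("slots" in qconf -sq)
fits_on_one_host() {
    requested=$1
    per_host=$2
    if [ "$requested" -le "$per_host" ]; then
        echo "ok: $requested slot(s) fit on one host"
    else
        echo "too many: $requested requested, only $per_host per host"
        return 1
    fi
}

fits_on_one_host 2 2          # the request that was scheduled
fits_on_one_host 6 2 || true  # the request that could not be scheduled
```

With only 2 slots granted, mpirun -np 6 still starts 6 processes; they all share the one host that was allocated, as the output below shows.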

cristobal_at_neoideo:~$ qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
Your job 125 ("mpirun") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 125 has been successfully scheduled.
Establishing builtin session to host ijorge.local ...
ijorge.local
ijorge.local
ijorge.local
ijorge.local
ijorge.local
ijorge.local
cristobal_at_neoideo:~$ qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
Your job 126 ("mpirun") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 126 has been successfully scheduled.
Establishing builtin session to host neoideo ...
neoideo
neoideo
neoideo
neoideo
neoideo
neoideo
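
The full path probably matters because the shell that qrsh starts on the execution host may not have the Open MPI bin directory in its PATH (the "mpirun: command not found" case in the quoted reply). A small sketch, assuming mpirun is installed at the same location on every host; resolve_cmd is a hypothetical helper name, not part of SGE or Open MPI.

```shell
# Print the absolute path of a command found in the local PATH,
# or complain on stderr and return non-zero if it is not found.
resolve_cmd() {
    command -v "$1" || {
        echo "not found in PATH: $1" >&2
        return 1
    }
}

# Example use on the submit host (needs SGE and Open MPI, so left as a comment):
#   mpirun_path=$(resolve_cmd mpirun) &&
#       qrsh -verbose -pe pempi 2 "$mpirun_path" -np 6 hostname
```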

I just wonder why I didn't get mixed hostnames, like:
neoideo
neoideo
ijorge.local
ijorge.local
neoideo
ijorge.local
??
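
One likely reason, assuming standard SGE semantics: with allocation_rule $pe_slots every slot of a job comes from a single host, so all processes land on whichever host was picked (ijorge.local for job 125, neoideo for job 126). Rules such as $round_robin or $fill_up let one job span hosts. The sketch below only parses text in the format printed by qconf -sp; both function names are invented for illustration.

```shell
# Print the allocation_rule value from "qconf -sp <pe>" style text on stdin.
pe_allocation_rule() {
    awk '$1 == "allocation_rule" { print $2 }'
}

# Hypothetical check: succeed unless the rule is $pe_slots, which is assumed
# to confine a job to one host. Other rules ($round_robin, $fill_up, or a
# fixed integer per host) can spread a large enough request over several hosts.
pe_can_span_hosts() {
    case "$1" in
        '$pe_slots') return 1 ;;
        *) return 0 ;;
    esac
}

# Example with the rule shown in the PE listing of this message:
rule=$(printf 'pe_name            pempi\nallocation_rule    $pe_slots\n' | pe_allocation_rule)
if pe_can_span_hosts "$rule"; then
    echo "$rule: job may span hosts"
else
    echo "$rule: job stays on one host"
fi
```

On the cluster the parser could be fed with qconf -sp pempi | pe_allocation_rule, and an administrator could edit the rule with qconf -mp pempi. Even then, both hostnames would only show up for a request larger than one host's slots.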
Thanks for the help already!

Cristobal
On Fri, Apr 9, 2010 at 8:58 AM, Huynh Thuc Cuoc <htcuoc_at_[hidden]> wrote:
> Dear friend,
> 1.
> I prefer to use the SGE qsub command, for example:
>
> [huong_at_ioitg2 MyPhylo]$ qsub -pe orte 3 myphylo.qsub
> Your job 35 ("myphylo.qsub") has been submitted
> [huong_at_ioitg2 MyPhylo]$ qstat
> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>      35 0.55500 myphylo.qs huong        r     04/09/2010 19:28:59 all.q_at_[hidden]        3
> [huong_at_ioitg2 MyPhylo]$ qstat
> [huong_at_ioitg2 MyPhylo]$
>
> This job is running on node2 of my cluster.
> My software setup is as follows:
> headnode: 4 CPUs. $GRAM, CentOS 5.4 + sge 6.2u4 (qmaster and also execd
> host) + openmpi 1.4.1
> nodes 4CPUs, 1GRAM, CentOS 5.4 + sgeexecd + openmpi1.4.1
> PE=orte and set to 4 slots.
> The script myphylo.qsub contains this long command:
> /opt/openmpi/bin/mpirun -np 10 $HOME/MyPhylo/bin/par-phylo-builder --data .
> . . .
> Try setting the PE to orte, or use the default PE make instead.
>
> 2. I tested your command on my system as follows:
> a.
> [huong_at_ioitg2 MyPhylo]$ qrsh -verbose -pe make mpirun -np 6 hostname
> error: Numerical value invalid!
> The initial portion of string "mpirun" contains no decimal number
> [huong_at_ioitg2 MyPhylo]$ qrsh -verbose -pe orte 2 mpirun -np 6 hostname
> Your job 36 ("mpirun") has been submitted
>
> waiting for interactive job to be scheduled ...
> Your interactive job 36 has been successfully scheduled.
> Establishing builtin session to host ioitg2.ioit-grid.ac.vn ...
> bash: mpirun: command not found
> [huong_at_ioitg2 MyPhylo]$
>
> Error! So I tried:
> [huong_at_ioitg2 MyPhylo]$ qrsh -verbose -pe orte 2 /opt/openmpi/bin/mpirun -np 6 hostname
> Your job 38 ("mpirun") has been submitted
>
> waiting for interactive job to be scheduled ...
> Your interactive job 38 has been successfully scheduled.
> Establishing builtin session to host ioitg2.ioit-grid.ac.vn ...
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> [huong_at_ioitg2 MyPhylo]$
>
> This is OK.
> The point is: the PATH must point to where mpirun is located.
>
> Try it.
>
> Good luck,
> HT Cuoc
>
>
> On Fri, Apr 9, 2010 at 11:02 AM, Cristobal Navarro <axischire_at_[hidden]> wrote:
>
>> Hello,
>>
>> After some days of work and testing, I managed to install SGE on two
>> machines, and also installed Open MPI 1.4.1 on each one.
>>
>> SGE is working: I can submit jobs and it schedules them onto the
>> available cores (6 in total).
>>
>> My problem is that I'm trying to run an Open MPI job and I can't.
>>
>> This is an example of what I am trying:
>>
>> $ qrsh -verbose -pe pempi mpirun -np 6 hostname
>> Your job 105 ("mpirun") has been submitted
>> waiting for interactive job to be scheduled ...
>>
>> Your "qrsh" request could not be scheduled, try again later.
>>
>> I'm not sure what this can be;
>> ompi_info shows that I have gridengine support.
>>
>> Where do you recommend I look?
>> Thanks in advance
>>
>> Cristobal
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>