Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI program cannot complete
From: Gus Correa (gus_at_[hidden])
Date: 2010-10-25 14:08:54


Your job may be queued rather than executing because no resources
are available, i.e. all nodes are busy.
Try qstat -a to check its state.
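
To tell a queued job from a running one without reading the whole listing, the state field can be pulled out of the full job record. The sketch below assumes a TORQUE/PBS-style "qstat -f" that prints a line of the form "job_state = Q"; the helper name pbs_job_state is made up for illustration, and the job ID is the one from the message quoted below.

```shell
#!/bin/bash
# Hypothetical helper: read `qstat -f <jobid>` output on stdin and print
# the one-letter job state (for example Q = queued, R = running).
# Assumes the record contains a line like "    job_state = Q".
pbs_job_state() {
    awk -F' = ' '/^[[:space:]]*job_state = / { print $2; exit }'
}

# Usage on the cluster (needs a PBS server, so it is only shown here):
#   qstat -f 48270.clusterName | pbs_job_state
```

If this prints Q, the job is still waiting for resources, which would explain why "qsub -I" shows no shell prompt yet.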

Posting a code snippet with all your MPI calls may prove effective.
You might get a trove of advice for a thrift of effort.
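
On the question quoted below of where the gdb call goes: if control does return to the job script after mpirun, one possible arrangement is to attach gdb to the leftover processes on the very next lines. This is only a sketch. It assumes gdb is installed on the node running the script, that the leftover processes live on that same node, and that you are allowed to attach to your own processes. The names dump_backtraces and DEBUGGER are hypothetical, and pgrep is just one way to look up the PIDs.

```shell
#!/bin/bash
# Hypothetical helper: attach a debugger in batch mode to each PID given
# as an argument and print a full backtrace plus the registers.
# DEBUGGER defaults to gdb; it is a variable only so the loop can be dry-run.
dump_backtraces() {
    local pid
    for pid in "$@"; do
        echo "=== PID $pid ==="
        ${DEBUGGER:-gdb} --batch -ex 'bt full' -ex 'info reg' -p "$pid"
    done
}

# In the job script, right after the mpirun line (only shown here):
#   mpirun -np $NPROC --mca btl self,sm,openib $COMMAND
#   dump_backtraces $(pgrep -u "$(id -u)" -x "$(basename "$COMMAND")")
```

The output lands in the job's stdout file, so it can be read after the job ends even when no interactive session is available.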

Jeff Squyres wrote:
> Check the man page for qsub for proper use.
>
>
> On Oct 25, 2010, at 1:49 PM, Jack Bryan wrote:
>
>> thanks
>>
>> I use
>> qsub -I nsga2_job.sh
>> qsub: waiting for job 48270.clusterName to start
>>
>> With qstat, I found that the job name is none, and no results show up.
>>
>> No shell prompt appears; the command line just hangs there with no response.
>>
>> Any help is appreciated.
>>
>> Thanks
>>
>> Jack
>>
>> Oct. 25 2010
>>
>>> From: jsquyres_at_[hidden]
>>> Date: Mon, 25 Oct 2010 13:39:30 -0400
>>> To: users_at_[hidden]
>>> Subject: Re: [OMPI users] Open MPI program cannot complete
>>>
>>> Can you use the interactive mode of PBS to get 5 cores on 1 node? IIRC, "qsub -I ..." ?
>>>
>>> Then you get a shell prompt with your allocated cores and can run stuff interactively. I don't know if your site allows this, but interactive debugging here might be *significantly* easier than trying to automate some debugging.
>>>
>>>
>>> On Oct 25, 2010, at 1:35 PM, Jack Bryan wrote:
>>>
>>>> thanks
>>>>
>>>> I have to use #PBS to submit any jobs in my cluster.
>>>> I cannot use command line to hang a job on my cluster.
>>>>
>>>> this is my script:
>>>> --------------------------------------
>>>> #!/bin/bash
>>>> #PBS -N jobname
>>>> #PBS -l walltime=00:08:00,nodes=1
>>>> #PBS -q queuename
>>>> COMMAND=/mypath/myprog
>>>> NCORES=5
>>>>
>>>> cd $PBS_O_WORKDIR
>>>> NODES=`cat $PBS_NODEFILE | wc -l`
>>>> NPROC=$(( $NCORES * $NODES ))
>>>>
>>>> mpirun -np $NPROC --mca btl self,sm,openib $COMMAND
>>>>
>>>> -------------------------------------------
>>>>
>>>> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid ZOMBIE_PID) in the script?
>>>> And how do I get ZOMBIE_PID from the script?
>>>>
>>>> Any help is appreciated.
>>>>
>>>> thanks
>>>>
>>>> Oct. 25 2010
>>>>
>>>> Date: Mon, 25 Oct 2010 19:24:35 +0200
>>>> From: jed_at_[hidden]
>>>> To: users_at_[hidden]
>>>> Subject: Re: [OMPI users] Open MPI program cannot complete
>>>>
>>>> On Mon, Oct 25, 2010 at 19:07, Jack Bryan <dtustudy68_at_[hidden]> wrote:
>>>>> I need to use #PBS parallel job script to submit a job on MPI cluster.
>>>>
>>>> Is it not possible to reproduce locally? Most clusters have a way to submit an interactive job (which would let you start this thing and then inspect individual processes). Ashley's Padb suggestion will certainly be better in a non-interactive environment.
>>>>
>>>>> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid ZOMBIE_PID) in the script ?
>>>>
>>>> Is control returning to your script after rank 0 has exited? In that case, you can just put this on the next line.
>>>>
>>>>> How to get the ZOMBIE_PID ?
>>>>
>>>> "ps" from the command line, or getpid() from C code.
>>>>
>>>> Jed
>>>>
>>>> _______________________________________________ users mailing list users_at_[hidden] http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>
>