Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI program cannot complete
From: Gus Correa (gus_at_[hidden])
Date: 2010-10-25 14:08:54


Your job may be queued rather than executing because no
resources are available (all nodes are busy).
Try qstat -a to see its state.

Posting a code snippet with all your MPI calls may prove effective.
You might get a trove of advice for a thrift of effort.
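
A rough sketch of that check, assuming a Torque-style PBS whose `qstat -f` output contains a line of the form `job_state = Q`. The helper name `parse_job_state` is hypothetical (it is not a PBS command), and the job ID is the one from Jack's qsub message:

```shell
#!/bin/bash
# Hypothetical helper: extract the state letter from `qstat -f` output on stdin.
# Assumes a line of the form "    job_state = Q" (Q = queued, R = running).
parse_job_state() {
    awk -F' = ' '/job_state/ { print $2; exit }'
}

# On the cluster (requires PBS; job ID taken from the qsub message):
#   qstat -a                           # list all jobs, including a state column
#   qstat -f 48270 | parse_job_state   # should report Q while the job is still waiting
```

If the state stays at Q, the interactive shell prompt will not appear until the scheduler actually starts the job.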

Jeff Squyres wrote:
> Check the man page for qsub for proper use.
>
>
> On Oct 25, 2010, at 1:49 PM, Jack Bryan wrote:
>
>> thanks
>>
>> I use
>> qsub -I nsga2_job.sh
>> qsub: waiting for job 48270.clusterName to start
>>
>> Using qstat,
>> I found that the job name is none, and no results show up.
>>
>> No shell prompt appears; the command line hangs there with no response.
>>
>> Any help is appreciated.
>>
>> Thanks
>>
>> Jack
>>
>> Oct. 25 2010
>>
>>> From: jsquyres_at_[hidden]
>>> Date: Mon, 25 Oct 2010 13:39:30 -0400
>>> To: users_at_[hidden]
>>> Subject: Re: [OMPI users] Open MPI program cannot complete
>>>
>>> Can you use the interactive mode of PBS to get 5 cores on 1 node? IIRC, "qsub -I ..." ?
>>>
>>> Then you get a shell prompt with your allocated cores and can run stuff interactively. I don't know if your site allows this, but interactive debugging here might be *significantly* easier than trying to automate some debugging.
>>>
>>>
>>> On Oct 25, 2010, at 1:35 PM, Jack Bryan wrote:
>>>
>>>> thanks
>>>>
>>>> I have to use a #PBS script to submit any job on my cluster.
>>>> I cannot launch a job directly from the command line on my cluster.
>>>>
>>>> this is my script:
>>>> --------------------------------------
>>>> #!/bin/bash
>>>> #PBS -N jobname
>>>> #PBS -l walltime=00:08:00,nodes=1
>>>> #PBS -q queuename
>>>> COMMAND=/mypath/myprog
>>>> NCORES=5
>>>>
>>>> cd $PBS_O_WORKDIR
>>>> NODES=`cat $PBS_NODEFILE | wc -l`
>>>> NPROC=$(( $NCORES * $NODES ))
>>>>
>>>> mpirun -np $NPROC --mca btl self,sm,openib $COMMAND
>>>>
>>>> -------------------------------------------
>>>>
>>>> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid ZOMBIE_PID) in the script ?
>>>> And how to get ZOMBIE_PID from the script ?
>>>>
>>>> Any help is appreciated.
>>>>
>>>> thanks
>>>>
>>>> Oct. 25 2010
>>>>
>>>> Date: Mon, 25 Oct 2010 19:24:35 +0200
>>>> From: jed_at_[hidden]
>>>> To: users_at_[hidden]
>>>> Subject: Re: [OMPI users] Open MPI program cannot complete
>>>>
>>>> On Mon, Oct 25, 2010 at 19:07, Jack Bryan <dtustudy68_at_[hidden]> wrote:
>>>>> I need to use #PBS parallel job script to submit a job on MPI cluster.
>>>>
>>>> Is it not possible to reproduce locally? Most clusters have a way to submit an interactive job (which would let you start this thing and then inspect individual processes). Ashley's Padb suggestion will certainly be better in a non-interactive environment.
>>>>
>>>>> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid ZOMBIE_PID) in the script ?
>>>>
>>>> Is control returning to your script after rank 0 has exited? In that case, you can just put this on the next line.
>>>>
>>>>> How to get the ZOMBIE_PID ?
>>>>
>>>> "ps" from the command line, or getpid() from C code.
>>>>
>>>> Jed
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>
>
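
Jed's suggestion quoted above (find the PID with "ps", then run gdb on the line after mpirun) could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe for this cluster: the helper names `leftover_pids` and `dump_backtraces`, the program name `myprog`, and the output file names are all hypothetical, and the sketch only sees processes on the node where the job script itself runs.

```shell
#!/bin/bash
# Hypothetical helper: print the PIDs of this user's processes whose command
# name is exactly $1 (e.g. "myprog"). Uses ps, as suggested in the thread.
# Note: ps may truncate long command names in the comm column.
leftover_pids() {
    ps -u "$(id -u)" -o pid= -o comm= | awk -v name="$1" '$2 == name { print $1 }'
}

# Hypothetical helper: write a backtrace and register dump for each PID given.
# Assumes gdb is installed on the node and is allowed to attach to the process.
dump_backtraces() {
    local pid
    for pid in "$@"; do
        gdb --batch -ex 'bt full' -ex 'info reg' -p "$pid" > "gdb.$pid.txt" 2>&1
    done
}

# In the job script, on the line after mpirun (only reached if mpirun returns):
#   dump_backtraces $(leftover_pids myprog)
```

If mpirun itself never returns, the lines after it are never reached, which is why the interactive session or Padb mentioned in the thread may be the more practical route.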