Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI program cannot complete
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-10-25 13:51:01


Check the man page for qsub for proper use.
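
As a rough sketch of what the Torque/PBS man page describes: `-I` requests an interactive job, and resources are passed with `-l` on the command line. As far as I recall, when a script is named together with `-I`, only its #PBS directives are read and the script itself is not run, and "waiting for job ... to start" just means the scheduler has not started the job yet. The values below (`ppn=5`, the walltime, `queuename`) are assumptions taken from the job script quoted further down, and your site's syntax may differ.

```shell
#!/bin/bash
# Build (but do not submit) an interactive PBS request for 5 cores on 1 node.
# All values are assumptions copied from the quoted job script.
NCORES=5
WALLTIME=00:08:00
QUEUE=queuename

QSUB_CMD="qsub -I -q $QUEUE -l nodes=1:ppn=$NCORES,walltime=$WALLTIME"

# Print the command so it can be checked against the site's qsub man page
# before running it by hand.
echo "$QSUB_CMD"
```

Once the job starts and a prompt appears on the compute node, you can cd to the work directory and run mpirun by hand.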

On Oct 25, 2010, at 1:49 PM, Jack Bryan wrote:

> thanks
>
> I ran:
> qsub -I nsga2_job.sh
> qsub: waiting for job 48270.clusterName to start
>
> In qstat, the job name is shown as none and no results show up.
>
> No shell prompt appears; the command line just hangs with no response.
>
> Any help is appreciated.
>
> Thanks
>
> Jack
>
> Oct. 25 2010
>
> > From: jsquyres_at_[hidden]
> > Date: Mon, 25 Oct 2010 13:39:30 -0400
> > To: users_at_[hidden]
> > Subject: Re: [OMPI users] Open MPI program cannot complete
> >
> > Can you use the interactive mode of PBS to get 5 cores on 1 node? IIRC, "qsub -I ..." ?
> >
> > Then you get a shell prompt with your allocated cores and can run stuff interactively. I don't know if your site allows this, but interactive debugging here might be *significantly* easier than trying to automate some debugging.
> >
> >
> > On Oct 25, 2010, at 1:35 PM, Jack Bryan wrote:
> >
> > > thanks
> > >
> > > I have to use a #PBS script to submit any job on my cluster.
> > > I cannot run a job directly from the command line on my cluster.
> > >
> > > this is my script:
> > > --------------------------------------
> > > #!/bin/bash
> > > #PBS -N jobname
> > > #PBS -l walltime=00:08:00,nodes=1
> > > #PBS -q queuename
> > > COMMAND=/mypath/myprog
> > > NCORES=5
> > >
> > > cd "$PBS_O_WORKDIR"
> > > # Count the lines in the PBS node file
> > > NODES=$(wc -l < "$PBS_NODEFILE")
> > > NPROC=$(( NCORES * NODES ))
> > >
> > > mpirun -np $NPROC --mca btl self,sm,openib $COMMAND
> > >
> > > -------------------------------------------
> > >
> > > Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid ZOMBIE_PID) in the script?
> > > And how do I get ZOMBIE_PID from within the script?
> > >
> > > Any help is appreciated.
> > >
> > > thanks
> > >
> > > Oct. 25 2010
> > >
> > > Date: Mon, 25 Oct 2010 19:24:35 +0200
> > > From: jed_at_[hidden]
> > > To: users_at_[hidden]
> > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > >
> > > On Mon, Oct 25, 2010 at 19:07, Jack Bryan <dtustudy68_at_[hidden]> wrote:
> > > I need to use a #PBS parallel job script to submit a job on the MPI cluster.
> > >
> > > Is it not possible to reproduce locally? Most clusters have a way to submit an interactive job (which would let you start this thing and then inspect individual processes). Ashley's Padb suggestion will certainly be better in a non-interactive environment.
> > >
> > > Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid ZOMBIE_PID) in the script?
> > >
> > > Is control returning to your script after rank 0 has exited? In that case, you can just put this on the next line.
> > >
> > > How do I get the ZOMBIE_PID?
> > >
> > > "ps" from the command line, or getpid() from C code.
> > >
> > > Jed
> > >
> > > _______________________________________________ users mailing list users_at_[hidden] http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
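
Jed's two answers in the quoted thread (put the gdb line right after mpirun; get the PID from "ps") could be combined roughly as follows. This is only a sketch: `find_pids` and `dump_backtraces` are hypothetical helper names, not part of Open MPI or PBS. It assumes gdb and a procps-style ps are installed on the compute node, that the job runs on a single node, and that mpirun returns control to the script while the stuck process is still alive.

```shell
#!/bin/bash
# Sketch only: find_pids and dump_backtraces are hypothetical helper names.

# Print the PIDs of this user's processes whose command name equals $1.
find_pids() {
    ps -u "$(id -u)" -o pid= -o comm= |
        awk -v name="$1" '$2 == name { print $1 }'
}

# Attach gdb to each matching process and print its stack and registers.
dump_backtraces() {
    local pid
    for pid in $(find_pids "$1"); do
        echo "== PID $pid =="
        gdb --batch -ex 'bt full' -ex 'info reg' -p "$pid"
    done
}

# In the job script this would go on the line after mpirun, for example:
#   mpirun -np $NPROC --mca btl self,sm,openib $COMMAND
#   dump_backtraces myprog
```

If mpirun itself never returns, the helper would have to be started from a background subshell before mpirun (for example after a sleep). Also note that on Linux ps reports command names truncated to 15 characters, so pass the truncated name for longer executables.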