Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI program cannot complete
From: Jack Bryan (dtustudy68_at_[hidden])
Date: 2010-10-26 01:55:53


Thanks.

But I cannot see the attachment in the email. Would you please send it to me again, and also copy it to my other email address: tomviewisu_at_yahoo.com

Thanks

Oct. 25 2010

From: dtustudy68_at_[hidden]
To: ashley_at_[hidden]
Subject: RE: [OMPI users] Open MPI program cannot complete
Date: Mon, 25 Oct 2010 16:53:32 -0600

thanks
But I cannot see the attachment in the email.

Would you please send it to me again,
and also copy it to my other email address:
tomviewisu_at_[hidden]
thanks
Oct. 25 2010

> Subject: Re: [OMPI users] Open MPI program cannot complete
> From: ashley_at_[hidden]
> Date: Mon, 25 Oct 2010 23:41:32 +0100
> To: dtustudy68_at_[hidden]
>
>
> Thanks, that tells me a lot.
>
> Try the attached padb; I've added the patch for you and removed the -w option. Can you run it and send me back the output, please?
>
> Ashley.
>
> On 25 Oct 2010, at 23:29, Jack Bryan wrote:
>
> > Thanks
> >
> > Here is the output:
> >
> > -bash-3.2$ qstat -fB
> > Server: clusterName
> > server_state = Active
> > scheduling = True
> > total_jobs = 26
> > state_count = Transit:0 Queued:7 Held:0 Waiting:0 Running:18 Exiting:0
> > acl_hosts = clustername
> > default_queue = normal
> > log_events = 511
> > mail_from = adm
> > query_other_jobs = True
> > resources_assigned.nodect = 246
> > scheduler_iteration = 600
> > node_check_rate = 150
> > tcp_timeout = 6
> > mom_job_sync = True
> > pbs_version = 2.4.2
> > keep_completed = 300
> > submit_hosts = clusterName
> > next_job_number = 48293
> > net_counter = 2 9 6
> >
> > -bash-3.2$ qstat -w -n
> > qstat: invalid option -- w
> >
> >
> > At which line of the padb file in the bin directory should I put the following patch?
> > -----------------------------------------------------
> > --- padb (revision 401)
> > +++ padb (working copy)
> > @@ -2824,6 +2824,7 @@
> > foreach my $server (@servers) {
> > pbs_get_lqsub( $user, $server ); # get job list by qsub
> > }
> > + print Dumper \%pbs_tabjobs;
> > return \%pbs_tabjobs;
> > }
> > ----------------------------------------
> >
> > Any help is appreciated.
> >
> > thanks
> >
> > Jack
> >
> > Oct. 25 2010
> >
> >
> >
> > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > > From: ashley_at_[hidden]
> > > Date: Mon, 25 Oct 2010 22:54:21 +0100
> > > To: dtustudy68_at_[hidden]
> > >
> > >
> > > [off list]
> > >
> > > > The PBS support was added by a third party, so I've not used it in anger myself, but as far as I can tell you are doing the correct thing.
> > >
> > > Can you send me the output of the following two commands and also apply the patch below to padb (you can do this just in the bin dir - it's a perl script) and send me the output when you run that as well?
> > >
> > > qstat -fB
> > > qstat -w -n
> > >
> > > --- padb (revision 401)
> > > +++ padb (working copy)
> > > @@ -2824,6 +2824,7 @@
> > > foreach my $server (@servers) {
> > > pbs_get_lqsub( $user, $server ); # get job list by qsub
> > > }
> > > + print Dumper \%pbs_tabjobs;
> > > return \%pbs_tabjobs;
> > > }
> > >
> > > On 25 Oct 2010, at 22:30, Jack Bryan wrote:
> > >
> > > > Thanks
> > > >
> > > > I have downloaded
> > > > http://padb.googlecode.com/files/padb-3.2-beta1.tar.gz
> > > >
> > > > > and followed the instructions in the INSTALL file, installing it at /mypath/padb32
> > > >
> > > > But, I got:
> > > >
> > > > -bash-3.2$ padb -Ormgr=pbs -Q 48279.cluster
> > > > Job 48279.cluster is not active
> > > >
> > > > Actually, the job was running.
> > > >
> > > > I have installed
> > > > bin at
> > > >
> > > > /mypath/padb32/bin
> > > >
> > > >
> > > > libexec at
> > > > /lustre/jxding/padb32/libexec
> > > >
> > > > When I installed it, I used
> > > >
> > > > ./configure --prefix=/mypath/padb32
> > > >
> > > > I got
> > > > -----------------------------
> > > >
> > > > checking for a BSD-compatible install... /usr/bin/install -c
> > > > checking whether build environment is sane... yes
> > > > checking for a thread-safe mkdir -p... /bin/mkdir -p
> > > > checking for gawk... gawk
> > > > checking whether make sets $(MAKE)... yes
> > > > checking for gcc... gcc
> > > > checking whether the C compiler works... yes
> > > > checking for C compiler default output file name... a.out
> > > > checking for suffix of executables...
> > > > checking whether we are cross compiling... no
> > > > checking for suffix of object files... o
> > > > checking whether we are using the GNU C compiler... yes
> > > > checking whether gcc accepts -g... yes
> > > > checking for gcc option to accept ISO C89... none needed
> > > > checking for style of include used by make... GNU
> > > > checking dependency style of gcc... gcc3
> > > > checking whether gcc and cc understand -c and -o together... yes
> > > > configure: creating ./config.status
> > > > config.status: creating Makefile
> > > > config.status: creating src/Makefile
> > > > config.status: executing depfiles commands
> > > >
> > > > -------------------------------
> > > >
> > > > -bash-3.2$ make
> > > > Making all in src
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"padb\" -DVERSION=\"3.2-beta1\" -I. -Wall -g -O2 -MT minfo-minfo.o -MD -MP -MF .deps/minfo-minfo.Tpo -c -o minfo-minfo.o `test -f 'minfo.c' || echo './'`minfo.c
> > > > > minfo.c: In function ‘find_sym’:
> > > > > minfo.c:158: warning: dereferencing type-punned pointer will break strict-aliasing rules
> > > > > minfo.c: In function ‘main’:
> > > > minfo.c:649: warning: type-punning to incomplete type might break strict-aliasing rules
> > > > minfo.c:650: warning: type-punning to incomplete type might break strict-aliasing rules
> > > > mv -f .deps/minfo-minfo.Tpo .deps/minfo-minfo.Po
> > > > gcc -Wall -g -O2 -ldl -o minfo minfo-minfo.o
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[1]: Nothing to be done for `all-am'.
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
> > > > -------------------------------------------------
> > > >
> > > > -bash-3.2$ make install
> > > > Making install in src
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[2]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > test -z "/lustre/jxding/padb32/bin" || /bin/mkdir -p "/mypath/padb32/bin"
> > > > /usr/bin/install -c padb '/lustre/jxding/padb32/bin'
> > > > test -z "/lustre/jxding/padb32/libexec" || /bin/mkdir -p "/mypath/padb32/libexec"
> > > > /usr/bin/install -c minfo '/lustre/jxding/padb32/libexec'
> > > > make[2]: Nothing to be done for `install-data-am'.
> > > > make[2]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[2]: Entering directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[2]: Nothing to be done for `install-exec-am'.
> > > > make[2]: Nothing to be done for `install-data-am'.
> > > > make[2]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
> > > > -bash-3.2$ make installcheck
> > > > Making installcheck in src
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Nothing to be done for `installcheck'.
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[1]: Nothing to be done for `installcheck-am'.
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
> > > > --------------------------------------------------
> > > >
> > > > > Is there something wrong with what I have done?
> > > >
> > > > Any help is appreciated.
> > > >
> > > > thanks
> > > >
> > > > Jack
> > > >
> > > > Oct. 25 2010
> > > >
> > > >
> > > > > From: ashley_at_[hidden]
> > > > > Date: Mon, 25 Oct 2010 20:40:18 +0100
> > > > > To: users_at_[hidden]
> > > > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > > > >
> > > > >
> > > > > On 25 Oct 2010, at 20:18, Jack Bryan wrote:
> > > > >
> > > > > > Thanks
> > > > > > I have downloaded
> > > > > > http://padb.googlecode.com/files/padb-3.0.tgz
> > > > > >
> > > > > > > and compiled it.
> > > > > > >
> > > > > > > But there is no user manual, and I cannot get it to work with padb -aQ.
> > > > >
> > > > > > The -a flag is a shortcut for all jobs; if you are providing a jobid (which is normally numeric), then don't set the -a flag.
> > > > >
> > > > > > > Do you have a user manual that explains how to use it?
> > > > >
> > > > > > In my previous mail I was assuming you were using orte to launch the jobs, but if you are using PBS then you'll need to use the 3.2 beta, as the PBS code is new. Alternatively, you could find the host where the PBS script itself runs and check if the "ompi-ps" command gives you any output; if it does, then you could run padb from there, giving it the orte jobid.
> > > > >
> > > > > > A bit of background about resource managers (in which I'm including orte and PBS): padb supports many resource managers and tries to automatically detect which ones you have installed on your system. If you don't specify one then it'll see what is installed; if there is more than one resource manager installed then it'll see which of them claim to have active jobs. If only one resource manager meets this criterion then it'll pick that one, hence 99% of the time it should just work. If more than one resource manager claims to have active jobs then padb will refuse to run and ask the user to specify one explicitly.
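
The selection rule described above can be sketched roughly as follows. This is only an illustration in Python, not padb's actual Perl code: the function name `pick_resource_manager`, the `installed` mapping, and the exception class are all hypothetical, and the behaviour when no resource manager reports active jobs is an assumption, since the message does not cover that case.

```python
from typing import Dict, Optional


class AmbiguousResourceManager(Exception):
    """Raised when several resource managers claim to have active jobs."""


def pick_resource_manager(installed: Dict[str, int],
                          requested: Optional[str] = None) -> str:
    """Pick a resource manager name following the rule in the message.

    `installed` is a hypothetical mapping from each installed resource
    manager to the number of active jobs it reports.
    """
    if requested is not None:
        # An explicit choice (as with -Ormgr=pbs) skips detection entirely.
        return requested
    if not installed:
        raise LookupError("no resource manager installed")
    if len(installed) == 1:
        # Only one is installed, so there is nothing to disambiguate.
        return next(iter(installed))
    active = [name for name, jobs in installed.items() if jobs > 0]
    if len(active) == 1:
        return active[0]
    if not active:
        # Assumption: this case is not described in the message.
        raise LookupError("no resource manager reports active jobs")
    # More than one candidate: refuse and ask for an explicit choice.
    raise AmbiguousResourceManager(
        "specify one explicitly: " + ", ".join(sorted(active)))
```

For example, `pick_resource_manager({"pbs": 3, "orte": 0})` picks `"pbs"`, while a system where both report active jobs needs the explicit `requested` argument, which mirrors passing `-Ormgr=` on the command line.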
> > > > >
> > > > > You should try the following in order once you have 3.2 installed.
> > > > >
> > > > > padb -Ormgr=pbs -Q <myjob>
> > > > >
> > > > > > Or find the node where the PBS script is being executed, check that the ompi-ps command is returning the jobid, and then run
> > > > >
> > > > > padb -Ormgr=orte -Q <openmpi_jobid>
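
The order of attempts suggested above could be scripted along these lines. This is a minimal sketch that only builds the argument lists and does not run padb; the helper name `padb_query_commands` is hypothetical, the flags are the ones quoted in the message, and the orte jobid is assumed to have been read off `ompi-ps` by hand on the node running the PBS script.

```python
from typing import List, Optional


def padb_query_commands(pbs_jobid: str,
                        orte_jobid: Optional[str] = None) -> List[List[str]]:
    """Return the padb invocations to try, in the suggested order."""
    # First attempt: query the job through the PBS resource manager.
    commands = [["padb", "-Ormgr=pbs", "-Q", pbs_jobid]]
    if orte_jobid is not None:
        # Fallback: query through orte, using the jobid reported by ompi-ps.
        commands.append(["padb", "-Ormgr=orte", "-Q", orte_jobid])
    return commands


if __name__ == "__main__":
    # "1234" stands in for whatever jobid ompi-ps reports; it is made up.
    for argv in padb_query_commands("48279.cluster", orte_jobid="1234"):
        print(" ".join(argv))
```

Note that no -a flag is added, since a specific jobid is being supplied in both cases.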
> > > > >
> > > > > Ashley,
> > > > >
> > > > > --
> > > > >
> > > > > Ashley Pittman, Bath, UK.
> > > > >
> > > > > Padb - A parallel job inspection tool for cluster computing
> > > > > http://padb.pittman.org.uk
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > users mailing list
> > > > > users_at_[hidden]
> > > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
> > > --
> > >
> > > Ashley Pittman, Bath, UK.
> > >
> > > Padb - A parallel job inspection tool for cluster computing
> > > http://padb.pittman.org.uk
> > >
>
> --
>
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
>