Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI program cannot complete
From: Ashley Pittman (ashley_at_[hidden])
Date: 2010-10-25 15:40:18


On 25 Oct 2010, at 20:18, Jack Bryan wrote:

> Thanks
> I have downloaded
> http://padb.googlecode.com/files/padb-3.0.tgz
>
> and compile it.
>
> But, no user manual, I can not use it by padb -aQ.

The -a flag is a shortcut for "all jobs". If you are providing a jobid (which is normally numeric), don't set the -a flag.

> Do you have use manual about how to use it ?

In my previous mail I was assuming you were using orte to launch the jobs, but if you are using PBS then you'll need the 3.2 beta, as the PBS code is new. Alternatively, you could find the host where the PBS script itself runs and check if the "ompi-ps" command gives you any output; if it does, you could run padb from there, giving it the orte jobid.

A bit of background about resource managers (in which I'm including orte and PBS): padb supports many resource managers and tries to automatically detect which ones you have installed on your system. If you don't specify one then it'll see what is installed. If more than one resource manager is installed then it'll see which of them claim to have active jobs, and if only one meets this criterion then it'll pick that one. Hence 99% of the time it should just work. If more than one resource manager claims to have active jobs then padb will refuse to run and ask the user to specify one explicitly.
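
To make the selection rule above concrete, here is a rough sketch of it in Python. This is not padb's actual code; the function name, the exception and the dictionary input are all made up for illustration. I'm also assuming that an explicit choice is always honoured, and that the case where several resource managers are installed but none reports active jobs is treated like the ambiguous case, which the description above doesn't cover.

```python
from typing import Dict, Optional


class AmbiguousResourceManagerError(Exception):
    """Raised when no single resource manager can be picked automatically."""


def choose_resource_manager(installed: Dict[str, bool],
                            requested: Optional[str] = None) -> str:
    """Pick a resource manager name.

    `installed` maps each detected resource manager (e.g. "orte", "pbs")
    to whether it claims to have active jobs. Both the mapping and this
    function are hypothetical stand-ins for illustration.
    """
    # An explicit choice (like -Ormgr=pbs) skips auto-detection.
    # Assumption: the request is honoured without further checks.
    if requested is not None:
        return requested

    if not installed:
        raise LookupError("no resource manager detected")

    # Only one installed: nothing to decide.
    if len(installed) == 1:
        return next(iter(installed))

    # Several installed: keep those that claim to have active jobs.
    active = [name for name, has_jobs in installed.items() if has_jobs]
    if len(active) == 1:
        return active[0]

    # More than one claims active jobs: refuse and ask for an explicit choice.
    # Assumption: zero active candidates is handled the same way.
    raise AmbiguousResourceManagerError(
        "cannot pick a resource manager automatically, candidates: "
        + ", ".join(sorted(installed))
    )
```

With both orte and PBS installed and both reporting active jobs, the last branch is the one you hit, which is why naming the resource manager explicitly is the safe thing to do.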

You should try the following in order once you have 3.2 installed.

padb -Ormgr=pbs -Q <myjob>

Or, find the node where the PBS script is being executed, check that the ompi-ps command is returning the jobid, and then run

padb -Ormgr=orte -Q <openmpi_jobid>

Ashley,

-- 
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk