Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Running openMPI job with torque
From: Gus Correa (gus_at_[hidden])
Date: 2010-06-09 18:42:32


Hi Govind

Govind Songara wrote:
> Hi Gus,
> OpenMPI was not built with tm support.
> The submission/execution hosts do not have any of the
> PBS environment variables set
> (PBS_O_WORKDIR, PBS_NODEFILE).
> How can I set them?
> regards
> Govind
>

I missed the final part of your message,
about the Torque environment.

This is now more of a Torque question,
and you may want to ask it in the Torque mailing list:

http://www.supercluster.org/mailman/listinfo/torqueusers

The Torque system administration guide may also help:

http://www.clusterresources.com/products/torque/docs/

Anyway, it may be that Torque itself is not fully configured yet.
But first: how did you figure out that PBS_O_WORKDIR
and PBS_NODEFILE are not set?

Try to put "ls $PBS_O_WORKDIR" and "cat $PBS_NODEFILE" in your
Torque/PBS script. You can even comment out the mpirun command,
just to test the Torque environment.
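
For example, a minimal test script could look like this
(just a sketch, sh style; the job name and the nodes/ppn request
below are placeholders, adjust them to your site):

*********

#! /bin/sh
#PBS -l nodes=1:ppn=1
#PBS -N pbs_env_test
# Show the directory you ran qsub from:
echo "PBS_O_WORKDIR = $PBS_O_WORKDIR"
ls $PBS_O_WORKDIR
# Show the node list Torque assigned to this job:
echo "PBS_NODEFILE = $PBS_NODEFILE"
cat $PBS_NODEFILE

*********

Submit it with qsub and look at the job's .o output file:
both variables should be set, and the node file should contain
one line per processor you requested.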

I think Torque sets them for each job: the contents of PBS_NODEFILE
depend on how many nodes and processors you requested,
and PBS_O_WORKDIR is just your work directory,
the one from which you launched the job with qsub.

Assuming your Torque is in /var/spool/torque (it may be different
in your system), on the head node, does the file
/var/spool/torque/server_priv/nodes list all your nodes,
with the correct number of processors?

It should look somewhat like this
("np" is the total number of 'cores' on each node):

node01 np=2
node02 np=2
...
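
You can also cross-check what the Torque server actually sees with
the pbsnodes command (assuming the Torque client tools are in your
PATH on the head node):

pbsnodes -a

Each node should be listed with the np value you expect and with
state = free, not down or offline.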

I hope this helps.

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

  On 9 June 2010 18:45, Gus Correa <gus_at_[hidden]> wrote:
>
> Hi Govind
>
> Besides what Ralph said, make sure your OpenMPI was
> built with Torque ("tm") support.
>
> Suggestion:
> Do:
>
> ompi_info --all | grep tm
>
> It should show lines like these:
>
> MCA ras: tm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA plm: tm (MCA v2.0, API v2.0, Component v1.4.2)
> ...
>
> ***
>
> If your OpenMPI doesn't have torque support,
> you may need to add the nodes list to your mpirun command.
>
> Suggestion:
>
> /usr/lib64/openmpi/1.4-gcc/bin/mpirun -hostfile $PBS_NODEFILE -np 4
> ./hello
>
> ***
>
> Also, assuming your OpenMPI has torque support:
>
> Did you request 4 nodes from torque?
>
> If you don't request the nodes and processors,
> torque will give you the default values
> (which may be one processor and one node).
>
> Suggestion:
>
> A script like this (adjusted to your site), tcsh style here,
> say, called run_my_pbs_job.tcsh:
>
> *********
>
> #! /bin/tcsh
> #PBS -l nodes=4:ppn=1
> #PBS -q default@your.torque.server
> #PBS -N myjob
> cd $PBS_O_WORKDIR
> /usr/lib64/openmpi/1.4-gcc/bin/mpirun -np 4 ./hello
>
> *********
>
> Then do:
> qsub run_my_pbs_job.tcsh
>
> **
>
> You can get more information about the PBS syntax using "man qsub".
>
> **
>
> I hope this helps,
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
> Ralph Castain wrote:
>
>
> On Jun 9, 2010, at 10:00 AM, Govind Songara wrote:
>
> Thanks Ralph, after giving the full path of hello it runs.
> But it runs on only one rank:
> Hello World! from process 0 out of 1 on node56.beowulf.cluster
>
>
> Just to check things out, I would do:
>
> mpirun --display-allocation --display-map -np 4 ....
>
> That should show you the allocation and where OMPI is putting
> the procs.
>
> there is also an error:
> >cat my-script.sh.e43
> stty: standard input: Invalid argument
>
>
> Not really sure here - must be an error in the script itself.
>
>
>
>
> On 9 June 2010 16:46, Ralph Castain <rhc_at_[hidden]> wrote:
>
> You need to include the path to "hello" unless it sits in
> your
> PATH environment!
>
> On Jun 9, 2010, at 9:37 AM, Govind wrote:
>
>
> #!/bin/sh
> /usr/lib64/openmpi/1.4-gcc/bin/mpirun hello
>
>
> On 9 June 2010 16:21, David Zhang <solarbikedz_at_[hidden]> wrote:
>
> what does your my-script.sh look like?
>
> On Wed, Jun 9, 2010 at 8:17 AM, Govind <govind.rhul_at_[hidden]> wrote:
>
> Hi,
>
> I have installed the following OpenMPI packages on the
> worker node from the repo:
> openmpi-libs-1.4-4.el5.x86_64
> openmpi-1.4-4.el5.x86_64
> mpitests-openmpi-3.0-2.el5.x86_64
> mpi-selector-1.0.2-1.el5.noarch
>
> torque-client-2.3.6-2cri.el5.x86_64
> torque-2.3.6-2cri.el5.x86_64
> torque-mom-2.3.6-2cri.el5.x86_64
>
>
> Having some problems running MPI jobs with Torque:
> qsub -q long -l nodes=4 my-script.sh
> 42.pbs1.pp.rhul.ac.uk
>
>
> cat my-script.sh.e41
> stty: standard input: Invalid argument
>
> --------------------------------------------------------------------------
> mpirun was unable to launch the specified application as
> it could not find an executable:
>
> Executable: hello
> Node: node56.beowulf.cluster
>
> while attempting to start process rank 0.
> ==================================
>
> I could run the binary directly on the node without any problem.
> mpiexec -n 4 hello
> Hello World! from process 2 out of 4 on node56.beowulf.cluster
> Hello World! from process 0 out of 4 on node56.beowulf.cluster
> Hello World! from process 3 out of 4 on node56.beowulf.cluster
> Hello World! from process 1 out of 4 on node56.beowulf.cluster
>
> Could you please advise if I am missing anything here.
>
>
> Regards
> Govind
>
> -- David Zhang
> University of California, San Diego
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users