Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Running openMPI job with torque
From: Gus Correa (gus_at_[hidden])
Date: 2010-06-09 16:39:31


Hi Govind

The suggestion I sent you may work even with the OpenMPI build
you have, which lacks Torque support.

However, since you have Torque installed on your cluster,
it may be better to install OpenMPI from the source code tarball,
so as to have full Torque support built in, which is much more
convenient to use.

It is not really difficult to install OpenMPI from source.
It boils down to "configure, make, make install", perhaps preceded
by setting a few environment variables (say, if you want to use
non-GNU compilers) or by adding a few configure switches (say,
if your Torque library is in a non-standard place).
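
For instance, a minimal sketch (the prefix and the Torque location
below are placeholders; adjust them to your site):

*********

# placeholder paths; --with-tm points at where Torque is installed
./configure --prefix=/shared/apps/openmpi-1.4 --with-tm=/usr
make
make install

*********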

The OpenMPI README file and FAQs walk you through the process,
and you can always post questions on this list if you have any problems.

Also, on a cluster that is not too big, the best way to install
OpenMPI, in my opinion, is on an NFS-mounted directory, so that it
is visible to all the nodes without having to be installed
repeatedly on each one. In general the home directories on a
cluster are NFS-mounted, but you may have other choices on your
system.
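
Then users just need to point at that shared prefix, for example in
a shell startup file (sh-style here; the path is the same
placeholder as above):

*********

export PATH=/shared/apps/openmpi-1.4/bin:$PATH
export LD_LIBRARY_PATH=/shared/apps/openmpi-1.4/lib:$LD_LIBRARY_PATH

*********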

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Gus Correa wrote:
> Hi Govind
>
> Govind Songara wrote:
> > Hi Gus,
> >
> > OpenMPI was not built with tm support.
> >
>
> I suspected that.
> Reading your postings, it seems to be an OpenMPI rpm from
> a Linux distribution; those are generic, I would guess,
> and have no specific support for any resource manager such as Torque.
>
> > The submission/execution hosts do not have any of the
> > PBS environment variables set
> > (PBS_O_WORKDIR, PBS_NODEFILE).
> > How can I set them?
> >
> > regards
> > Govind
> >
>
> I sent you a suggestion in my previous message!
> Here it is again:
>
> > If your OpenMPI doesn't have torque support,
> > you may need to add the nodes list to your mpirun command.
> >
> > Suggestion:
> >
> > /usr/lib64/openmpi/1.4-gcc/bin/mpirun -hostfile $PBS_NODEFILE -np 4
> > ./hello
> >
>
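>
> (One caveat: $PBS_NODEFILE is only defined inside a running Torque
> job, so that mpirun line belongs in a script submitted with qsub,
> along these lines, sh-style:
>
> #!/bin/sh
> #PBS -l nodes=4:ppn=1
> cd $PBS_O_WORKDIR
> /usr/lib64/openmpi/1.4-gcc/bin/mpirun -hostfile $PBS_NODEFILE -np 4 ./hello
>
> In an interactive shell the variable would be empty.)
>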
> I hope this helps.
>
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
>
>
>> On 9 June 2010 18:45, Gus Correa <gus_at_[hidden]> wrote:
>>
>> Hi Govind
>>
>> Besides what Ralph said, make sure your OpenMPI was
>> built with Torque ("tm") support.
>>
>> Suggestion:
>> Do:
>>
>> ompi_info --all | grep tm
>>
>> It should show lines like these:
>>
>> MCA ras: tm (MCA v2.0, API v2.0, Component v1.4.2)
>> MCA plm: tm (MCA v2.0, API v2.0, Component v1.4.2)
>> ...
>>
>> ***
>>
>> If your OpenMPI doesn't have torque support,
>> you may need to add the nodes list to your mpirun command.
>>
>> Suggestion:
>>
>> /usr/lib64/openmpi/1.4-gcc/bin/mpirun -hostfile $PBS_NODEFILE -np 4
>> ./hello
>>
>> ***
>>
>> Also, assuming your OpenMPI has torque support:
>>
>> Did you request 4 nodes from torque?
>>
>> If you don't request the nodes and processors,
>> torque will give you the default values
>> (which may be one processor and one node).
>>
>> Suggestion:
>>
>> A script like this (adjusted to your site; tcsh style here),
>> called, say, run_my_pbs_job.tcsh:
>>
>> *********
>>
>> #!/bin/tcsh
>> #PBS -l nodes=4:ppn=1
>> #PBS -q default@your.torque.server
>> #PBS -N myjob
>> cd $PBS_O_WORKDIR
>> /usr/lib64/openmpi/1.4-gcc/bin/mpirun -np 4 ./hello
>>
>> *********
>>
>> Then do:
>> qsub run_my_pbs_job.tcsh
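>>
>> (Side note: with tm support built in, mpirun defaults, as far as I
>> recall, to one process per slot that Torque allocated, so the
>> "-np 4" could even be dropped:
>>
>> /usr/lib64/openmpi/1.4-gcc/bin/mpirun ./hello
>>
>> Check the behavior on your build before relying on it.)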
>>
>> **
>>
>> You can get more information about the PBS syntax using "man qsub".
>>
>> **
>>
>> I hope this helps,
>> Gus Correa
>> ---------------------------------------------------------------------
>> Gustavo Correa
>> Lamont-Doherty Earth Observatory - Columbia University
>> Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>>
>> Ralph Castain wrote:
>>
>>
>> On Jun 9, 2010, at 10:00 AM, Govind Songara wrote:
>>
>> Thanks Ralph, after giving the full path of hello it runs.
>> But it runs on only one rank:
>> Hello World! from process 0 out of 1 on
>> node56.beowulf.cluster
>>
>>
>> Just to check things out, I would do:
>>
>> mpirun --display-allocation --display-map -np 4 ....
>>
>> That should show you the allocation and where OMPI is putting
>> the procs.
>>
>> There is also an error:
>> >cat my-script.sh.e43
>> stty: standard input: Invalid argument
>>
>>
>> Not really sure here - must be an error in the script itself.
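>>
>> A common culprit for that stty message is an unguarded stty call in
>> a login file such as ~/.cshrc or ~/.bashrc, which fails because
>> batch jobs run without a terminal. Assuming the call lives in
>> ~/.bashrc, a guard along these lines silences it:
>>
>> # run stty only when attached to a real terminal
>> if tty -s; then
>>     stty erase ^H    # example call; substitute whatever is there
>> fi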
>>
>>
>>
>>
>> On 9 June 2010 16:46, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>> You need to include the path to "hello" unless it sits in your
>> PATH environment!
>>
>> On Jun 9, 2010, at 9:37 AM, Govind wrote:
>>
>>
>> #!/bin/sh
>> /usr/lib64/openmpi/1.4-gcc/bin/mpirun hello
>>
>>
>> On 9 June 2010 16:21, David Zhang <solarbikedz_at_[hidden]> wrote:
>>
>> What does your my-script.sh look like?
>>
>> On Wed, Jun 9, 2010 at 8:17 AM, Govind <govind.rhul_at_[hidden]> wrote:
>>
>> Hi,
>>
>> I have installed the following OpenMPI packages on the
>> worker node from the repo:
>> openmpi-libs-1.4-4.el5.x86_64
>> openmpi-1.4-4.el5.x86_64
>> mpitests-openmpi-3.0-2.el5.x86_64
>> mpi-selector-1.0.2-1.el5.noarch
>>
>> torque-client-2.3.6-2cri.el5.x86_64
>> torque-2.3.6-2cri.el5.x86_64
>> torque-mom-2.3.6-2cri.el5.x86_64
>>
>>
>> I am having some problems running MPI jobs with
>> Torque:
>> qsub -q long -l nodes=4 my-script.sh
>> 42.pbs1.pp.rhul.ac.uk
>>
>>
>> cat my-script.sh.e41
>> stty: standard input: Invalid argument
>>
>> --------------------------------------------------------------------------
>>
>> mpirun was unable to launch the specified
>> application as
>> it could not find an executable:
>>
>> Executable: hello
>> Node: node56.beowulf.cluster
>>
>> while attempting to start process rank 0.
>> ==================================
>>
>> I could run the binary directly on the node
>> without any
>> problem.
>> mpiexec -n 4 hello
>> Hello World! from process 2 out of 4 on
>> node56.beowulf.cluster
>> Hello World! from process 0 out of 4 on
>> node56.beowulf.cluster
>> Hello World! from process 3 out of 4 on
>> node56.beowulf.cluster
>> Hello World! from process 1 out of 4 on
>> node56.beowulf.cluster
>>
>> Could you please advise if I am missing
>> anything here.
>>
>>
>> Regards
>> Govind
>>
>>
>> -- David Zhang
>> University of California, San Diego
>>
>>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users