Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpirun is using one PBS node only
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-12-01 18:59:52


You need to install with TM support on all nodes.
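A minimal sketch of what that looks like (the --prefix below is an assumption; point --with-tm at wherever Torque lives on each node):

```shell
# Run on EVERY node (head node and all workers), or once into a
# directory that is shared by all nodes (e.g. over NFS), so each
# node ends up with the same tm-enabled build.
./configure --prefix=/usr/local --with-tm=/var/spool/torque --enable-static
make all install

# Verify on each node that the Torque (tm) components were built:
ompi_info | grep tm
# Expect "ras: tm" and "plm: tm" lines, like the ones already shown
# by ompi_info on the head node.
```

A single install onto a shared filesystem is often the simplest way to guarantee every node runs an identical, tm-enabled Open MPI.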

On Dec 1, 2009, at 6:08 PM, Belaid MOA wrote:

> I tried the -bynode option but it did not change anything. I also tried
> running the "hostname" command and I keep getting the name of one node
> repeated according to the -n value.
>
> Just to make sure I did the right installation, here is what I did:
>
> -- On the head node (HN), I installed openMPI using the --with-tm
> option as follows:
>
> ./configure --with-tm=/var/spool/torque --enable-static
> make all install
>
> -- On the worker nodes (WN1 and WN2), I installed openMPI without tm
> option as follows (it is a local installation on each worker node):
>
> ./configure --enable-static
> make all install
>
> Is this correct?
>
> Thanks a lot in advance.
> ~Belaid.
> > Date: Tue, 1 Dec 2009 17:07:58 -0500
> > From: gus_at_[hidden]
> > To: users_at_[hidden]
> > Subject: Re: [OMPI users] mpirun is using one PBS node only
> >
> > Hi Belaid Moa
> >
> > Belaid MOA wrote:
> > > Thanks a lot, Gus, for your help again. I only have one CPU per node.
> > > The -n X option (no matter what the value of X is) shows X processes
> > > running on one node only (the other one is free).
> >
> > So, somehow it is oversubscribing your single processor
> > on the first node.
> >
> > A simple diagnostic:
> >
> > Have you tried to run "hostname" on the two nodes through Torque/PBS
> > and mpiexec?
> >
> > [PBS directives, cd $PBS_O_WORKDIR, etc]
> > ...
> > /full/path/to/openmpi/bin/mpiexec -n 2 hostname
> >
> > Try also with the -byslot and -bynode options.
> >
> >
> > > If I add the machinefile option with WN1 and WN2 in it, the right
> > > behavior is manifested. According to the documentation,
> > > mpirun should get the PBS_NODEFILE automatically from the PBS.
> >
> > Yes, if you compiled the OpenMPI you are using with Torque ("tm")
> > support.
> > Did you?
> > Make sure it has tm support.
> > Run "ompi_info" (with the full path if needed) to check that.
> > Are you sure the correct path to what you want is
> > /usr/local/bin/mpirun ?
> > Linux distributions, compilers, and other tools ship their own
> > mpiexec and install it in places you may not suspect, so double-check
> > that you are getting the one you want.
> > That has been a source of repeated confusion on this and other
> > mailing lists.
> >
> > Also, make sure that passwordless ssh across the nodes is working.
> >
> > Yet another thing to check, for easy name resolution,
> > your /etc/hosts file on *all*
> > nodes including the headnode should
> > have a list of all nodes and their IP addresses.
> > Something like this:
> >
> > 127.0.0.1 localhost.localdomain localhost
> > 192.168.0.1 WN1
> > 192.168.0.2 WN2
> >
> > (The IPs above are guesswork of mine, you know better which to use.)
> >
> > > So, I do
> > > not need to use machinefile.
> > >
> >
> > True, assuming the first condition above (OpenMPI *with* "tm"
> > support).
> >
> > > Any ideas?
> > >
> >
> > Yes, and I sent one to you in my last email!
> > Try the "-bynode" option of mpiexec.
> > ("man mpiexec" is your friend!)
> >
> > > Thanks a lot in advance.
> > > ~Belaid.
> > >
> >
> > Best of luck!
> > Gus Correa
> >
> > ---------------------------------------------------------------------
> > Gustavo Correa
> > Lamont-Doherty Earth Observatory - Columbia University
> > Palisades, NY, 10964-8000 - USA
> >
> > ---------------------------------------------------------------------
> >
> > PS - Your web site link to Paul Krugman is out of date.
> > Here is a link to his (active) blog,
> > and another to his (no longer updated) web page: :)
> >
> > http://krugman.blogs.nytimes.com/
> > http://www.princeton.edu/~pkrugman/
> >
> > >
> > > > Date: Tue, 1 Dec 2009 15:42:30 -0500
> > > > From: gus_at_[hidden]
> > > > To: users_at_[hidden]
> > > > Subject: Re: [OMPI users] mpirun is using one PBS node only
> > > >
> > > > Hi Belaid Moa
> > > >
> > > > Belaid MOA wrote:
> > > > > Hi everyone,
> > > > > Here is another elementary question. I tried the following steps
> > > > > found in the FAQ section of www.open-mpi.org with a simple hello
> > > > > world example
> > > > > (with PBS/torque):
> > > > > $ qsub -l nodes=2 my_script.sh
> > > > >
> > > > > my_script.sh is pasted below:
> > > > > ========================
> > > > > #!/bin/sh -l
> > > > > #PBS -N helloTest
> > > > > #PBS -j eo
> > > > > echo `cat $PBS_NODEFILE` # shows two nodes: WN1 WN2
> > > > > cd $PBS_O_WORKDIR
> > > > > /usr/local/bin/mpirun hello
> > > > > ========================
> > > > >
> > > > > When the job is submitted, only one process is run. When I add
> > > > > the -n 2 option to the mpirun command, two processes are run,
> > > > > but on one node only.
> > > >
> > > > Do you have a single CPU/core per node?
> > > > Or are they multi-socket/multi-core?
> > > >
> > > > Check "man mpiexec" for the options that control on which nodes
> > > > and slots, etc. your program will run.
> > > > ("man mpiexec" will tell you more than I possibly can.)
> > > >
> > > > The default option is "-byslot",
> > > > which will use all "slots" (actually cores
> > > > or CPUs) available on a node before it moves to the next node.
> > > > Reading your question and your surprise with the result,
> > > > I would guess what you want is "-bynode" (not the default).
> > > >
> > > > Also, if you have more than one CPU/core per node,
> > > > you need to put this information in your Torque/PBS "nodes" file
> > > > (and restart your pbs_server daemon).
> > > > Something like this (for 2 CPUs/cores per node):
> > > >
> > > > WN1 np=2
> > > > WN2 np=2
> > > >
> > > > I hope this helps,
> > > > Gus Correa
> > > >
> > > > ---------------------------------------------------------------------
> > > > Gustavo Correa
> > > > Lamont-Doherty Earth Observatory - Columbia University
> > > > Palisades, NY, 10964-8000 - USA
> > > >
> > > > ---------------------------------------------------------------------
> > > >
> > > >
> > > > > Note that echo `cat $PBS_NODEFILE` outputs
> > > > > the two nodes I am using: WN1 and WN2.
> > > > >
> > > > > The output from ompi_info is shown below:
> > > > >
> > > > > $ ompi_info | grep tm
> > > > > MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.3)
> > > > > MCA ras: tm (MCA v2.0, API v2.0, Component v1.3.3)
> > > > > MCA plm: tm (MCA v2.0, API v2.0, Component v1.3.3)
> > > > >
> > > > > Any help on why openMPI/mpirun is using only one PBS node is
> > > > > much appreciated.
> > > > >
> > > > > Thanks a lot in advance, and sorry for bothering you guys with
> > > > > my elementary questions!
> > > > >
> > > > > ~Belaid.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > users mailing list
> > > > > users_at_[hidden]
> > > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > >
> > >
> >

-- 
Jeff Squyres
jsquyres_at_[hidden]