Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpirun is using one PBS node only
From: Gus Correa (gus_at_[hidden])
Date: 2009-12-01 22:34:04


Hi Belaid

Belaid MOA wrote:
> You made my day Gus! Thank you very much.

I'm glad it helped.
I hope it is working for you now.

> If I had asked before, I would
> have finished within two hours
> (but I guess that's part of the learning process).

Oh, well, that's nothing to worry about.
On these mailing list exchanges what often
takes longer is for one side to understand what the
other side is talking about.
Once the language is agreed upon, progress is fast.
With multiple sides (Ralph, Jeff, they're the real pros)
the conversation gets even faster and more exciting!

> Very straightforward! Although I tried doing exactly
> what you said, the Googled information is not clear and sometimes
> misleading about what to install and where.
>

Yes, putting OpenMPI, executables, etc.,
on NFS-mounted directories exported from the head node
is the simplest way to achieve homogeneity and data visibility
across your working nodes.
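As a sketch of what that setup looks like (the export path /shared, the subnet, and the hostname "headnode" are assumptions here; adjust to your cluster):

```shell
# --- On the head node (as root): export a shared directory ---
# Hypothetical path and subnet; adjust to your network.
mkdir -p /shared
echo '/shared 192.168.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra   # re-export everything in /etc/exports

# --- On each worker node (as root): mount it at the same path ---
mkdir -p /shared
echo 'headnode:/shared /shared nfs defaults 0 0' >> /etc/fstab
mount /shared
```

Installing OpenMPI under /shared then makes the exact same build visible on every node at the same path.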

> Thanks a lot Gus.
> ~Belaid.
>

Don't forget to fix your web site links to Paul Krugman! :)
Here are two that work:

http://krugman.blogs.nytimes.com/
http://www.princeton.edu/~pkrugman/

Best of luck,
Gus

>
>
>
>
>
> > Date: Tue, 1 Dec 2009 19:15:53 -0500
> > From: gus_at_[hidden]
> > To: users_at_[hidden]
> > Subject: Re: [OMPI users] mpirun is using one PBS node only
> >
> > Hi Belaid Moa
> >
> > Belaid MOA wrote:
> > > In that case, the way I installed it is not right. I thought that only
> > > the HN should be configured with the tm support
> > > not the worker nodes; the worker nodes only have the PBS daemon
> > > clients - no need for tm support on the worker nodes.
> > >
> > > When I ran ompi_info | grep tm on the worker nodes, the output is
> > > empty.
> > >
> >
> > Yes, it is clear that OpenMPI on your worker nodes
> > doesn't have "tm" support.
> > Again, I would guess this is the reason you can't get even hostname
> > to run on more than one node.
> >
> > Just reinstall OpenMPI with TM support on the head node
> > *on an NFS-mounted directory*, and life will be much easier!
> > All nodes, head and worker, will see the same OpenMPI version.
> > It works very well for me here.
> > The only additional
> > thing you may need to do is to add the OpenMPI bin directory
> > to your PATH and the OpenMPI lib directory to LD_LIBRARY_PATH
> > in your .bashrc/.cshrc file (or in appropriate .csh and .sh files in
> > the /etc/profile.d directory).
> > Upgrades will also be much simpler.
> > The only disadvantage of this scheme may be on large clusters,
> > where scaling may bump into NFS limitations, but with only three
> > nodes that is certainly not your case.
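Concretely, the lines to add would look something like this (the prefix /shared/openmpi is an assumption; use whatever NFS-mounted prefix you passed to configure with --prefix):

```shell
# Lines for ~/.bashrc (or a .sh file in /etc/profile.d).
# /shared/openmpi is a hypothetical NFS-mounted install prefix;
# adjust it to match your own installation.
export PATH=/shared/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/shared/openmpi/lib:$LD_LIBRARY_PATH
```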
> >
> >
> > > The information on the following link has misled me then:
> > > http://www.physics.iitm.ac.in/~sanoop/linux_files/cluster.html
> > > (check OpenMPI Configuration section.)
> > >
> >
> > I suggest that you refer to the OpenMPI site instead.
> > That is the authoritative source of information about OpenMPI.
> > Their FAQs have a lot of information:
> > http://www.open-mpi.org/faq/
> > Likewise, the README file that comes with the OpenMPI tarball
> > is very clarifying.
> >
> > I hope this helps,
> > Gus Correa
> > ---------------------------------------------------------------------
> > Gustavo Correa
> > Lamont-Doherty Earth Observatory - Columbia University
> > Palisades, NY, 10964-8000 - USA
> > ---------------------------------------------------------------------
> >
> > > ~Belaid.
> > > > Date: Tue, 1 Dec 2009 18:36:15 -0500
> > > > From: gus_at_[hidden]
> > > > To: users_at_[hidden]
> > > > Subject: Re: [OMPI users] mpirun is using one PBS node only
> > > >
> > > > Hi Belaid Moa
> > > >
> > > > The OpenMPI I install and use is on a NFS mounted directory.
> > > > Hence, all the nodes see the same version, which has "tm" support.
> > > >
> > > > After reading your OpenMPI configuration parameters on the headnode
> > > > and working nodes (and the difference between them),
> > > > I would guess (just a guess) that the problem you see is because your
> > > > OpenMPI version on the nodes (probably) does not have Torque support.
> > > >
> > > > However, you should first verify that this is really the case,
> > > > because if the OpenMPI configure script
> > > > finds the torque libraries it will (probably) configure and
> > > > install OpenMPI with "tm" support, even if you don't ask it
> > > > explicitly on the working nodes.
> > > > Hence, ssh to WN1 or WN2 and do "ompi_info" to check this out first.
> > > >
> > > > If there is no Torque on WN1 and WN2 then OpenMPI won't find it
> > > > and you won't have "tm" support on the nodes.
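One way to check every node in one go (node names WN1/WN2 as in this thread; this assumes passwordless ssh and that ompi_info is on the default PATH of each node):

```shell
# Grep each node's ompi_info output for the tm components.
# Empty output for a node means its OpenMPI has no Torque support.
for node in WN1 WN2; do
  echo "== $node =="
  ssh "$node" 'ompi_info | grep " tm "'
done
```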
> > > >
> > > > In any case, if OpenMPI "tm" support is missing on WN[1,2],
> > > > I would suggest that you reinstall OpenMPI on WN1 and WN2 *with tm
> > > > support*.
> > > > This will require that you have Torque on the working nodes also,
> > > > and use the same configure command line that you used on the
> headnode.
> > > >
> > > > A low-tech alternative is to copy over your OpenMPI directory tree to
> > > > the WN1 and WN2 nodes.
> > > >
> > > > A yet simpler alternative is to reinstall OpenMPI on the headnode
> > > > on a NFS mounted directory (as I do here), then
> > > > add the corresponding "bin" path to your PATH,
> > > > and the corresponding "lib" path to your LD_LIBRARY_PATH environment
> > > > variables.
> > > >
> > > > Think about maintenance and upgrades:
> > > > on an NFS-mounted directory
> > > > you need to install only once, whereas the way you have it now you
> > > > need to do it N+1 times (or have a mechanism to propagate a single
> > > > installation from the head node to the compute nodes).
> > > >
> > > > NFS is your friend! :)
> > > >
> > > > I hope this helps,
> > > > Gus Correa
> > > > ---------------------------------------------------------------------
> > > > Gustavo Correa
> > > > Lamont-Doherty Earth Observatory - Columbia University
> > > > Palisades, NY, 10964-8000 - USA
> > > > ---------------------------------------------------------------------
> > > >
> > > >
> > > > Belaid MOA wrote:
> > > > > I tried the -bynode option but it did not change anything. I also
> > > > > tried the "hostname" command and
> > > > > I keep getting only the name of one node repeated according to
> > > > > the -n value.
> > > > >
> > > > > Just to make sure I did the right installation, here is what I did:
> > > > >
> > > > > -- On the head node (HN), I installed openMPI using the --with-tm
> > > > > option as follows:
> > > > >
> > > > > ./configure --with-tm=/var/spool/torque --enable-static
> > > > > make install all
> > > > >
> > > > > -- On the worker nodes (WN1 and WN2), I installed openMPI
> > > > > without the tm option as follows (it is a local installation on
> > > > > each worker node):
> > > > >
> > > > > ./configure --enable-static
> > > > > make install all
> > > > >
> > > > > Is this correct?
> > > > >
> > > > > Thanks a lot in advance.
> > > > > ~Belaid.
> > > > > > Date: Tue, 1 Dec 2009 17:07:58 -0500
> > > > > > From: gus_at_[hidden]
> > > > > > To: users_at_[hidden]
> > > > > > Subject: Re: [OMPI users] mpirun is using one PBS node only
> > > > > >
> > > > > > Hi Belaid Moa
> > > > > >
> > > > > > Belaid MOA wrote:
> > > > > > > Thanks a lot Gus for your help again. I only have one CPU
> > > > > > > per node.
> > > > > > > The -n X option (no matter what the value of X is) shows X
> > > > > > > processes running on one node only (the other one is free).
> > > > > >
> > > > > > So, somehow it is oversubscribing your single processor
> > > > > > on the first node.
> > > > > >
> > > > > > A simple diagnostic:
> > > > > >
> > > > > > Have you tried to run "hostname" on the two nodes through
> > > > > > Torque/PBS and mpiexec?
> > > > > >
> > > > > > [PBS directives, cd $PBS_O_WORKDIR, etc]
> > > > > > ...
> > > > > > /full/path/to/openmpi/bin/mpiexec -n 2 hostname
> > > > > >
> > > > > > Try also with the -byslot and -bynode options.
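Put together, a complete throwaway test script might look like this (the mpiexec path is a placeholder; substitute the path of your tm-enabled build):

```shell
#!/bin/sh
#PBS -N hosttest
#PBS -l nodes=2
#PBS -j oe
cd $PBS_O_WORKDIR
# Show which nodes Torque actually allocated to this job.
echo "PBS gave me these nodes:"
cat $PBS_NODEFILE
# Placeholder path; point it at your tm-enabled OpenMPI.
/full/path/to/openmpi/bin/mpiexec -n 2 hostname
```

With tm support working, the final line should print two different hostnames.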
> > > > > >
> > > > > >
> > > > > > > If I add the machinefile option with WN1 and WN2 in it, the
> > > > > > > right behavior is manifested. According to the documentation,
> > > > > > > mpirun should get the PBS_NODEFILE automatically from PBS.
> > > > > >
> > > > > > Yes, if you compiled the OpenMPI you are using with Torque ("tm")
> > > > > > support.
> > > > > > Did you?
> > > > > > Make sure it has tm support.
> > > > > > Run "ompi_info" with full path if needed, to check that.
> > > > > > Are you sure the correct path to what you want is
> > > > > > /usr/local/bin/mpirun ?
> > > > > > Linux distributions, compilers, and other tools come with their
> > > > > > own mpiexec and put it in places that you may not suspect, so
> > > > > > better double-check that you get the one you want.
> > > > > > That has been a source of repeated confusion on this and other
> > > > > > mailing lists.
> > > > > >
> > > > > > Also, make sure that passwordless ssh across the nodes is
> working.
> > > > > >
> > > > > > Yet another thing to check, for easy name resolution:
> > > > > > your /etc/hosts file on *all*
> > > > > > nodes including the headnode should
> > > > > > have a list of all nodes and their IP addresses.
> > > > > > Something like this:
> > > > > >
> > > > > > 127.0.0.1 localhost.localdomain localhost
> > > > > > 192.168.0.1 WN1
> > > > > > 192.168.0.2 WN2
> > > > > >
> > > > > > (The IPs above are guesswork of mine; you know better which
> > > > > > to use.)
> > > > > >
> > > > > > > So, I do
> > > > > > > not need to use machinefile.
> > > > > > >
> > > > > >
> > > > > > True assuming the first condition above (OpenMPI *with* "tm"
> > > > > > support).
> > > > > >
> > > > > > > Any ideas?
> > > > > > >
> > > > > >
> > > > > > Yes, and I sent it to you in my last email!
> > > > > > Try the "-bynode" option of mpiexec.
> > > > > > ("man mpiexec" is your friend!)
> > > > > >
> > > > > > > Thanks a lot in advance.
> > > > > > > ~Belaid.
> > > > > > >
> > > > > >
> > > > > > Best of luck!
> > > > > > Gus Correa
> > > > > >
> ---------------------------------------------------------------------
> > > > > > Gustavo Correa
> > > > > > Lamont-Doherty Earth Observatory - Columbia University
> > > > > > Palisades, NY, 10964-8000 - USA
> > > > > >
> ---------------------------------------------------------------------
> > > > > >
> > > > > > PS - Your web site link to Paul Krugman is out of date.
> > > > > > Here is one to his (active) blog,
> > > > > > and another to his (no longer updated) web page: :)
> > > > > >
> > > > > > http://krugman.blogs.nytimes.com/
> > > > > > http://www.princeton.edu/~pkrugman/
> > > > > >
> > > > > > >
> > > > > > > > Date: Tue, 1 Dec 2009 15:42:30 -0500
> > > > > > > > From: gus_at_[hidden]
> > > > > > > > To: users_at_[hidden]
> > > > > > > > Subject: Re: [OMPI users] mpirun is using one PBS node only
> > > > > > > >
> > > > > > > > Hi Belaid Moa
> > > > > > > >
> > > > > > > > Belaid MOA wrote:
> > > > > > > > > Hi everyone,
> > > > > > > > > Here is another elementary question. I tried the following
> > > > > > > > > steps found
> > > > > > > > > in the FAQ section of www.open-mpi.org with a simple hello
> > > > > > > > > world example
> > > > > > > > > (with PBS/torque):
> > > > > > > > > $ qsub -l nodes=2 my_script.sh
> > > > > > > > >
> > > > > > > > > my_script.sh is pasted below:
> > > > > > > > > ========================
> > > > > > > > > #!/bin/sh -l
> > > > > > > > > #PBS -N helloTest
> > > > > > > > > #PBS -j eo
> > > > > > > > > echo `cat $PBS_NODEFILE` # shows two nodes: WN1 WN2
> > > > > > > > > cd $PBS_O_WORKDIR
> > > > > > > > > /usr/local/bin/mpirun hello
> > > > > > > > > ========================
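A quick way to see whether PBS really handed a job two distinct nodes is to count the unique entries in the nodefile. A small sketch (run here against a mock file standing in for $PBS_NODEFILE; inside a job you would use "$PBS_NODEFILE" directly):

```shell
# Mock nodefile standing in for $PBS_NODEFILE; with nodes=2 and one
# CPU per node, Torque writes one line per node.
printf 'WN1\nWN2\n' > /tmp/mock_nodefile

echo "distinct nodes:"
sort -u /tmp/mock_nodefile           # prints WN1 and WN2
sort -u /tmp/mock_nodefile | wc -l   # 2 distinct nodes here
```

If this count is 2 but mpirun still lands all processes on one node, the problem is in the OpenMPI build (missing tm support), not in PBS.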
> > > > > > > > >
> > > > > > > > > When the job is submitted, only one process is run. When I
> > > > > > > > > add the -n 2
> > > > > > > > > option to the mpirun command,
> > > > > > > > > two processes are run but on one node only.
> > > > > > > >
> > > > > > > > Do you have a single CPU/core per node?
> > > > > > > > Or are they multi-socket/multi-core?
> > > > > > > >
> > > > > > > > Check "man mpiexec" for the options that control on which
> > > nodes and
> > > > > > > > slots, etc your program will run.
> > > > > > > > ("Man mpiexec" will tell you more than I possibly can.)
> > > > > > > >
> > > > > > > > The default option is "-byslot",
> > > > > > > > which will use all "slots" (actually cores
> > > > > > > > or CPUs) available on a node before it moves to the next
> > > > > > > > node.
> > > > > > > > Reading your question and your surprise at the result,
> > > > > > > > I would guess what you want is "-bynode" (not the default).
> > > > > > > >
> > > > > > > > Also, if you have more than one CPU/core per node,
> > > > > > > > you need to put this information in your Torque/PBS
> > > > > > > > "nodes" file (and restart your pbs_server daemon).
> > > > > > > > Something like this (for 2 CPUs/cores per node):
> > > > > > > >
> > > > > > > > WN1 np=2
> > > > > > > > WN2 np=2
> > > > > > > >
> > > > > > > > I hope this helps,
> > > > > > > > Gus Correa
> > > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > > Gustavo Correa
> > > > > > > > Lamont-Doherty Earth Observatory - Columbia University
> > > > > > > > Palisades, NY, 10964-8000 - USA
> > > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > >
> > > > > > > >
> > > > > > > > > Note that echo `cat
> > > > > > > > > $PBS_NODEFILE` outputs
> > > > > > > > > the two nodes I am using: WN1 and WN2.
> > > > > > > > >
> > > > > > > > > The output from ompi_info is shown below:
> > > > > > > > >
> > > > > > > > > $ ompi_info | grep tm
> > > > > > > > > MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component
> v1.3.3)
> > > > > > > > > MCA ras: tm (MCA v2.0, API v2.0, Component v1.3.3)
> > > > > > > > > MCA plm: tm (MCA v2.0, API v2.0, Component v1.3.3)
> > > > > > > > >
> > > > > > > > > Any help on why openMPI/mpirun is using only one PBS node
> > > > > > > > > is very much appreciated.
> > > > > > > > >
> > > > > > > > > Thanks a lot in advance and sorry for bothering you guys
> > > > > > > > > with my elementary questions!
> > > > > > > > >
> > > > > > > > > ~Belaid.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> > > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > users mailing list
> > > > > > > > > users_at_[hidden]
> > > > > > > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > >
> > > > > >
> > >
> > >