
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] mpirun is using one PBS node only
From: Gus Correa (gus_at_[hidden])
Date: 2009-12-01 17:07:58


Hi Belaid Moa

Belaid MOA wrote:
> Thanks a lot Gus for your help again. I only have one CPU per node.
> The -n X option (no matter what the value of X is) shows X processes
> running on one node only (the other one is free).

So, somehow it is oversubscribing your single processor
on the first node.

A simple diagnostic:

Have you tried to run "hostname" on the two nodes through Torque/PBS
and mpiexec?

[PBS directives, cd $PBS_O_WORKDIR, etc]
...
/full/path/to/openmpi/bin/mpiexec -n 2 hostname

Try also with the -byslot and -bynode options.
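As a sketch of that diagnostic (the mpiexec path is the placeholder from above; fill in your real install path):

```shell
#!/bin/sh
#PBS -N hostTest
#PBS -j eo
#PBS -l nodes=2

cd $PBS_O_WORKDIR

# Which nodes did Torque actually hand us?
cat $PBS_NODEFILE

# Default placement (-byslot fills all slots on a node first) ...
/full/path/to/openmpi/bin/mpiexec -byslot -n 2 hostname

# ... versus round-robin placement across nodes:
/full/path/to/openmpi/bin/mpiexec -bynode -n 2 hostname
```

If both runs print the same hostname twice, mpiexec is not seeing the second node at all, which points to the tm-support and path issues discussed next.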

> If I add the machinefile option with WN1 and WN2 in it, the right
> behavior is manifested. According to the documentation,
> mpirun should get the PBS_NODEFILE automatically from the PBS.

Yes, if the Open MPI you are using was compiled with Torque ("tm") support.
Did you compile it that way?
Make sure it has tm support:
run "ompi_info" (with the full path, if needed) to check.
Are you sure the correct path to what you want is
/usr/local/bin/mpirun ?
Linux distributions, compilers, and other tools ship their own
mpiexec and put it in places you may not suspect, so double-check
that you are getting the one you want.
That has been a source of repeated confusion on this and other
mailing lists.
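A quick way to run both checks (the full path below is just an example; use whatever your cluster has):

```shell
# Does this Open MPI have Torque support? Look for "ras: tm" and "plm: tm".
/full/path/to/openmpi/bin/ompi_info | grep tm

# Is /usr/local/bin/mpirun really the one you built against Torque?
which mpirun
type -a mpirun   # in bash, lists every mpirun on your PATH, in lookup order
```

These are environment-dependent diagnostics, so the output will vary from site to site; the point is that the mpirun in your job script and the ompi_info you inspect must come from the same installation.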

Also, make sure that passwordless ssh across the nodes is working.
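For example (WN1/WN2 are the node names from your script; substitute your own):

```shell
# Each command should log in and print the remote hostname
# without prompting for a password:
ssh WN1 hostname
ssh WN2 hostname
```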

Yet another thing to check, for easy name resolution:
your /etc/hosts file on *all* nodes, including the headnode,
should list all the nodes and their IP addresses.
Something like this:

127.0.0.1 localhost.localdomain localhost
192.168.0.1 WN1
192.168.0.2 WN2

(The IPs above are guesswork of mine, you know better which to use.)

> So, I do
> not need to use machinefile.
>

True, assuming the first condition above holds (Open MPI *with* "tm" support).

> Any ideas?
>

Yes, and I sent you one in my last email!
Try the "-bynode" option of mpiexec.
("man mpiexec" is your friend!)

> Thanks a lot in advance.
> ~Belaid.
>

Best of luck!
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

PS - Your web site link to Paul Krugman is out of date.
Here is one to his (active) blog,
and another to his (no longer updated) web page: :)

http://krugman.blogs.nytimes.com/
http://www.princeton.edu/~pkrugman/

>
> > Date: Tue, 1 Dec 2009 15:42:30 -0500
> > From: gus_at_[hidden]
> > To: users_at_[hidden]
> > Subject: Re: [OMPI users] mpirun is using one PBS node only
> >
> > Hi Belaid Moa
> >
> > Belaid MOA wrote:
> > > Hi everyone,
> > > Here is another elementary question. I tried the following steps found
> > > in the FAQ section of www.open-mpi.org with a simple hello world
> example
> > > (with PBS/torque):
> > > $ qsub -l nodes=2 my_script.sh
> > >
> > > my_script.sh is pasted below:
> > > ========================
> > > #!/bin/sh -l
> > > #PBS -N helloTest
> > > #PBS -j eo
> > > echo `cat $PBS_NODEFILE` # shows two nodes: WN1 WN2
> > > cd $PBS_O_WORKDIR
> > > /usr/local/bin/mpirun hello
> > > ========================
> > >
> > > When the job is submitted, only one process is run. When I add the
> -n 2
> > > option to the mpirun command,
> > > two processes are run but on one node only.
> >
> > Do you have a single CPU/core per node?
> > Or are they multi-socket/multi-core?
> >
> > Check "man mpiexec" for the options that control on which nodes and
> > slots, etc your program will run.
> > ("Man mpiexec" will tell you more than I possibly can.)
> >
> > The default option is "-byslot",
> > which will use all "slots" (actually cores
> > or CPUs) available on a node before it moves to the next node.
> > Reading your question and your surprise with the result,
> > I would guess what you want is "-bynode" (not the default).
> >
> > Also, if you have more than one CPU/core per node,
> > you need to put this information in your Torque/PBS "nodes" file
> > (and restart your pbs_server daemon).
> > Something like this (for 2 CPUs/cores per node):
> >
> > WN1 np=2
> > WN2 np=2
> >
> > I hope this helps,
> > Gus Correa
> > ---------------------------------------------------------------------
> > Gustavo Correa
> > Lamont-Doherty Earth Observatory - Columbia University
> > Palisades, NY, 10964-8000 - USA
> > ---------------------------------------------------------------------
> >
> >
> > > Note that echo `cat
> > > $PBS_NODEFILE` outputs
> > > the two nodes I am using: WN1 and WN2.
> > >
> > > The output from ompi_info is shown below:
> > >
> > > $ ompi_info | grep tm
> > > MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.3)
> > > MCA ras: tm (MCA v2.0, API v2.0, Component v1.3.3)
> > > MCA plm: tm (MCA v2.0, API v2.0, Component v1.3.3)
> > >
> > > Any help on why openMPI/mpirun is using only one PBS node is very
> > > appreciated.
> > >
> > > Thanks a lot in advance and sorry for bothering you guys with my
> > > elementary questions!
> > >
> > > ~Belaid.
> > >
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > users mailing list
> > > users_at_[hidden]
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
>