Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] torque pbs behaviour...
From: Gus Correa (gus_at_[hidden])
Date: 2009-08-10 17:47:58


Hi Jody, list

See comments inline.

Jody Klymak wrote:
>
> On Aug 10, 2009, at 13:01 PM, Gus Correa wrote:
>
>> Hi Jody
>>
>> We don't have Mac OS-X, but Linux, not sure if this applies to you.
>>
>> Did you configure your OpenMPI with Torque support,
>> and pointed to the same library that provides the
>> Torque you are using (--with-tm=/path/to/torque-library-directory)?
>
> Not explicitly. I'll check into that....

1) If you don't specify it explicitly, configure will use the first
libtorque it finds (and that one works, I presume),
which may or may not be the one you want if you have more than one.
If you only have one version of Torque installed,
this shouldn't be the problem.
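As a sanity check, you can rebuild pointing configure at a specific Torque install and then confirm the tm components were compiled in. This is only a sketch; the /opt paths below are examples, not your actual install locations:

```shell
# Example paths only -- substitute your real Torque and install prefixes
./configure --with-tm=/opt/torque --prefix=/opt/openmpi
make -j4 && make install

# If tm support was built in, ompi_info should list tm components
# (e.g. the plm and ras tm modules):
/opt/openmpi/bin/ompi_info | grep tm
```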

2) Have you tried something very simple, like the examples/hello_c.c
program, to test the Torque-OpenMPI integration?
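For instance, something like this (a sketch; it assumes mpicc is the Open MPI compiler wrapper on your PATH and that you still have the source tree you built from, whose directory name here is just an example):

```shell
# Build the hello world example shipped in the Open MPI source tree
# (directory name is an example; use wherever you unpacked the tarball)
cd openmpi-1.3.3/examples
mpicc hello_c.c -o hello_c
```

Then submit a job that runs the resulting hello_c through mpirun; if the Torque integration works, you should see one greeting per slot Torque assigned.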

3) Also, just in case, put a "cat $PBS_NODEFILE" inside your script,
before mpirun, to see what it reports.
For "#PBS -l nodes=2:ppn=8"
it should show 16 lines: the name of each node repeated 8 times.
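A minimal job script along these lines (a sketch; MyProg stands in for your actual executable):

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=8
#PBS -j oe
# Show exactly which slots Torque assigned to this job.
# For nodes=2:ppn=8 this should print 16 lines,
# each node name repeated 8 times.
cat $PBS_NODEFILE
mpirun -n 16 MyProg
```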

4) Finally, just to make sure the syntax is right.
In your message you wrote:

>>> If I submit openMPI with:
>>> #PBS -l nodes=2:ppn=8
>>> mpirun MyProg

Is this the real syntax you used?

Or was it perhaps:

#PBS -l nodes=2:ppn=8
mpirun -n 16 MyProg

You need to tell mpirun how many processes to launch,
regardless of whether you are using Torque or not.
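If you'd rather not hard-code the count, one common trick (a sketch; it assumes the script runs under Torque, so $PBS_NODEFILE is set) is to count the lines of the node file:

```shell
# Each line of $PBS_NODEFILE is one slot assigned by Torque;
# count them to get the process count for mpirun.
NP=$(wc -l < "$PBS_NODEFILE")
mpirun -n "$NP" MyProg
```

With nodes=2:ppn=8 this launches 16 processes, and the count stays correct if you later change the resource request.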

My $0.02

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

>
>
>> Are you using the right mpirun? (There are so many out there.)
>
> yeah - I use the explicit path and moved the OS X one.
>
> Thanks! Jody
>
>> Gus Correa
>> ---------------------------------------------------------------------
>> Gustavo Correa
>> Lamont-Doherty Earth Observatory - Columbia University
>> Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>>
>> Jody Klymak wrote:
>>> Hi All,
>>> I've been trying to get torque pbs to work on my OS X 10.5.7 cluster
>>> with openMPI (after finding that Xgrid was pretty flaky about
>>> connections). I *think* this is an MPI problem (perhaps via operator
>>> error!)
>>> If I submit openMPI with:
>>> #PBS -l nodes=2:ppn=8
>>> mpirun MyProg
>>> pbs locks off two of the processors, checked via "pbsnodes -a", and
>>> the job output. But mpirun runs the whole job on the second of the
>>> two processors.
>>> If I run the same job w/o qsub (i.e. using ssh)
>>> mpirun -n 16 -host xserve01,xserve02 MyProg
>>> it runs fine on all the nodes....
>>> My /var/spool/torque/server_priv/nodes file looks like:
>>> xserve01.local np=8
>>> xserve02.local np=8
>>> Any idea what could be going wrong or how to debug this properly?
>>> There is nothing suspicious in the server or mom logs.
>>> Thanks for any help,
>>> Jody
>>> --
>>> Jody Klymak
>>> http://web.uvic.ca/~jklymak/
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
> --
> Jody Klymak
> http://web.uvic.ca/~jklymak/
>