
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] torque pbs behaviour...
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-08-10 18:30:30


Just to correct something said here.

> You need to tell mpirun how many processes to launch,
> regardless of whether you are using Torque or not.

This is not correct. If you don't tell mpirun how many processes to
launch, we will automatically launch one process for every slot in
your allocation. In the case described here, there were 16 slots
allocated, so we would automatically launch 16 processes.
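To illustrate Ralph's point, a minimal job-script sketch (MyProg is the user's program from the thread below; the behavior described assumes Open MPI was built with Torque/tm support):

```shell
#PBS -l nodes=2:ppn=8
# Torque hands the allocation to Open MPI through the tm interface,
# so with 2 nodes x 8 slots this already launches 16 processes:
mpirun MyProg
# ...which should be equivalent to spelling the count out explicitly:
mpirun -n 16 MyProg
```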

Ralph

On Aug 10, 2009, at 3:47 PM, Gus Correa wrote:

> Hi Jody, list
>
> See comments inline.
>
> Jody Klymak wrote:
>> On Aug 10, 2009, at 1:01 PM, Gus Correa wrote:
>>> Hi Jody
>>>
>>> We don't have Mac OS-X, but Linux, not sure if this applies to you.
>>>
>>> Did you configure your OpenMPI with Torque support,
>>> and pointed to the same library that provides the
>>> Torque you are using (--with-tm=/path/to/torque-library-directory)?
>> Not explicitly. I'll check into that....
>
>
> 1) If you don't specify it explicitly, configure will use the first
> libtorque it finds (assuming that one works),
> which may or may not be the one you want if you have more than one.
> If you only have one version of Torque installed,
> this shouldn't be the problem.
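As a hedged sketch of Gus's point (1), building Open MPI with explicit Torque support might look like this (the /opt paths are placeholders, not paths from this thread; point --with-tm at your actual Torque install):

```shell
# Illustrative build; adjust both prefixes to your systems.
./configure --prefix=/opt/openmpi --with-tm=/opt/torque
make all install
# Afterwards, ompi_info should list tm components (e.g. "MCA ras: tm"):
ompi_info | grep -i tm
```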
>
> 2) Have you tried something very simple, like the examples/hello_c.c
> program, to test the Torque-OpenMPI integration?
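One hypothetical way to run that test end to end (hello_c.c ships in the examples/ directory of the Open MPI source tree; the qsub one-liner is just a sketch):

```shell
# Compile the stock example with the Open MPI wrapper compiler:
mpicc -o hello_c examples/hello_c.c
# Submit it through Torque rather than running it by hand:
echo 'cd $PBS_O_WORKDIR && mpirun ./hello_c' | qsub -l nodes=2:ppn=8
# With working tm integration, each of the 16 ranks prints a hello line.
```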
>
> 3) Also, just in case, put a "cat $PBS_NODEFILE" inside your script,
> before mpirun, to see what it reports.
> For "#PBS -l nodes=2:ppn=8"
> it should show 16 lines, with each node's name appearing 8 times.
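What a correct $PBS_NODEFILE would contain can be simulated outside the scheduler; the hostnames below mirror the xserve names from Jody's message, and the 16-line count is the thing to check:

```shell
# Simulate the nodefile Torque would generate for nodes=2:ppn=8:
nodefile=$(mktemp)
for node in xserve01.local xserve02.local; do
  for slot in 1 2 3 4 5 6 7 8; do
    echo "$node"
  done
done > "$nodefile"

# One line per slot: mpirun derives its default process count from this.
wc -l < "$nodefile"
rm -f "$nodefile"
```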
>
> 4) Finally, just to make sure the syntax is right.
> On your message you wrote:
>
> >>> If I submit openMPI with:
> >>> #PBS -l nodes=2:ppn=8
> >>> mpirun MyProg
>
> Is this the real syntax you used?
>
> Or was it perhaps:
>
> #PBS -l nodes=2:ppn=8
> mpirun -n 16 MyProg
>
> You need to tell mpirun how many processes to launch,
> regardless of whether you are using Torque or not.
>
> My $0.02
>
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
>
>>> Are you using the right mpirun? (There are so many out there.)
>> yeah - I use the explicit path and moved the OS X one.
>> Thanks! Jody
>>> Gus Correa
>>>
>>> Jody Klymak wrote:
>>>> Hi All,
>>>> I've been trying to get torque pbs to work on my OS X 10.5.7
>>>> cluster with openMPI (after finding that Xgrid was pretty flaky
>>>> about connections). I *think* this is an MPI problem (perhaps
>>>> via operator error!)
>>>> If I submit openMPI with:
>>>> #PBS -l nodes=2:ppn=8
>>>> mpirun MyProg
>>>> pbs locks off two of the nodes, checked via "pbsnodes -a",
>>>> and the job output. But mpirun runs the whole job on the second
>>>> of the two nodes.
>>>> If I run the same job w/o qsub (i.e. using ssh)
>>>> mpirun -n 16 -host xserve01,xserve02 MyProg
>>>> it runs fine on all the nodes....
>>>> My /var/spool/torque/server_priv/nodes file looks like:
>>>> xserve01.local np=8
>>>> xserve02.local np=8
>>>> Any idea what could be going wrong or how to debug this properly?
>>>> There is nothing suspicious in the server or mom logs.
>>>> Thanks for any help,
>>>> Jody
>>>> --
>>>> Jody Klymak
>>>> http://web.uvic.ca/~jklymak/
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>