Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] torque pbs behaviour...
From: Gus Correa (gus_at_[hidden])
Date: 2009-08-10 18:37:16


Thank you for the correction, Ralph.
I didn't know there was a (wise) default for the
number of processes when using Torque-enabled OpenMPI.
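
For example (a minimal sketch; "MyProg" and the resource line are just
placeholders), a job script like this should start 16 processes with no
-n at all, since a Torque-aware mpirun reads the slot count straight
from the allocation:

   #!/bin/bash
   #PBS -l nodes=2:ppn=8
   cd $PBS_O_WORKDIR
   # no -n needed: one process is launched per allocated slot (16 here)
   mpirun ./MyProg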

Gus Correa

Ralph Castain wrote:
> Just to correct something said here.
>
>> You need to tell mpirun how many processes to launch,
>> regardless of whether you are using Torque or not.
>
> This is not correct. If you don't tell mpirun how many processes to
> launch, we will automatically launch one process for every slot in your
> allocation. In the case described here, there were 16 slots allocated,
> so we would automatically launch 16 processes.
>
> Ralph
>
>
>
> On Aug 10, 2009, at 3:47 PM, Gus Correa wrote:
>
>> Hi Jody, list
>>
>> See comments inline.
>>
>> Jody Klymak wrote:
>>> On Aug 10, 2009, at 13:01, Gus Correa wrote:
>>>> Hi Jody
>>>>
>>>> We run Linux here, not Mac OS X, so I'm not sure if this applies to you.
>>>>
>>>> Did you configure your OpenMPI with Torque support,
>>>> and pointed to the same library that provides the
>>>> Torque you are using (--with-tm=/path/to/torque-library-directory)?
>>> Not explicitly. I'll check into that....
>>
>>
>> 1) If you don't do it explicitly, configure will use the first
>> libtorque it finds (which presumably works), and that may or may not
>> be the one you want if you have more than one. If you only have one
>> version of Torque installed, this shouldn't be the problem.
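>>
>> For instance (the paths below are only placeholders for wherever your
>> Torque installation actually lives), the build and a quick sanity
>> check might look like:
>>
>>    ./configure --prefix=/opt/openmpi --with-tm=/opt/torque
>>    make all install
>>    # if Torque support was built, ompi_info should list "tm" components:
>>    ompi_info | grep tm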
>>
>> 2) Have you tried something very simple, like the examples/hello_c.c
>> program, to test the Torque-OpenMPI integration?
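>>
>> A rough sketch of such a test (submitting the script inline is just
>> one way to do it):
>>
>>    mpicc -o hello examples/hello_c.c
>>    echo 'cd $PBS_O_WORKDIR && mpirun ./hello' | qsub -l nodes=2:ppn=8
>>
>> With a working integration you should get one "Hello, world, I am N
>> of 16" style line for each of the 16 ranks.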
>>
>> 3) Also, just in case, put a "cat $PBS_NODEFILE" inside your script,
>> before mpirun, to see what it reports.
>> For "#PBS -l nodes=2:ppn=8"
>> it should show 16 lines, 8 with the name of each node.
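>>
>> For example (hostnames from your nodes file, used just for
>> illustration):
>>
>>    #PBS -l nodes=2:ppn=8
>>    echo "--- contents of PBS_NODEFILE ---"
>>    cat $PBS_NODEFILE
>>    mpirun ./MyProg
>>
>> You would expect to see xserve01.local on 8 lines and xserve02.local
>> on the other 8.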
>>
>> 4) Finally, just to make sure the syntax is right.
>> In your message you wrote:
>>
>> >>> If I submit openMPI with:
>> >>> #PBS -l nodes=2:ppn=8
>> >>> mpirun MyProg
>>
>> Is this the real syntax you used?
>>
>> Or was it perhaps:
>>
>> #PBS -l nodes=2:ppn=8
>> mpirun -n 16 MyProg
>>
>> You need to tell mpirun how many processes to launch,
>> regardless of whether you are using Torque or not.
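>>
>> If you do want to be explicit about the count, one option (just a
>> sketch) is to derive it from the nodefile, which has one line per
>> slot:
>>
>>    mpirun -n $(wc -l < $PBS_NODEFILE) MyProg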
>>
>> My $0.02
>>
>> Gus Correa
>> ---------------------------------------------------------------------
>> Gustavo Correa
>> Lamont-Doherty Earth Observatory - Columbia University
>> Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>>
>>
>>>> Are you using the right mpirun? (There are so many out there.)
>>> yeah - I use the explicit path and moved the OS X one.
>>> Thanks! Jody
>>>> Gus Correa
>>>> ---------------------------------------------------------------------
>>>> Gustavo Correa
>>>> Lamont-Doherty Earth Observatory - Columbia University
>>>> Palisades, NY, 10964-8000 - USA
>>>> ---------------------------------------------------------------------
>>>>
>>>> Jody Klymak wrote:
>>>>> Hi All,
>>>>> I've been trying to get torque pbs to work on my OS X 10.5.7
>>>>> cluster with openMPI (after finding that Xgrid was pretty flaky
>>>>> about connections). I *think* this is an MPI problem (perhaps via
>>>>> operator error!)
>>>>> If I submit openMPI with:
>>>>> #PBS -l nodes=2:ppn=8
>>>>> mpirun MyProg
>>>>> PBS locks off two of the nodes (checked via "pbsnodes -a" and the
>>>>> job output), but mpirun runs the whole job on the second of the
>>>>> two nodes.
>>>>> If I run the same job w/o qsub (i.e. using ssh)
>>>>> mpirun -n 16 -host xserve01,xserve02 MyProg
>>>>> it runs fine on all the nodes....
>>>>> My /var/spool/torque/server_priv/nodes file looks like:
>>>>> xserve01.local np=8
>>>>> xserve02.local np=8
>>>>> Any idea what could be going wrong or how to debug this properly?
>>>>> There is nothing suspicious in the server or mom logs.
>>>>> Thanks for any help,
>>>>> Jody
>>>>> --
>>>>> Jody Klymak
>>>>> http://web.uvic.ca/~jklymak/
>>>>
>>> --
>>> Jody Klymak
>>> http://web.uvic.ca/~jklymak/
>>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users