
Open MPI User's Mailing List Archives



Subject: Re: [OMPI users] torque pbs behaviour...
From: Jody Klymak (jklymak_at_[hidden])
Date: 2009-08-10 17:14:11

On Aug 10, 2009, at 13:01 PM, Gus Correa wrote:

> Hi Jody
> We don't have Mac OS X, only Linux, so I'm not sure if this applies to you.
> Did you configure your Open MPI with Torque support,
> and point it to the same library that provides the
> Torque you are using (--with-tm=/path/to/torque-library-directory)?

Not explicitly. I'll look into that.
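A minimal sketch of the check Gus is suggesting, assuming a Bourne-compatible shell. As far as I know, Open MPI's configure expects `--with-tm` to point at the Torque installation directory (the one that holds Torque's `include/` and `lib/`). The paths `/usr/local/torque` and `/opt/openmpi` below are placeholders. After rebuilding, `ompi_info` should list `tm` components. The exact wording of that listing varies between Open MPI versions, so the pattern used by the hypothetical helper `has_tm_support` is an assumption.

```shell
#!/bin/sh
# Rebuild Open MPI against the Torque install actually in use (placeholder paths):
#   ./configure --prefix=/opt/openmpi --with-tm=/usr/local/torque
#   make all install

# has_tm_support: read ompi_info output on stdin and succeed if a Torque (tm)
# launcher (plm) or resource-allocation (ras) component is listed.
# The "MCA <framework>: tm" pattern is an assumption about ompi_info's format.
has_tm_support() {
  grep -Eq 'MCA (plm|ras): tm'
}

# Usage: run the ompi_info that sits next to the mpirun you actually call, e.g.
#   /opt/openmpi/bin/ompi_info | has_tm_support && echo "TM support present"
```

If the check fails, mpirun has no way to read the Torque allocation on its own.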

> Are you using the right mpirun? (There are so many out there.)

Yeah, I use the explicit path, and I moved the OS X one aside.
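A quick way to confirm which `mpirun` a shell will pick up is to compare `command -v mpirun` against the install you built. This includes the shell Torque starts for the job, whose PATH can differ from an interactive login. The sketch below uses a hypothetical helper, `mpirun_is`, and `/opt/openmpi` is a placeholder prefix:

```shell
#!/bin/sh
# mpirun_is: succeed if the first mpirun found on PATH is exactly the given path.
mpirun_is() {
  [ "$(command -v mpirun)" = "$1" ]
}

# Example (placeholder prefix), e.g. near the top of a PBS job script:
#   mpirun_is /opt/openmpi/bin/mpirun || echo "PATH finds: $(command -v mpirun)"
```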

Thanks! Jody

> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
> Jody Klymak wrote:
>> Hi All,
>> I've been trying to get Torque PBS to work on my OS X 10.5.7
>> cluster with Open MPI (after finding that Xgrid was pretty flaky
>> about connections). I *think* this is an MPI problem (perhaps via
>> operator error!)
>> If I submit an Open MPI job with:
>> #PBS -l nodes=2:ppn=8
>> mpirun MyProg
>> PBS reserves both nodes (checked via "pbsnodes -a" and the job
>> output), but mpirun runs the whole job on the second of the
>> two nodes.
>> If I run the same job without qsub (i.e., launching over ssh):
>> mpirun -n 16 -host xserve01,xserve02 MyProg
>> it runs fine on all the nodes.
>> My /var/spool/torque/server_priv/nodes file looks like:
>> xserve01.local np=8
>> xserve02.local np=8
>> Any idea what could be going wrong, or how to debug this properly?
>> There is nothing suspicious in the server or mom logs.
>> Thanks for any help,
>> Jody
>> --
>> Jody Klymak
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
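On the "how to debug this" question, a common first step is to have the job script print two things: what Torque actually allocated, and where mpirun actually places processes. Torque lists the job's slots in the file named by `$PBS_NODEFILE`, one line per slot, so `nodes=2:ppn=8` should yield 16 lines, eight per host. The sketch below assumes a Bourne-compatible job shell, and `summarize_nodefile` is a hypothetical helper name:

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=8

# summarize_nodefile: print "<slots> <host>" for each host listed in a node file.
summarize_nodefile() {
  sort "$1" | uniq -c | awk '{print $1, $2}'
}

# Only run the diagnostics inside a Torque job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
  echo "Slots allocated by Torque:"
  summarize_nodefile "$PBS_NODEFILE"
  # Count how many processes land on each host; if mpirun is reading the
  # Torque allocation, this should match the listing above (8 per host).
  echo "Processes placed by mpirun:"
  mpirun hostname | sort | uniq -c
fi
```

Suppose the first listing shows both xserves but the second shows only one. Then the allocation is fine, and mpirun is not picking it up.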
