Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) only land on one node even though multiple nodes/processors are specified
From: Brock Palen (brockp_at_[hidden])
Date: 2013-01-24 10:20:22


On Jan 24, 2013, at 10:10 AM, Sabuj Pattanayek wrote:

> or do i just need to compile two versions, one with IB and one without?

You should not need to. We have a single OMPI build compiled for openib/psm, and we run that same install on psm/tcp and verbs (openib) based gear.

Do all the nodes assigned to your job have QLogic IB adapters? Do they also have libpsm_infinipath installed on all of them? This will be required.
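One way to check the library question from inside a job is sketched below. It is only a rough sketch: lib_in_cache is a made-up helper name, the loop assumes passwordless ssh between the job's nodes and that ldconfig lives in /sbin, and `ldconfig -p` only sees libraries in the loader cache, not ones found through LD_LIBRARY_PATH.

```shell
#!/bin/sh
# lib_in_cache is a hypothetical helper: it reads `ldconfig -p` output on
# stdin and succeeds if the named shared library appears in that listing.
lib_in_cache() {
    grep -qF "$1"
}

# Inside a Torque job, test each distinct node once.
# Assumption: passwordless ssh works between the nodes of the job.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    for host in $(sort -u "$PBS_NODEFILE"); do
        if ssh "$host" /sbin/ldconfig -p | lib_in_cache libpsm_infinipath.so.1; then
            echo "$host: libpsm_infinipath.so.1 found"
        else
            echo "$host: libpsm_infinipath.so.1 MISSING"
        fi
    done
fi
```

Any node reported as MISSING would be expected to produce the same mca_mtl_psm "unable to open" message quoted further down.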

Also, did you build your Open MPI with tm support? That is --with-tm=/usr/local/torque/ (or wherever the path to lib/libtorque.so is).
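If you are not sure how an existing install was built, ompi_info lists the compiled-in components, and a tm-enabled build should show tm entries for the ras and plm frameworks. A small sketch, where has_tm_support is a made-up helper name and the pattern assumes the 1.6-style "MCA ras: tm (...)" line format:

```shell
#!/bin/sh
# has_tm_support is a hypothetical helper: it reads an ompi_info listing on
# stdin and succeeds if a Torque (tm) ras or plm component is listed.
# Assumption: lines look like "MCA ras: tm (MCA v2.0, ...)" as in the 1.6 series.
has_tm_support() {
    grep -Eq 'MCA (ras|plm): tm '
}

# Only run the check if ompi_info from the install under test is in PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_support; then
        echo "tm support: yes"
    else
        echo "tm support: no (consider rebuilding with --with-tm=...)"
    fi
fi
```

Make sure the ompi_info you run belongs to the same install as the mpirun your job script picks up.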

With TM support, mpirun from OMPI will know how to find the CPUs assigned to your job by torque. This is the better way. In a pinch you can also use
mpirun -machinefile $PBS_NODEFILE -np 8 ....

But really tm is better.
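For the machinefile fallback, a job-script sketch might look like the following. count_slots and count_hosts are made-up helper names, and it assumes torque writes one line per assigned core to $PBS_NODEFILE. Launching hostname first is just a cheap way to see whether ranks actually land on more than one node.

```shell
#!/bin/sh
# count_slots (hypothetical helper): number of lines in the node file.
# Assumption: Torque writes one line per assigned core.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# count_hosts (hypothetical helper): number of distinct hosts in the node file.
count_hosts() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Only launch when running inside a Torque job.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    np=$(count_slots "$PBS_NODEFILE")
    echo "job has $np slots on $(count_hosts "$PBS_NODEFILE") host(s)"
    # Fallback launch for a build without tm support.
    mpirun -machinefile "$PBS_NODEFILE" -np "$np" hostname
fi
```

If every line of output shows the same hostname even though the node file lists several hosts, the allocation is not reaching mpirun.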

Here is our build line for OMPI:

./configure --prefix=/home/software/rhel6/openmpi-1.6.3-mxm/intel-12.1 --mandir=/home/software/rhel6/openmpi-1.6.3-mxm/intel-12.1/man --with-tm=/usr/local/torque --with-openib --with-psm --with-mxm=/home/software/rhel6/mxm/1.5 --with-io-romio-flags=--with-file-system=testfs+ufs+lustre --disable-dlopen --enable-shared CC=icc CXX=icpc FC=ifort F77=ifort

We run torque with OMPI.

>
> On Thu, Jan 24, 2013 at 9:09 AM, Sabuj Pattanayek <sabujp_at_[hidden]> wrote:
>> ahha, with --display-allocation I'm getting :
>>
>> mca: base: component_find: unable to open
>> /sb/apps/openmpi/1.6.3/x86_64/lib/openmpi/mca_mtl_psm:
>> libpsm_infinipath.so.1: cannot open shared object file: No such file
>> or directory (ignored)
>>
>> I think the system I compiled it on has different ib libs than the
>> nodes. I'll need to recompile and then see if it runs, but is there
>> any way to get it to ignore IB and just use gigE? Not all of our nodes
>> have IB and I just want to use any node.
>>
>> On Thu, Jan 24, 2013 at 8:52 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>> How did you configure OMPI? If you add --display-allocation to your cmd line, does it show all the nodes?
>>>
>>> On Jan 24, 2013, at 6:34 AM, Sabuj Pattanayek <sabujp_at_[hidden]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm submitting a job through torque/PBS, the head node also runs the
>>>> Moab scheduler, the .pbs file has this in the resources line :
>>>>
>>>> #PBS -l nodes=2:ppn=4
>>>>
>>>> I've also tried something like :
>>>>
>>>> #PBS -l procs=56
>>>>
>>>> and at the end of script I'm running :
>>>>
>>>> mpirun -np 8 cat /dev/urandom > /dev/null
>>>>
>>>> or
>>>>
>>>> mpirun -np 56 cat /dev/urandom > /dev/null
>>>>
>>>> ...depending on how many processors I requested. The job starts,
>>>> $PBS_NODEFILE has the nodes that the job was assigned listed, but all
>>>> the cats are piled onto the first node. Any idea how I can get this
>>>> to submit jobs across multiple nodes? Note, I have OSU mpiexec working
>>>> without problems with mvapich and mpich2 on our cluster to launch jobs
>>>> across multiple nodes.
>>>>
>>>> Thanks,
>>>> Sabuj
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users