Hi Team,

I am using QLogic InfiniBand with Open MPI 1.5.3. I can run jobs from the command line without any issues, but when I submit them through the Torque scheduler I get the error below. The error file is attached.

Overview of the problem:

node1.ibab.ac.in.5910Driver initialization failure on /dev/ipath (err=23)
--------------------------------------------------------------------------
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

  Error: Failure in initializing endpoint

 

I went through the thread at http://www.open-mpi.org/community/lists/users/2011/12/17888.php looking for a solution and followed the suggestion there, but it did not help.

Following that thread, I added export PSM_SHAREDCONTEXTS_MAX=16 to my job submission script and resubmitted the job.

My sample job script is:

#!/bin/bash
#PBS -N matmul
#PBS -l nodes=1:ppn=1

# These must match the nodes/ppn values requested above
node=1
ppn=1
nprocs=$(( node * ppn ))

echo "--- PBS_NODEFILE CONTENT ---"
cat "$PBS_NODEFILE"

# Setting taken from the mailing list thread mentioned above
export PSM_SHAREDCONTEXTS_MAX=16

mpirun -np "${nprocs}" --hostfile "$PBS_NODEFILE" /home/khan/a.out < /home/khan/iter
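
To help narrow this down, here is a minimal sketch of a diagnostic that can be run both from the command line and inside the Torque job, so the two environments can be compared. The function name report_env is hypothetical, and the idea that resource limits or environment variables differ under Torque is only an assumption on my side, not something I have confirmed.

```shell
#!/bin/bash
# Hypothetical helper: print the settings that might differ between an
# interactive shell and a Torque batch job (an assumption, not a confirmed cause).
report_env() {
    echo "host=$(uname -n)"
    echo "memlock=$(ulimit -l)"      # max locked memory for this shell
    echo "open_files=$(ulimit -n)"   # max open file descriptors
    echo "PSM_SHAREDCONTEXTS_MAX=${PSM_SHAREDCONTEXTS_MAX:-unset}"
}

report_env
```

Running it once interactively and once at the top of the job script, then comparing the two outputs, should show whether the batch environment is more restricted.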

 

Please let me know whether I am doing this correctly, and suggest how best to resolve the issue.

Regards,

Bhagya Raju K