Hi Team,
I am using QLogic InfiniBand and Open MPI 1.5.3. I can run jobs from the
command line without any issues, but when I submit them through the Torque
scheduler I get the error below. The error file is attached.
*Overview of the problem:*
node1.ibab.ac.in.5910Driver initialization failure on /dev/ipath (err=23)
--------------------------------------------------------------------------
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.
Error: Failure in initializing endpoint
I went through the thread at
http://www.open-mpi.org/community/lists/users/2011/12/17888.php for a
solution and followed the same steps, but had no luck.
I set the value in my submit script with export
PSM_SHAREDCONTEXTS_MAX=16 and then submitted the job.
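As a minimal sketch of how the setting could be checked from inside the job, the helper below prints every PSM_* variable visible to the shell. The function name show_psm_env is hypothetical and not part of Open MPI, PSM, or Torque; the idea is only to compare what the Torque job sees with what an interactive shell sees.

```shell
#!/bin/bash
# Hypothetical helper (not part of any library): list the PSM_* variables
# that are exported in the current environment.
show_psm_env() {
    # env prints NAME=value pairs; keep only names starting with PSM_.
    # If grep finds nothing, print a short notice instead.
    env | grep '^PSM_' || echo "no PSM_* variables set"
}
```

Calling show_psm_env just before the mpirun line in the submit script writes the result to the job's output file. Open MPI's mpirun also accepts -x VARNAME to forward an exported variable to the launched processes, which may be worth trying here.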
The sample submit script is:
#!/bin/bash
#PBS -N matmul
#PBS -l nodes=1:ppn=1
node=1
ppn=1
nprocs=$(( node * ppn ))
echo "--- PBS_NODEFILE CONTENT ---"
cat $PBS_NODEFILE
export PSM_SHAREDCONTEXTS_MAX=16
mpirun -np ${nprocs} --hostfile $PBS_NODEFILE /home/khan/a.out < /home/khan/iter
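The script above sets node and ppn by hand, so they have to be kept in step with the #PBS -l line. A possible alternative, sketched here under the assumption that $PBS_NODEFILE contains one line per allocated slot (which is how Torque normally writes it), is to count the lines of the nodefile. The function name count_slots is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the slots listed in a Torque nodefile.
# Assumes one hostname per line, one line per allocated slot.
count_slots() {
    # $1 is the path to the nodefile; grep -c . counts non-empty lines.
    grep -c . "$1"
}

# Intended use inside the submit script (left as a comment here):
#   nprocs=$(count_slots "$PBS_NODEFILE")
```

With nodes=1:ppn=1 this should give 1, the same value the script computes by hand.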
Please let me know whether I am doing this correctly, and suggest the best way to resolve it.
Regards,
Bhagya Raju K