Hi Jeffrey,
This looks like a PSM problem (PSM is the layer that runs below Open MPI on QLogic NICs). You might need to contact QLogic tech support to find out how to solve it.
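
Before contacting support, it may help to compare the environment inside a Torque job with the interactive shell where the same binary works. The script below is only a diagnostic sketch, not a known fix. The helper names check_device and report_env are made up for illustration, and the idea that device permissions, the locked-memory limit, or missing tm support differ under Torque is an assumption to verify, not something the error output confirms.

```shell
#!/bin/bash
# Hypothetical diagnostic helpers; not part of Open MPI, PSM, or Torque.

# Print one word describing whether the given device node is usable by
# the current user: "missing", "no-access", or "ok".
check_device() {
    local dev="$1"
    if [ ! -e "$dev" ]; then
        echo "missing"
    elif [ ! -r "$dev" ] || [ ! -w "$dev" ]; then
        echo "no-access"
    else
        echo "ok"
    fi
}

# Print a few facts about the environment the process actually runs in.
report_env() {
    echo "host: $(uname -n)"
    echo "user: $(id -un)"
    echo "max locked memory (ulimit -l): $(ulimit -l)"
    # /dev/ipath is the device named in the PSM error message.
    echo "/dev/ipath: $(check_device /dev/ipath)"
    if command -v ompi_info >/dev/null 2>&1; then
        # A build configured with --with-tm should list tm components.
        ompi_info | grep -i "MCA.*: tm" || echo "no tm components found"
    else
        echo "ompi_info not found in PATH"
    fi
}

report_env
```

Running it once from the command line and once from inside a submitted job (in place of the mpirun line) gives two reports to compare. Any difference between them is a lead to follow up on, not a diagnosis.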
--
On Mar 29, 2012, at 11:26 AM, Raju wrote:
> Hi Ralph,
>
> I recompiled OMPI with the --with-tm option, but I still see the same issue. I changed the input file as shown below. Please let me know what I need to tune and verify.
>
> #!/bin/bash
> #PBS -N matmul
> #PBS -l nodes=1:ppn=1
> node=1
> ppn=1
> nprocs=`expr ${node} \* ${ppn}`
> export PSM_SHAREDCONTEXTS_MAX=16
>
> mpirun -np ${nprocs} /home/khan/a.out < /home/khan/iter
>
> Regards,
> Raju...
>
> On Thu, Mar 29, 2012 at 8:49 PM, Raju <brajuk@gmail.com> wrote:
> Hi Ralph,
>
> Thanks for the very quick response. I am recompiling with the --with-tm option now; once it is done, I will report back.
>
> Thanks
> Raju..
>
>
> On Thu, Mar 29, 2012 at 8:29 PM, Ralph Castain <rhc@open-mpi.org> wrote:
> One thing stands out right away: why are you specifying a hostfile? Did you remember to configure OMPI with --with-tm so we launch via Torque? If not, then you could hit issues as you are actually attempting to launch via ssh, which has implications on a Torque-based system.
>
>
> On Mar 29, 2012, at 8:51 AM, Raju wrote:
>
>> Hi Team,
>>
>> I am using QLogic InfiniBand and Open MPI 1.5.3. I am able to run jobs from the command line without any issues, but when I submit them through the Torque scheduler I hit the issue below. The error file is attached.
>>
>> Overview of the problem:
>>
>> node1.ibab.ac.in.5910Driver initialization failure on /dev/ipath (err=23)
>> --------------------------------------------------------------------------
>> PSM was unable to open an endpoint. Please make sure that the network link is
>> active on the node and the hardware is functioning.
>>
>> Error: Failure in initializing endpoint
>>
>> I went through http://www.open-mpi.org/community/lists/users/2011/12/17888.php for a solution and followed the same steps, but with no luck.
>>
>> I set the value in my submit script with export PSM_SHAREDCONTEXTS_MAX=16 and submitted the job.
>>
>> The sample input file is:
>>
>> #!/bin/bash
>> #PBS -N matmul
>> #PBS -l nodes=1:ppn=1
>> node=1
>> ppn=1
>> nprocs=`expr ${node} \* ${ppn}`
>> echo "--- PBS_NODEFILE CONTENT ---"
>> cat $PBS_NODEFILE
>> export PSM_SHAREDCONTEXTS_MAX=16
>>
>> mpirun -np ${nprocs} --hostfile $PBS_NODEFILE /home/khan/a.out < /home/khan/iter
>>
>> Please let me know whether I am doing this correctly, and suggest the best approach.
>>
>> Regards,
>> Bhagya Raju K
>> <errfile.txt>
>> _______________________________________________
>> devel mailing list
>> devel@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
Jeff Squyres
jsquyres@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/