Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Openmpi-1.5.3 issue " initialization failure on /dev/ipath (err=23)"
From: Jeffrey Squyres (jsquyres_at_[hidden])
Date: 2012-03-29 11:28:44


This looks like a PSM problem (PSM is the layer that runs below Open MPI on QLogic NICs). You might need to contact QLogic tech support to find out how to solve it.
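
Ralph's earlier suggestions in the quoted thread (build Open MPI with --with-tm so that mpirun launches through Torque, and drop --hostfile) can be sanity-checked from the shell. What follows is a minimal sketch, not an official Open MPI or Torque tool. The helper names has_tm_support, count_slots and check_nprocs are hypothetical. The sketch assumes that ompi_info lists components as lines of the form "MCA plm: tm (...)", and that Torque writes one line per granted slot to $PBS_NODEFILE.

```bash
#!/bin/bash
# Sketch: sanity checks for launching Open MPI under Torque.
# Assumptions (not taken from the thread): ompi_info prints component lines
# as "MCA <framework>: <component> (...)", and $PBS_NODEFILE holds one line
# per slot granted by Torque.

# Succeeds if the given ompi_info output lists both the Torque launcher (plm)
# and the Torque resource allocator (ras) components.
has_tm_support() {
  local info="$1"
  printf '%s\n' "$info" | grep -qE 'MCA plm: tm( |$)' &&
    printf '%s\n' "$info" | grep -qE 'MCA ras: tm( |$)'
}

# Prints the number of non-empty lines (slots) in a Torque node file.
count_slots() {
  grep -c . "$1"
}

# Succeeds only if the node file is readable and the requested process
# count equals the number of slots Torque granted.
check_nprocs() {
  local nodefile="$1" nprocs="$2" slots
  [ -r "$nodefile" ] || return 1
  slots=$(count_slots "$nodefile")
  if [ "$slots" -ne "$nprocs" ]; then
    echo "requested $nprocs processes but Torque granted $slots slots" >&2
    return 1
  fi
}
```

In the job script this could be used as has_tm_support "$(ompi_info)" to confirm the rebuilt installation, followed by check_nprocs "$PBS_NODEFILE" "$nprocs" && mpirun -np "$nprocs" /home/khan/a.out < /home/khan/iter, that is, without --hostfile, so that a tm-enabled mpirun takes its allocation from Torque.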

On Mar 29, 2012, at 11:26 AM, Raju wrote:

> Hi Ralph,
>
> I recompiled OMPI with the --with-tm option, but I still get the same issue. I changed the input file as shown below. Please let me know what I have to fine-tune and verify.
>
> #!/bin/bash
> #PBS -N matmul
> #PBS -l nodes=1:ppn=1
> node=1
> ppn=1
> nprocs=`expr ${node} \* ${ppn}`
> export PSM_SHAREDCONTEXTS_MAX=16
>
> mpirun -np ${nprocs} /home/khan/a.out < /home/khan/iter
>
> Regards,
> Raju...
>
> On Thu, Mar 29, 2012 at 8:49 PM, Raju <brajuk_at_[hidden]> wrote:
> Hi Ralph,
>
> Thanks for the very quick response. I am compiling with the --with-tm option now; once it is done I will get back to you.
>
> Thanks
> Raju..
>
>
> On Thu, Mar 29, 2012 at 8:29 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> One thing stands out right away: why are you specifying a hostfile? Did you remember to configure OMPI with --with-tm so we launch via Torque? If not, then you could hit issues as you are actually attempting to launch via ssh, which has implications on a Torque-based system.
>
>
> On Mar 29, 2012, at 8:51 AM, Raju wrote:
>
>> Hi Team,
>>
>> I am using QLogic InfiniBand and Open MPI 1.5.3. I am able to run jobs from the command line without any issues, but when I submit them through the Torque scheduler I get the error below.
>>
>> The error file is attached.
>>
>> Overview of the problem:
>>
>> node1.ibab.ac.in.5910Driver initialization failure on /dev/ipath (err=23)
>> --------------------------------------------------------------------------
>> PSM was unable to open an endpoint. Please make sure that the network link is
>> active on the node and the hardware is functioning.
>>
>> Error: Failure in initializing endpoint
>>
>> I went through the thread at http://www.open-mpi.org/community/lists/users/2011/12/17888.php and followed the same solution, but with no luck.
>>
>> I exported PSM_SHAREDCONTEXTS_MAX=16 in my job submission script and then submitted the job.
>>
>> A sample input file:
>>
>> #!/bin/bash
>> #PBS -N matmul
>> #PBS -l nodes=1:ppn=1
>> node=1
>> ppn=1
>> nprocs=`expr ${node} \* ${ppn}`
>> echo "--- PBS_NODEFILE CONTENT ---"
>> cat $PBS_NODEFILE
>> export PSM_SHAREDCONTEXTS_MAX=16
>>
>> mpirun -np ${nprocs} --hostfile $PBS_NODEFILE /home/khan/a.out < /home/khan/iter
>>
>> Please let me know whether I am doing this correctly, and suggest the best approach.
>>
>> Regards,
>> Bhagya Raju K
>> <errfile.txt>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/