Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Qlogic & openmpi
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-11-29 10:44:07


On Nov 28, 2011, at 11:53 PM, arnaud Heritier wrote:

> I do have a contract and I tried to open a case, but their support is ......

What happens if you put a delay between the two jobs? E.g., if you just delay a few seconds before the 2nd job starts? Perhaps the ipath device just needs a little time before it will be available...? (that's a total guess)
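One way to try the delay is a small wrapper that sleeps before launching the second job. This is only a sketch: the function name `run_after_delay`, the 5-second default, and the example command line are illustrative assumptions, not anything provided by Open MPI or SGE.

```python
import subprocess
import time


def run_after_delay(command, delay_seconds=5.0):
    """Sleep for delay_seconds, then run command and return its exit code.

    `command` is a list of arguments, e.g. the mpirun command line of the
    second job. The 5-second default is an arbitrary guess, not a value
    known to be sufficient for the ipath device.
    """
    time.sleep(delay_seconds)
    # subprocess.run waits for the command to finish.
    return subprocess.run(command).returncode


# Hypothetical usage from the second job's script:
#   run_after_delay(["mpirun", "-np", "4", "./my_app"], delay_seconds=5.0)
```

If the second job succeeds with a delay and fails without one, that would suggest a timing issue around opening the device rather than a real network problem.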

I suggest this because the PSM device will definitely give you better overall performance than the QLogic verbs support. Their verbs support basically barely works -- PSM is their primary device and the one that we always recommend.

> Anyway. I'm still working on the strange error message from mpirun saying it can't allocate memory when at the same time it also reports that the memory is unlimited ...
>
>
> Arnaud
>
> On Tue, Nov 29, 2011 at 4:23 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> I'm afraid we don't have any contacts left at QLogic to ask them any more... do you have a support contract, perchance?
>
> On Nov 27, 2011, at 3:11 PM, Arnaud Heritier wrote:
>
> > Hello,
> >
> > I ran into a strange problem with qlogic OFED and openmpi. When I submit (through SGE) 2 jobs on the same node, the second job ends up with:
> >
> > (ipath/PSM)[10292]: can't open /dev/ipath, network down (err=26)
> >
> > I'm pretty sure the infiniband is working well as the other job runs fine.
> >
> > Here are the details of the configuration:
> >
> > Qlogic HCA: InfiniPath_QMH7342 (2 ports but only one connected to a switch)
> > qlogic_ofed-1.5.3-7.0.0.0.35 (rocks cluster roll)
> > openmpi 1.5.4 (./configure --with-psm --with-openib --with-sge)
> >
> > -------------
> >
> > In order to fix this problem I recompiled openmpi without psm support, but I ran into another problem:
> >
> > The OpenFabrics (openib) BTL failed to initialize while trying to
> > allocate some locked memory. This typically can indicate that the
> > memlock limits are set too low. For most HPC installations, the
> > memlock limits should be set to "unlimited". The failure occured
> > here:
> >
> > Local host: compute-0-6.local
> > OMPI source: btl_openib.c:329
> > Function: ibv_create_srq()
> > Device: qib0
> > Memlock limit: unlimited
> >
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/