Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-08-22 08:35:48


I suspect that your SGE daemons are not starting with the proper
locked memory limits, so jobs started under SGE inherit a severely
restricted locked memory limit.

See these FAQ entries -- the issues described for SLURM apply to
all resource managers, including SGE:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-more
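
A quick way to check this is to compare the locked memory limit seen
by an interactive shell with the one seen by a job started through
SGE. The script below is only a minimal sketch: the function name
classify_memlock is made up for illustration, and treating every
finite limit as "limited" is a simplifying assumption, not a rule
taken from Open MPI.

```shell
#!/bin/bash
# Hypothetical helper: classify a value as reported by `ulimit -l`.
# Assumption: anything other than "unlimited" is flagged as limited.
classify_memlock() {
    if [ "$1" = "unlimited" ]; then
        echo "ok"
    else
        echo "limited"
    fi
}

# Report the locked memory limit of the current process (kbytes or "unlimited").
limit=$(ulimit -l)
echo "$(uname -n): max locked memory = ${limit} ($(classify_memlock "$limit"))"
```

Run it once by hand on a compute node and once through SGE, submitted
the same way as your job script. If the SGE run reports a small value
while the interactive run reports "unlimited" (or a much larger
value), the limit needs to be raised where the SGE execution daemon
is started, as the FAQ entries describe.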

On Aug 22, 2007, at 8:31 AM, Noam Meltzer wrote:

> Hi,
>
> I am running openmpi-1.2.3 compiled for 64bit on RHEL4u4.
> I also have a Voltaire InfiniBand interconnect.
> When I manually run jobs using the following command:
>
> /opt/local/openmpi-1.2.3-gcc4/bin/orterun -np 8 -hostfile ~/myHostList \
>     -mca btl self,openib /tcc/eandm/performance/igor/main.exe.openmpi123
>
> The job runs just fine.
>
> However, when I run it through SGE, I get the following error (on
> all hosts in my list):
> --------------------------------------------------------------------------
> The OpenIB BTL failed to initialize while trying to create an internal
> queue. This typically indicates a failed OpenFabrics installation or
> faulty hardware. The failure occured here:
>
> Host: node4.grid.technion.ac.il
> OMPI source: btl_openib.c:828
> Function: ibv_create_cq()
> Error: Invalid argument (errno=22)
> Device: mthca0
>
> You may need to consult with your system administrator to get this
> problem fixed.
> --------------------------------------------------------------------------
>
> To send a job to the grid I use the following command:
> qrsh -cwd -q noam.q -pe orte 8 ./myScript
>
> while "myScript" looks like:
>
> #!/bin/bash
> /opt/local/openmpi-1.2.3-gcc4/bin/orterun -np $NSLOTS -mca btl \
>     self,openib /tcc/eandm/performance/igor/main.exe.openmpi123
>
> If I change "openib" to "tcp" (in myScript) everything works just fine.
>
> Any ideas?
>
> --
> Best regards,
> Noam Meltzer
> Software Support Engineer & RHCE
> E&M Computing
>
> http://www.emet.co.il

-- 
Jeff Squyres
Cisco Systems