
Open MPI User's Mailing List Archives


From: Tim S. Woodall (twoodall_at_[hidden])
Date: 2005-12-06 14:15:37


Daryl,

Try this:

-------- Original Message --------
Subject: RE: only root running mpi jobs with 1.0.1rc5
Date: Thu, 01 Dec 2005 18:49:46 -0700
From: Joshua Aune <luken_at_[hidden]>
Reply-To: luken_at_[hidden]
Organization: Linux Networx
To: Todd Wilde <Todd_at_[hidden]>
CC: Matthew Finlay <Matt_at_[hidden]>, twoodall_at_[hidden], Robert Cummins <rcummins_at_[hidden]>, Pat Lindsay <plindsay_at_[hidden]>
References: <25AE7F432672D511B8DC00B0D0DF11DA05FC26CB_at_MTIEX01>

Sounds like you were right

* soft memlock 8388608 # 8 GB
* hard memlock 8388608 # 8 GB

and now I get no errors :) Looks like the limits were propagated to the
back end nodes.
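
One way to confirm which locked-memory limit a non-root process actually sees on a node is to query RLIMIT_MEMLOCK directly. The following is a minimal sketch, assuming Python is available on the node; the helper names memlock_limits_kb and gb_to_limits_conf_kb are illustrative and not part of Open MPI or OpenIB. It also reproduces the 8 GB = 8388608 KB arithmetic behind the limits.conf lines above.

```python
import resource

KB_PER_GB = 1024 * 1024  # limits.conf memlock values are given in kilobytes


def gb_to_limits_conf_kb(gigabytes):
    """Convert a size in GB (2**30 bytes) to the KB value used in limits.conf."""
    return gigabytes * KB_PER_GB


def memlock_limits_kb():
    """Return (soft, hard) RLIMIT_MEMLOCK in KB; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def to_kb(value):
        # getrlimit reports bytes; RLIM_INFINITY means no limit is set.
        return None if value == resource.RLIM_INFINITY else value // 1024

    return to_kb(soft), to_kb(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits_kb()
    print("soft memlock (KB):", "unlimited" if soft is None else soft)
    print("hard memlock (KB):", "unlimited" if hard is None else hard)
    print("8 GB in limits.conf units:", gb_to_limits_conf_kb(8))
```

Run it as the ordinary user, in a fresh login session, on each compute node. If the soft value printed is much smaller than the one configured in limits.conf, the new limit has probably not reached that session yet.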

Tim, this should fix your problem as well?

On Thu, 2005-12-01 at 17:26 -0800, Todd Wilde wrote:
> How about this one:
>
> For Redhat AS4.0 and Fedora Core 3 or a newer kernel, edit the
> file /etc/security/limits.conf and add the following two lines:
>
> * soft memlock <number>
>
> * hard memlock <number>
>
> The <number> value denotes the number of kilobytes that may be locked
> by a process.
>
> > -----Original Message-----
> > From: Joshua Aune [mailto:luken_at_[hidden]]
> > Sent: Thursday, December 01, 2005 3:50 PM
> > To: Todd Wilde
> > Cc: Matthew Finlay; twoodall_at_[hidden]; Robert Cummins; Pat Lindsay
> > Subject: RE: only root running mpi jobs with 1.0.1rc5
> >
> > On Thu, 2005-12-01 at 15:39 -0800, Todd Wilde wrote:
> > > It may be a permissions issue with normal users locking memory. I've
> > > seen this in the past. Try adding the following command at boot:
> > >
> > >
> > > sysctl -w vm.disable_cap_mlock=1
> >
> > This doesn't exist in 2.6.14...
> >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Joshua Aune [mailto:luken_at_[hidden]]
> > > > Sent: Thursday, December 01, 2005 1:56 PM
> > > > To: Matthew Finlay; Todd Wilde; twoodall_at_[hidden]
> > > > Cc: Robert Cummins; Pat Lindsay
> > > > Subject: only root running mpi jobs with 1.0.1rc5
> > > >
> > > > Root runs jobs fine but users don't.
> > > >
> > > > Any thoughts?
> > > >
> > > > Thanks,
> > > > josh
> > > >
> > > > coyote2-compute# module purge
> > > > coyote2-compute# module load compiler/gcc mpi/openmpi-1.0.1rc5
> > > > coyote2-compute# cd /home/luken/hello
> > > > coyote2-compute# mpirun -np 2 -H 201,202 mpi_hello
> > > > n201: I am rank 0
> > > > n202: I am rank 1
> > > >
> > > >
> > > > coyote2-compute$ su - luken
> > > > coyote2-compute$ module purge
> > > > coyote2-compute$ module load compiler/gcc mpi/openmpi-1.0.1rc5
> > > > coyote2-compute$ cd /home/luken/hello
> > > > coyote2-compute$ mpirun -np 2 -H 201,202 mpi_hello
> > > > [0,1,0][btl_openib.c:803:mca_btl_openib_module_init] error creating high priority cq for mthca0 errno says Cannot allocate memory
> > > > [0,1,1][btl_openib.c:803:mca_btl_openib_module_init] error creating high priority cq for mthca0 errno says Cannot allocate memory
> > > >
> > > > n201: I am rank 0
> > > >
> > > > n202: I am rank 1
> > >
>

Daryl W. Grunau wrote:
> Hi, I'm running OMPI 1.1a1r8378 on 2.6.14 + recent OpenIB stack and getting
> the following runtime error:
>
> [0,1,0][btl_openib.c:803:mca_btl_openib_module_init] error creating high priority cq for mthca0 errno says Cannot allocate memory
> [0,1,3][btl_openib.c:803:mca_btl_openib_module_init] error creating high priority cq for mthca0 errno says Cannot allocate memory
> [0,1,1][btl_openib.c:803:mca_btl_openib_module_init] error creating high priority cq for mthca0 errno says Cannot allocate memory
> [0,1,2][btl_openib.c:803:mca_btl_openib_module_init] error creating high priority cq for mthca0 errno says Cannot allocate memory
>
>
> Strange thing is that it works properly when I run as root. A permissions
> problem on my part? My devices look like:
>
> # ls -l /dev/infiniband/*
> crw------- 1 root root 231, 64 Dec 5 17:16 /dev/infiniband/issm0
> crw------- 1 root root 231, 65 Dec 5 17:16 /dev/infiniband/issm1
> crw------- 1 root root 231, 0 Dec 5 17:16 /dev/infiniband/umad0
> crw------- 1 root root 231, 1 Dec 5 17:16 /dev/infiniband/umad1
> crw-rw-rw- 1 root root 231, 192 Dec 5 17:16 /dev/infiniband/uverbs0
>
> Daryl
>