Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ipath_userinit: userinit command failed: Cannot allocate memory
From: Anton Shterenlikht (mexas_at_[hidden])
Date: 2010-07-09 07:44:34


On Thu, Jul 08, 2010 at 11:04:09AM -0700, Avneesh Pant wrote:
> Anton,
> On the node that you saw the failure (u02n065)
> can you verify what the max locked memory limit
> is set to? In a bash shell you can do this with
> ulimit -l. It should be set to at least 128K.
> Also please verify that the available memory on
> the node (/proc/meminfo shows this) is sufficient
> as it may be possible that some zombie
> processes on that node are consuming memory.

Avneesh, many thanks

bigblue3> ssh u02n065
Last login: Fri Jul 9 12:24:17 2010 from bigblue3.cvos.cluster
u02n065> bash -
bash-3.2$ ulimit -l
unlimited
bash-3.2$
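
A minimal sketch of the two checks suggested above (max locked memory and free memory), assuming bash and a Linux-style /proc/meminfo. The helper names memlock_ok and meminfo_kb are hypothetical, and the 128 threshold assumes "128K" means 128 KiB in the units reported by ulimit -l. Running it from inside the batch job script, rather than from an interactive ssh login, would show the limits the MPI ranks actually inherit, which may differ.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a locked-memory limit (as printed by
# `ulimit -l`: a number of KiB, or the word "unlimited") is at least min_kb.
memlock_ok() {
    local limit=$1 min_kb=$2
    if [ "$limit" = "unlimited" ]; then
        return 0
    fi
    [ "$limit" -ge "$min_kb" ]
}

# Hypothetical helper: print the value (in kB) of one field, e.g. MemFree,
# from a /proc/meminfo-style file (defaults to /proc/meminfo).
meminfo_kb() {
    local field=$1 file=${2:-/proc/meminfo}
    awk -v f="$field:" '$1 == f { print $2 }' "$file"
}

host=$(uname -n)
limit=$(ulimit -l)
if memlock_ok "$limit" 128; then
    status="OK"
else
    status="below 128 KiB"
fi
echo "$host: max locked memory = $limit ($status)"

# /proc/meminfo only exists on Linux, so guard the read.
if [ -r /proc/meminfo ]; then
    echo "$host: MemFree = $(meminfo_kb MemFree) kB"
fi
```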

This seems to be an intermittent failure.
I ran this test on 8 nodes once and got:

bigblue3> cat z.sh.o335046
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
/cvos/local/apps/torque/current/spool/aux//335046.bluequeue1.cvos.cluster
u02n077.cvos.cluster
u02n072.cvos.cluster
u02n074.cvos.cluster
u02n091.cvos.cluster
u03n061.cvos.cluster
u01n003.cvos.cluster
u01n057.cvos.cluster
u01n080.cvos.cluster
Warning: Permanently added 'u01n003,10.141.1.3' (RSA) to the list of known hosts.
Warning: Permanently added 'u01n057,10.141.1.57' (RSA) to the list of known hosts.
Warning: Permanently added 'u02n072,10.141.2.72' (RSA) to the list of known hosts.
Warning: Permanently added 'u03n061,10.141.3.61' (RSA) to the list of known hosts.
Warning: Permanently added 'u01n080,10.141.1.80' (RSA) to the list of known hosts.
Warning: Permanently added 'u02n074,10.141.2.74' (RSA) to the list of known hosts.
Warning: Permanently added 'u02n091,10.141.2.91' (RSA) to the list of known hosts.
u01n003:5.ipath_userinit: userinit command failed: Cannot allocate memory
u01n003:5.Driver initialization failure on /dev/ipath
MPIRUN.u02n077: 7 ranks have not yet exited 60 seconds after rank 5 (node u01n003) exited without reaching MPI_Finalize().
MPIRUN.u02n077: Waiting at most another 60 seconds for the remaining ranks to do a clean shutdown before terminating 7 node processes

real 1m15.435s
user 0m0.061s
sys 0m0.151s
Warning: Permanently added 'u02n077.cvos.cluster,10.141.2.77' (RSA) to the list of known hosts.
Warning: Permanently added 'u02n072.cvos.cluster' (RSA) to the list of known hosts.
Warning: Permanently added 'u02n074.cvos.cluster' (RSA) to the list of known hosts.
Warning: Permanently added 'u02n091.cvos.cluster' (RSA) to the list of known hosts.
Warning: Permanently added 'u03n061.cvos.cluster' (RSA) to the list of known hosts.
Warning: Permanently added 'u01n003.cvos.cluster' (RSA) to the list of known hosts.
Warning: Permanently added 'u01n057.cvos.cluster' (RSA) to the list of known hosts.
Warning: Permanently added 'u01n080.cvos.cluster' (RSA) to the list of known hosts.
bigblue3>

I ran it again a few minutes later and it worked OK:

bigblue3> cat z.sh.o335165
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
/cvos/local/apps/torque/current/spool/aux//335165.bluequeue1.cvos.cluster
u02n072.cvos.cluster
u02n077.cvos.cluster
u02n091.cvos.cluster
u03n061.cvos.cluster
u01n003.cvos.cluster
u02n074.cvos.cluster
u01n057.cvos.cluster
u01n080.cvos.cluster
Warning: Permanently added 'u02n077' (RSA) to the list of known hosts.
 Number of tasks= 8 My rank= 0
 Number of tasks= 8 My rank= 7
 Number of tasks= 8 My rank= 1
 Number of tasks= 8 My rank= 3
 Number of tasks= 8 My rank= 5
 Number of tasks= 8 My rank= 6
 Number of tasks= 8 My rank= 2
 Number of tasks= 8 My rank= 4

real 0m1.590s
user 0m0.070s
sys 0m0.182s
bigblue3>

I'll ask my sysadmin about this.
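
Since the failure is intermittent and has shown up on different nodes (u02n065 earlier, u01n003 here), a tally of which nodes report the error across several job output files might help narrow it down. A minimal sketch, assuming output files named like z.sh.o335046 and error lines in the "node:rank.ipath_userinit: ..." form shown above; failed_nodes is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print "node count" for every node that logged the
# ipath_userinit failure in the given job output files.
# The node name is taken as the text before the first ':' on the error line.
failed_nodes() {
    awk -F: '/ipath_userinit: userinit command failed/ { n[$1]++ }
             END { for (h in n) print h, n[h] }' "$@" | sort
}

# Example usage (file names are illustrative):
#   failed_nodes z.sh.o*
```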

As I'm just starting with MPI, I was worried
I had messed up something in my MPI program.
That seems OK now.

Many thanks for your help.
anton

>
> Avneesh
>
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Anton Shterenlikht
> Sent: Thursday, July 08, 2010 9:07 AM
> To: users_at_[hidden]
> Subject: [OMPI users] ipath_userinit: userinit command failed: Cannot allocate memory
>
> I'm trying to use MPI with Fortran on Linux 2.6.18-164.6.1.el5 x86_64. I compiled this trivial code with mpif90:
>
> program simple
>   implicit none
>   include 'mpif.h'
>
>   integer :: numtasks, rank, ierr, rc
>
>   ! Error code passed to MPI_ABORT if initialization fails
>   rc = 1
>
>   call MPI_INIT(ierr)
>   if (ierr .ne. MPI_SUCCESS) then
>     print *, 'Error starting MPI program. Terminating.'
>     call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
>   end if
>
>   ! Query this process's rank and the total number of processes
>   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>   call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
>   print *, 'Number of tasks=', numtasks, ' My rank=', rank
>
>   ! ****** do some work ******
>
>   call MPI_FINALIZE(ierr)
>
> end program simple
>
> I run it with mpirun.
>
> When I use 2 cpus or less, all is fine.
>
> When I try to specify more than 2 cpus I get this error:
>
> u02n065:0.ipath_userinit: userinit command failed: Cannot allocate memory
> u02n065:0.Driver initialization failure on /dev/ipath
>
> where u02n065 is the node name.
>
> Please advise
>
> many thanks
> anton
>
>
> --
> Anton Shterenlikht
> Room 2.6, Queen's Building
> Mech Eng Dept
> Bristol University
> University Walk, Bristol BS8 1TR, UK
> Tel: +44 (0)117 331 5944
> Fax: +44 (0)117 929 4423
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423