Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] qsub and limits.conf
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-05-17 13:16:48


If you are using the native Torque capabilities to launch Open MPI
jobs, note that limits.conf is not necessarily obeyed. I'm not a
Torque expert, but you should probably check out:

     http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-more

And check the Torque docs about how it propagates and enforces such
limits.

On May 17, 2008, at 10:58 AM, Javier Lazaro wrote:

> I have installed torque-2.3.0 and openmpi-1.2.3.
> I ran some tests and discovered that jobs launched with the
> '-hostfile' or '-machinefile' parameter are able to exceed the
> limits set in the file /etc/security/limits.conf
> More details:
>
> file hola.c
>
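> #include <stdio.h>
> #include <stdlib.h>   /* system() */
> #include <unistd.h>   /* sleep() */
> #include "mpi.h"
>
> int main(int argc, char *argv[]) {
>     int rank;
>     int size;
>     int i;
>     int namelen;
>     char pn[MPI_MAX_PROCESSOR_NAME];
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Get_processor_name(pn, &namelen);
>
>     /* Delay each rank by its rank number of seconds. */
>     sleep(rank);
>
>     /* Show the limits this process actually inherited. */
>     system("bash -c 'ulimit -a'");
>
>     /* Burn CPU forever so that a CPU-time limit is eventually hit. */
>     for (i = 0; ; i++) {
>         if (i % 100000000 == 0) {
>             /* "Hello from <rank>, out of a total of <size>. I am on <host>" */
>             printf("--> %i --> Hola desde %d, de un total de: %d. estoy en %s\n",
>                    i, rank, size, pn);
>         }
>     }
>
>     /* Not reached: the loop above never exits. */
>     MPI_Finalize();
>
>     return 0;
> }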
>
> ##
>
> > mpicc hola.c
>
> file mpi3.sh
>
> #!/bin/sh
>
> #PBS -l nodes=3:ppn=1
> #PBS -N pruebaMPI3
> #PBS -o 3outpruebaMPIout3
> #PBS -e 3errpruebaMPIerr3
>
> cat ${PBS_NODEFILE}
>
> mpirun -hostfile ${PBS_NODEFILE} /home/javier/mpi_hola/a.out
>
> ##
>
> launch job with torque
> > qsub mpi3.sh
>
> ##
>
> terminated
>
> file 3outpruebaMPIout3
> maquina3b
> maquina2b
> maquina1b
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> file size (blocks, -f) unlimited
> pending signals (-i) 8185
> max locked memory (kbytes, -l) 32
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) unlimited #limit maquina3b
> max user processes (-u) 8185
> virtual memory (kbytes, -v) 2511840
> file locks (-x) unlimited
> --> 0 --> Hola desde 0, de un total de: 3. estoy en maquina3b
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> file size (blocks, -f) unlimited
> pending signals (-i) 8185
> max locked memory (kbytes, -l) 32
> max memory size (kbytes, -m) 880005
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) 60 #limit maquina2b
> max user processes (-u) 8185
> virtual memory (kbytes, -v) 2511840
> file locks (-x) unlimited
> --> 0 --> Hola desde 1, de un total de: 3. estoy en maquina2b
> --> 100000000 --> Hola desde 0, de un total de: 3. estoy en maquina3b
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> file size (blocks, -f) unlimited
> pending signals (-i) 8185
> max locked memory (kbytes, -l) 32
> max memory size (kbytes, -m) 880005
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) 60 #limit maquina1b
> max user processes (-u) 8185
> virtual memory (kbytes, -v) 2511840
> file locks (-x) unlimited
> --> 0 --> Hola desde 2, de un total de: 3. estoy en maquina1b
> --> 100000000 --> Hola desde 1, de un total de: 3. estoy en maquina2b
> --> 200000000 --> Hola desde 0, de un total de: 3. estoy en maquina3b
> --> 100000000 --> Hola desde 2, de un total de: 3. estoy en maquina1b
> --> 200000000 --> Hola desde 1, de un total de: 3. estoy en maquina2b
> ........
> --> -500000000 --> Hola desde 1, de un total de: 3. estoy en maquina2b
> 1 additional process aborted (not shown)
> 1 process killed (possibly by Open MPI)
>
> ##
>
> file 3errpruebaMPIerr3
>
> mpirun noticed that job rank 0 with PID 10839 on node maquina3b
> exited on signal 15 (Terminated).
>
> ---------------------------
> I have limited CPU time to 60 seconds on all nodes. Torque modifies
> this limit only on maquina3b.
> I think Torque should modify the CPU limit on the rest of the nodes
> as well. Where is the error?

-- 
Jeff Squyres
Cisco Systems