
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Max number of processes per host for an OMPI run?
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-09-11 03:46:00


As Ralph said, you're probably running out of file descriptors; mpirun uses a few (2-3? I don't remember offhand) for each MPI process launched.
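If you want a firmer number than my 2-3 guess, you can measure it. Here is a minimal sketch, assuming Linux (for /proc) and bash. `count_fds` is a hypothetical helper name. Run mpirun on q012 itself so that mpirun is the local launcher. Launch something long-running such as `sleep` instead of `hostname` so that mpirun stays alive long enough to inspect. Then count its descriptors at two different -np values and divide the difference by the difference in -np:

```shell
#!/usr/bin/env bash
# count_fds PID: print the number of file descriptors process PID holds.
# Assumes Linux /proc; prints 0 if PID does not exist or is unreadable.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example measurement (not run here; run it on q012 itself):
#   mpirun -np 50  -host q012 sleep 60 & sleep 5
#   a=$(count_fds "$(pgrep -n mpirun)"); wait
#   mpirun -np 100 -host q012 sleep 60 & sleep 5
#   b=$(count_fds "$(pgrep -n mpirun)"); wait
#   echo "approx. fds per process: $(( (b - a) / 50 ))"
```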

There are many factors that can cause limits like this; file descriptors are only one of them. Which one you hit depends heavily on the configuration of the machine you're running on. So, sorry, but it will likely take some experimentation on your part to find out how many processes you can run on a single machine.
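The failure point in your case is sharp: 254 works and 255 does not. A binary search over -np can therefore find the limit in about ten runs instead of hundreds. Here is a minimal sketch in bash. It assumes that if N processes start successfully, then any smaller N does too. `find_max_np` and `try_np` are hypothetical names, and `try_np` simply wraps the command from your mail:

```shell
#!/usr/bin/env bash
# find_max_np LO HI CMD...: binary-search the largest N in [LO, HI] for which
# "CMD... N" exits with status 0. Assumes success is monotone in N (if N
# works, every smaller N works too). Prints LO-1 if even LO fails.
find_max_np() {
    local lo=$1 hi=$2
    shift 2
    local best=$((lo - 1)) mid
    while (( lo <= hi )); do
        mid=$(( (lo + hi) / 2 ))
        if "$@" "$mid" >/dev/null 2>&1; then
            best=$mid              # mid works: search above it
            lo=$((mid + 1))
        else
            hi=$((mid - 1))        # mid fails: search below it
        fi
    done
    echo "$best"
}

# Hypothetical wrapper around the command from the question.
try_np() {
    mpirun -np "$1" -host q012 hostname
}

# Usage (not run here): find_max_np 1 1024 try_np
```

Near the limit, the result may vary between runs if other processes of yours come and go. It is worth repeating the search a couple of times.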

On Sep 10, 2013, at 4:10 PM, Francesco Simula <francesco.simula_at_[hidden]> wrote:

> Dear forum,
>
> I should probably apologize in advance for the very basic question, but I wasn't able to find an answer elsewhere:
> how do I find the maximum number of processes that mpirun can launch concurrently on a single host of a cluster?
>
> If I launch (on a CentOS 6.3 cluster of nodes with dual quad-core Xeons, running Open MPI 1.5.4 and equipped with IB HCAs, though I think the latter is irrelevant here):
>
> [cut]
> mpirun -np 250 -host q012 hostname
> [/cut]
>
> I expect and obtain 250 rows of:
> [cut]
> q012.qng
> [/cut]
>
> The same happens for 251, 252, 253 and 254, but not for 255, where it returns:
>
> [cut]
> --------------------------------------------------------------------------
> mpirun was unable to start the specified application as it encountered an error
> on node q012. More information may be available above.
> --------------------------------------------------------------------------
> [/cut]
>
> I know that 250 processes heavily oversubscribe a single node with only 8 physical cores, but I wanted to see the actual performance degradation rather than a crash.
>
> Which hard limit (in Open MPI or in the system) am I hitting that prevents me from running 255 MPI processes on a single host?
>
> The output of ulimit -a for the user is:
>
> [cut]
> ulimit -a
> core file size (blocks, -c) 1000000
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 95054
> max locked memory (kbytes, -l) unlimited
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 100000
> cpu time (seconds, -t) unlimited
> max user processes (-u) 1024
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
> [/cut]
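Of these, `open files (-n) 1024` is the one Ralph and I suspect. `max user processes (-u) 1024` is the other candidate. Note that on Linux, the -u limit counts threads as well as processes, across everything running as that user. Below is a sketch for checking the soft and hard values and raising the soft open-files limit in the current shell before retrying. It assumes bash on Linux with procps `ps`, and the function names are hypothetical. A non-root user can raise a soft limit only up to the hard limit. Anything beyond that needs your administrator (on CentOS, usually via /etc/security/limits.conf).

```shell
#!/usr/bin/env bash
# show_limits: print soft and hard values of the two limits that most
# plausibly cap how many local processes mpirun can start.
show_limits() {
    printf 'open files:         soft=%s hard=%s\n' "$(ulimit -S -n)" "$(ulimit -H -n)"
    printf 'max user processes: soft=%s hard=%s\n' "$(ulimit -S -u)" "$(ulimit -H -u)"
    # On Linux, -u counts tasks (threads) of the real user, so count those.
    printf 'tasks owned by %s: %s\n' "$(id -un)" \
        "$(ps -L -U "$(id -u)" --no-headers | wc -l)"
}

# raise_nofile_soft: raise the soft open-files limit up to the hard limit.
# Affects only this shell and the processes it starts afterwards.
raise_nofile_soft() {
    ulimit -S -n "$(ulimit -H -n)"
}

show_limits
raise_nofile_soft
show_limits
# Then retry from this same shell, e.g.: mpirun -np 255 -host q012 hostname
```

If 255 or more then start, the open-files limit was the culprit. If not, compare the -u value against the task count next.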
>
> Many thanks,
> Francesco

-- 
Jeff Squyres
jsquyres_at_[hidden]