Open MPI User's Mailing List Archives

Subject: [OMPI users] an MPI process using about 12 file descriptors per neighbour processes - isn't it a bit too much?
From: Paul Kapinos (kapinos_at_[hidden])
Date: 2009-08-14 08:52:24


Hi OpenMPI folks,

We use Sun MPI (Cluster Tools 8.2) as well as native OpenMPI 1.3.3, and we
are puzzled by the way OpenMPI devours file descriptors. On our
computers, ulimit -n is currently set to 1024, and we found that we
can run at most 84 MPI processes per box. If we try to run 85 or
more processes, we get this error message:

--------------------------------------------------------------------------
Error: system limit exceeded on number of network connections that can
be open
.....
--------------------------------------------------------------------------

A simple calculation shows that 1024/85 is about 12. This leads us to
believe that a single OpenMPI process needs about 12 file descriptors
per other MPI process.
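
The estimate above can be reproduced with a few lines of Python. This is only a rough sketch: it assumes the whole budget of 1024 descriptors is shared evenly among the MPI processes on the box and ignores descriptors used for stdin, stdout, stderr and libraries. The function name is illustrative and not part of OpenMPI.

```python
def fds_per_process(fd_limit: int, nprocs: int) -> float:
    """Average descriptor budget per MPI process, assuming an even split."""
    return fd_limit / nprocs


# Compare the last working process count (84) with the first failing one (85).
for n in (84, 85):
    print(f"{n} processes: about {fds_per_process(1024, n):.1f} descriptors each")
```

Both cases land slightly above 12, which is where the "about 12 per process" figure comes from.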

At present we have only one box with more than 100 CPUs, on which it would
be meaningful to run more than 85 processes. But many-core boxes are
coming in the near future (we have also ordered 128-way Nehalems), so
consuming this many file descriptors per MPI process may become a real
disadvantage.

We see one way to avoid this problem: setting the ulimit for file
descriptors to a higher value. This is not easy under Linux: you need
either to recompile the kernel (which is not an option for us), or to set
up a root process somewhere that raises the ulimit
(which is a security risk and not easy to implement).
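
For reference, a process can inspect its own descriptor limits and, without root, raise its soft limit as far as the hard limit. The sketch below uses Python's standard `resource` module. It only helps if the hard limit on the machine (shown by `ulimit -Hn`) is actually above the soft limit of 1024, which we have not verified here. The function name is illustrative.

```python
import resource


def raise_nofile_soft_limit():
    """Try to raise the soft RLIMIT_NOFILE to the hard limit; return (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            # An unprivileged process may raise its soft limit up to the hard limit.
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some systems reject the request; keep the current limits in that case.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print("soft/hard descriptor limits:", raise_nofile_soft_limit())
```

The new limit applies to the calling process and to children it starts afterwards, so such a call would have to run in whatever launches the MPI processes. Raising the hard limit itself still requires privileges.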

We also tried setting opal_set_max_sys_limits to 1, as the help text
suggests (by adding "-mca opal_set_max_sys_limits 1" to the command line),
but we did not see any change in behaviour.

What is your opinion?

Best regards,
Paul Kapinos
RZ RWTH Aachen

#####################################################
  /opt/SUNWhpc/HPC8.2/intel/bin/mpiexec -mca opal_set_max_sys_limits 1 -np 86 a.out