Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] an MPI process using about 12 file descriptors per neighbour processes - isn't it a bit too much?
From: Rolf Vandevaart (Rolf.Vandevaart_at_[hidden])
Date: 2009-08-14 12:08:10


Hi Paul:
I tried running it the same way you did and saw the same thing. I
was using ClusterTools 8.2 (Open MPI 1.3.3r21324) and running on
Solaris. I looked at the mpirun process and it was definitely consuming
approximately 12 file descriptors per a.out process.
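
For anyone who wants to repeat this kind of check, here is a minimal sketch of counting the descriptors a process holds open. It assumes the system exposes a /proc/<pid>/fd directory (as Linux and Solaris do) and that you have permission to read it. The helper name count_open_fds is made up for this illustration and is not part of Open MPI.

```python
import os


def count_open_fds(pid="self"):
    """Count the open file descriptors of a process.

    Assumes a /proc/<pid>/fd directory with one entry per descriptor.
    With pid="self" the count includes the descriptor that os.listdir
    itself opens briefly to read the directory.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    # Pass the pid of mpirun instead of "self" to inspect the launcher.
    print("open descriptors:", count_open_fds())
```

Sampling this for the mpirun pid at two different -np values and dividing the difference by the number of extra processes gives a rough per-process cost.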

  burl-ct-v440-0 59 =>limit descriptors
descriptors 1024
  burl-ct-v440-0 60 =>mpirun -np 84 a.out
Connectivity test on 84 processes PASSED.
  burl-ct-v440-0 61 =>mpirun -np 85 a.out
[burl-ct-v440-0:27083] [[38835,0],0] ORTE_ERROR_LOG: The system limit on
number of network connections a process can open was reached in file
oob_tcp.c at line 446
--------------------------------------------------------------------------
Error: system limit exceeded on number of network connections that can
be open

This can be resolved by setting the mca parameter
opal_set_max_sys_limits to 1,
increasing your limit descriptor setting (using limit or ulimit commands),
or asking the system administrator to increase the system limit.
--------------------------------------------------------------------------
  burl-ct-v440-0 62 =>

This should not be happening. I will try to find out what is going
on. The process that is complaining is the mpirun process, which in this
scenario forks/execs all the a.out processes.
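
In the meantime, a rough sketch of the arithmetic and of inspecting the limit from inside a process, using Python's standard resource module. The function names and the 12-descriptors-per-process figure are assumptions taken from the observation above, not values read from Open MPI. An unprivileged process may raise its soft limit only as far as its hard limit.

```python
import resource


def max_processes(fd_limit, fds_per_process, reserved=0):
    """Largest process count whose descriptors fit under fd_limit.

    `reserved` is a hypothetical allowance for descriptors the launcher
    uses for other purposes (stdin/stdout/stderr, listening sockets, ...).
    """
    return (fd_limit - reserved) // fds_per_process


def raise_soft_limit_to_hard():
    """Try to raise the soft descriptor limit to the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # the system refused; keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("current (soft, hard):", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("estimated max processes at 1024:", max_processes(1024, 12))
```

With no reserved descriptors the estimate for a limit of 1024 is 85, one more than the 84 that actually ran, which would be consistent with mpirun holding some descriptors that are not tied to any a.out process.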

Rolf

On 08/14/09 08:52, Paul Kapinos wrote:
> Hi OpenMPI folks,
>
> We use Sun MPI (Cluster Tools 8.2) and also native OpenMPI 1.3.3, and we
> are puzzled by the way OpenMPI devours file descriptors: on our
> computers, ulimit -n is currently set to 1024, and we found that we
> can run at most 84 MPI processes per box. If we try to run 85 or
> more processes, we get this error message:
>
> --------------------------------------------------------------------------
> Error: system limit exceeded on number of network connections that can
> be open
> .....
> --------------------------------------------------------------------------
>
> Simple arithmetic tells us that 1024/85 is about 12. This leads us to
> believe that there is a single OpenMPI process which needs 12 file
> descriptors per other MPI process.
>
> For now, we have only one box with more than 100 CPUs on which it may be
> meaningful to run more than 85 processes. But many-core boxes are
> arriving in the near future (we have also ordered 128-way Nehalems), so
> consuming a lot of file descriptors per MPI process may become a
> disadvantage.
>
>
> We see a possibility to avoid this problem by setting the ulimit for file
> descriptors to a higher value. This is not easy under Linux: you need
> either to recompile the kernel (which is not an option for us), or to set
> up a root process somewhere which will set the ulimit to a higher value
> (which is a security risk and not easy to implement).
>
> We also tried setting opal_set_max_sys_limits to 1, as the help says
> (by adding "-mca opal_set_max_sys_limits 1" to the command line), but
> we do not see any change in behaviour.
>
> What do you think?
>
> Best regards,
> Paul Kapinos
> RZ RWTH Aachen
>
>
>
> #####################################################
> /opt/SUNWhpc/HPC8.2/intel/bin/mpiexec -mca opal_set_max_sys_limits 1
> -np 86 a.out

-- 
=========================
rolf.vandevaart_at_[hidden]
781-442-3043
=========================