Unfortunately, that is indeed true for the 1.2 series. It is significantly
better in 1.3, but still not perfect.
Basically, in 1.2, every process has to "call home" to mpirun (or mpiexec -
same thing) when it starts up. This is required in order to exchange
connection info so that the MPI comm subsystem can wire itself up. Thus,
every process creates a TCP connection to the node where mpirun sits.
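As a rough illustration of why the descriptor limit bites in 1.2, here is a minimal Python sketch that compares the per-process connection count against the open-file limit and raises the soft limit where the hard limit allows it. The helper names and the overhead figure of 100 descriptors are assumptions made for illustration (loosely based on the "about 900 of 1024" observation quoted below); none of this is part of Open MPI.

```python
import resource

# Assumption: descriptors mpirun needs for things other than the
# per-process connections (stdio, pipes, listening sockets). A guess.
ASSUMED_OVERHEAD_FDS = 100


def fds_needed(num_procs, overhead=ASSUMED_OVERHEAD_FDS):
    """Rough estimate for 1.2: one TCP connection per process, plus overhead."""
    return num_procs + overhead


def ensure_fd_limit(num_procs):
    """Raise the soft RLIMIT_NOFILE toward the hard limit if it is too low.

    Returns True if the resulting soft limit covers the estimate.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = fds_needed(num_procs)
    if soft != resource.RLIM_INFINITY and soft < needed:
        # The soft limit can never be raised above the hard limit.
        target = needed if hard == resource.RLIM_INFINITY else min(needed, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        soft = target
    return soft == resource.RLIM_INFINITY or soft >= needed
```

The limit applies to the calling process and is inherited by its children, so a launcher script would have to call `ensure_fd_limit` before starting mpirun. This has the same effect as running `ulimit -n` in the shell that mpirun is started from.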
In 1.3, this is reduced to one connection per node, as all non-MPI comm is
routed through the local daemon on each node. Still not ideal, but a
significant improvement.
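To put numbers on the difference, the following Python sketch counts the connections arriving at mpirun under the two schemes described above. The function name and the example job shape (2000 processes spread over 250 nodes) are hypothetical, and the count ignores any other descriptors mpirun holds.

```python
def mpirun_connections(version, num_procs, num_nodes):
    """Connections to mpirun at startup, per the description above.

    1.2: every process connects directly to mpirun.
    1.3: one connection per node, through the local daemon.
    """
    if version == "1.2":
        return num_procs
    if version == "1.3":
        return num_nodes
    raise ValueError("only the 1.2 and 1.3 schemes are modelled here")


# Hypothetical job: 2000 processes on 250 nodes (8 per node).
for version in ("1.2", "1.3"):
    print(version, mpirun_connections(version, num_procs=2000, num_nodes=250))
```

For this hypothetical job, the 1.2 scheme needs 2000 connections, well beyond a 1024-descriptor limit, while the 1.3 scheme needs 250.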
In 1.4, we will further reduce this for systems with static IP addresses to
perhaps as little as a single connection to mpirun. But that remains to be
seen.
On 7/10/08 6:04 AM, "Samuel Sarholz" <sarholz_at_[hidden]> wrote:
> mpiexec seems to need a file handle per started process.
> By default the number of file handles is set to 1024 here, thus I can
> start about 900 something processes.
> With higher numbers I get
> mca_oob_tcp_accept: accept() failed: Too many open files (24).
> If I decrease the file handle limit in the shell I run mpiexec from, I get
> the error with fewer processes. However, no MPI process is started on the
> local machine.
> The first thing I am wondering about is why TCP is involved at all, since
> InfiniBand is used for communication.
> And secondly what are the files/connections used for?
> Do I really have to set the file handles to 5000 (and to 32000 in a few
> years) for large MPI programs or is there a workaround?
> Another thing that I don't get is that the problem only arises if I
> start an MPI program.
> mpiexec -np 2000 hostname
> works fine.
> best regards,