Open MPI User's Mailing List Archives

From: Aleph One (perfektionist_at_[hidden])
Date: 2006-10-13 23:43:57


errno 24 is EMFILE, "Too many open files".
It looks like the process on rhea32 may be hitting the upper limit on the
number of open file descriptors.
Check /proc/sys/fs/file-max and see if you need to raise it.
I'm not sure whether you also need to raise "ulimit -n", but it's worth a try.
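
As a rough way to check both limits on a node, here is a minimal Python sketch. It assumes a Linux host with Python 3 available. The helper names (describe_errno, nofile_limits, system_file_max) are made up for illustration and are not part of Open MPI or SLURM. On Linux, EMFILE (24) is normally tied to the per-process limit that "ulimit -n" reports, while /proc/sys/fs/file-max is the system-wide ceiling, so it is probably worth looking at both. The errno=104 lines should decode to ECONNRESET on Linux, which may simply be a side effect of the failed accept() calls.

```python
import errno
import os
import resource


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def nofile_limits():
    """Return the (soft, hard) per-process limit on open file descriptors.

    The soft value is what "ulimit -n" reports in the shell.
    """
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def system_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide open-file limit, or None if it cannot be read
    (for example on a non-Linux host)."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # Decode the two errno values that appear in the log output.
    for code in (24, 104):
        name, message = describe_errno(code)
        print(f"errno {code}: {name} ({message})")

    soft, hard = nofile_limits()
    print(f"per-process open files: soft={soft} hard={hard}")
    print(f"system-wide file-max: {system_file_max()}")
```

Run it on the node where mpirun starts (rhea32 in the log above), in the same environment the job uses, since limits can differ between a login shell and a batch allocation.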

-Aleph

On 10/14/06, Adam Moody <moody20_at_[hidden]> wrote:
>
> Hello,
> I'm trying to run a 500 node job using mpirun / slurm with OpenMPI-1.1.1
> and see the following errors at startup:
>
> [rhea342:09444] [0,1,318]-[0,0,0] mca_oob_tcp_peer_recv_blocking: recv()
> failed with errno=104
> [rhea32:13463] mca_oob_tcp_accept: accept() failed with errno 24.
> [rhea32:13463] mca_oob_tcp_accept: accept() failed with errno 24.
> [rhea326:09641] [0,1,302]-[0,0,0] mca_oob_tcp_peer_recv_blocking: recv()
> failed with errno=104
> ...
>
> I'm starting the job with the following commands:
>
> srun -N 500 -A
> mpirun -np 500 -bynode hello_mpi
>
> Smaller jobs around 50 nodes run just fine. Any ideas?
> Thanks,
> -Adam Moody
> LLNL