Open MPI User's Mailing List Archives

From: Adam Moody (moody20_at_[hidden])
Date: 2006-10-13 17:09:43


Hello,
I'm trying to run a 500-node job using mpirun under SLURM with Open MPI 1.1.1,
and I see the following errors at startup:

[rhea342:09444] [0,1,318]-[0,0,0] mca_oob_tcp_peer_recv_blocking: recv() failed with errno=104
[rhea32:13463] mca_oob_tcp_accept: accept() failed with errno 24.
[rhea32:13463] mca_oob_tcp_accept: accept() failed with errno 24.
[rhea326:09641] [0,1,302]-[0,0,0] mca_oob_tcp_peer_recv_blocking: recv() failed with errno=104
...
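
For reference, here is a minimal sketch for decoding the two errno numbers in the log and checking the per-process open file limit on a node. It assumes the nodes use standard Linux errno numbering, where 24 is EMFILE ("Too many open files") and 104 is ECONNRESET ("Connection reset by peer"). The helper name describe_errno is hypothetical and is not part of Open MPI; I have not confirmed that the file descriptor limit is the actual cause.

```python
import errno
import os
import resource


def describe_errno(code):
    """Return (symbolic name, message) for an errno number on this platform."""
    # errno.errorcode maps numeric codes to names such as 'EMFILE'.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # The two codes that appear in the startup errors above.
    for code in (24, 104):
        name, message = describe_errno(code)
        print(f"errno {code}: {name} ({message})")

    # Per-process limit on open file descriptors (what `ulimit -n` reports).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open file limit: soft={soft}, hard={hard}")
```

Running this on the node where mpirun is launched (rhea32 in the log above) would show whether the soft limit is low compared with the number of connections a 500-process job opens back to it.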

I'm starting the job with the following commands:

srun -N 500 -A
mpirun -np 500 -bynode hello_mpi

Smaller jobs of around 50 nodes run just fine. Any ideas?
Thanks,
-Adam Moody
LLNL