Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] error no=110 (Connection timeout)
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-05-06 16:59:01


Sorry for the delay in replying; this mail slipped by me in my inbox.

On Apr 26, 2009, at 11:50 PM, Rangesh Gupta wrote:

> Hi all,
>
> I am having a problem running OpenFOAM 1.5 (the executable is
> sonicTurbFoam) with Open MPI: it hangs after some time, and each
> time it hangs at a different place. The MPI command is
>
> mpirun --mca btl_openib_if_include ib0 -mca btl_tcp_if_exclude lo,eth0,eth1 \
>     --mca btl_openib_ib_timeout 40 -n $NO_OF_PROCESS \
>     -machinefile $MYHOSTS $1 -parallel

FWIW, if you're submitting via SLURM, the -machinefile and -n options
shouldn't be necessary -- mpirun should get those directly from SLURM.

> We are using 64 processors on 8 nodes.
>
> I am submitting it through the LSF scheduler, which internally
> uses SLURM as its resource manager.
>
> Error :
> [n112][0,1,41][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed with errno=110
> [n112][0,1,43][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed with errno=110

errno=110 is ETIMEDOUT (connection timed out) on Linux. Do you happen
to have firewalling enabled on your compute nodes? OMPI needs to be
able to use random TCP ports to connect between all of the processes
in an MPI job.
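
For anyone who wants to check this by hand, here is a minimal Python
sketch. It is not part of Open MPI, and the function names
describe_errno and can_connect are hypothetical helpers invented for
illustration. The first maps an errno value to its symbolic name on
the local platform. The second tries a plain TCP connection to a
given host and port, which can hint at whether a firewall is dropping
traffic between two nodes. It assumes something is already listening
on the port you probe, so a failure is only a hint, not proof of
firewalling.

```python
import errno
import os
import socket


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # errno values are platform-specific; 110 is the Linux value quoted above.
    print(describe_errno(110))
```

Running can_connect from one compute node against a listener on
another node, on a few arbitrary high ports, should show quickly
whether connections between the nodes are being dropped.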

-- 
Jeff Squyres
Cisco Systems