Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] error no=110 (Connection timeout)
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-05-06 16:59:01


Sorry for the delay in replying; this mail slipped by me in my inbox.

On Apr 26, 2009, at 11:50 PM, Rangesh Gupta wrote:

> Hi all,
>
> I'm facing a problem while running OpenFOAM 1.5: the executable,
> sonicTurbFoam, is launched with Open MPI and hangs after some time,
> and every time it hangs at a different place. The mpirun command is
>
> mpirun --mca btl_openib_if_include ib0 \
>     -mca btl_tcp_if_exclude lo,eth0,eth1 \
>     --mca btl_openib_ib_timeout 40 \
>     -n $NO_OF_PROCESS -machinefile $MYHOSTS $1 -parallel

FWIW, if you're submitting via SLURM, the -machinefile and -n options
shouldn't be necessary -- Open MPI should get those directly from SLURM.
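
Under a SLURM allocation, the launch could likely be trimmed to
something like this (a sketch based on the command above; $1 is the
OpenFOAM solver from the original submission script):

    mpirun --mca btl_openib_if_include ib0 \
        --mca btl_tcp_if_exclude lo,eth0,eth1 \
        --mca btl_openib_ib_timeout 40 \
        $1 -parallel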

> We are using 64 processors on 8 nodes.
>
> I'm submitting it through the LSF scheduler, which internally uses
> SLURM as the resource manager.
>
> Error:
> [n112][0,1,41][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed with errno=110
> [n112][0,1,43][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed with errno=110

errno 110 on Linux is ETIMEDOUT ("Connection timed out"). Do you
happen to have firewalling enabled on your compute nodes? Open MPI
needs to be able to use random TCP ports to connect between all of
the processes in an MPI job.
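
If disabling the firewall between compute nodes isn't an option, one
workaround is to pin the TCP BTL to a fixed port range and open just
that range. A sketch, assuming the btl_tcp_port_min_v4 and
btl_tcp_port_range_v4 parameters are available in your Open MPI build
(check with "ompi_info --param btl tcp"):

    # Restrict the TCP BTL to ports 46000-46999:
    mpirun --mca btl_tcp_port_min_v4 46000 \
           --mca btl_tcp_port_range_v4 1000 ...

    # Then allow that range between nodes, e.g. with iptables
    # (<cluster_subnet> is a placeholder for your node subnet):
    iptables -A INPUT -p tcp -s <cluster_subnet> \
        --dport 46000:46999 -j ACCEPT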

-- 
Jeff Squyres
Cisco Systems