
Subject: Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory
From: V. Ram (v_r_959_at_[hidden])
Date: 2008-10-10 12:42:19


Leonardo,

These nodes all use Intel e1000 chips. Since the nodes are AMD
K7-based, these are the older chips, not the newer ones with the EEPROM
issues seen under recent kernels.

The kernel in use is from the 2.6.22 family, and the e1000 driver is the
one shipped with the kernel. It is compiled into the kernel rather than
built as a module.

When testing with the Intel MPI Benchmarks, I found that increasing the
receive ring buffer to its maximum size (4096) helped performance, so I
run ethtool -G at startup.
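
Roughly, the startup script does something like this (assuming the
interface is named eth0 on these nodes):

  /sbin/ethtool -G eth0 rx 4096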

Checking ethtool -k, I see that TCP segmentation offload (TSO) is on. I
can try turning that off to see what happens.
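
In other words, check the current setting and then disable it, along
these lines (again assuming eth0):

  /sbin/ethtool -k eth0 | grep -i segmentation
  /sbin/ethtool -K eth0 tso off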

Oddly, on 64-bit nodes using the tg3 driver, this code doesn't crash or
show these issues, and I haven't had to turn TSO off there.

Can anyone else suggest why the code might be crashing when running over
Ethernet but not over shared memory? Any suggestions on how to debug
this, or how to interpret the error messages issued from btl_tcp_frag.c?
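
(For what it's worth, on Linux errno 110 is ETIMEDOUT and errno 104 is
ECONNRESET. Assuming Python is installed on the nodes, the values in the
readv messages quoted below can be decoded with, e.g.:

  python -c 'import os, errno; print errno.errorcode[110], os.strerror(110)'

which should print ETIMEDOUT and "Connection timed out".)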

Thanks.

On Wed, 01 Oct 2008 18:11:34 +0200, "Leonardo Fialho"
<lfialho_at_[hidden]> said:
> Ram,
>
> What is the name and version of the kernel module for your NIC? I have
> experienced something similar with my tg3 module. The error that appeared
> for me was different:
>
> [btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv
> failed: No route to host (113)
>
> I solved it by disabling TSO on the interface with the following command:
>
> /sbin/ethtool -K eth0 tso off
>
> Leonardo
>
>
> Aurélien Bouteiller wrote:
> > If you have several network cards in your system, Open MPI can sometimes
> > get the endpoints confused, especially if the nodes don't all have the
> > same number of cards or don't use the same subnet for each of "eth0,
> > eth1". You should try to restrict Open MPI to only one of the available
> > networks by passing the --mca btl_tcp_if_include ethx parameter to
> > mpirun, where ethx is the interface that is always connected to the same
> > logical and physical network on your machines.
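> >
> > For example (a sketch; eth0 and ./my_app stand in for your actual
> > interface and executable):
> >
> >   mpirun --mca btl_tcp_if_include eth0 -np 2 ./my_app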
> >
> > Aurelien
> >
> > On Oct 1, 2008, at 11:47, V. Ram wrote:
> >
> >> I wrote earlier about one of my users running a third-party Fortran code
> >> on 32-bit x86 machines, using OMPI 1.2.7, which is showing some odd crash
> >> behavior.
> >>
> >> Our cluster's nodes all have 2 single-core processors. If this code is
> >> run on 2 processors on 1 node, it runs seemingly fine. However, if the
> >> job runs on 1 processor on each of 2 nodes (e.g., mpirun --bynode), then
> >> it crashes and gives messages like:
> >>
> >> [node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> >> [node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> >> mca_btl_tcp_frag_recv: readv failed with errno=110
> >> mca_btl_tcp_frag_recv: readv failed with errno=104
> >>
> >> Essentially, if any network communication is involved, the job crashes
> >> in this form.
> >>
> >> I do have another user that runs his own MPI code on 10+ of these
> >> processors for days at a time without issue, so I don't think it's
> >> hardware.
> >>
> >> The original code also runs fine across many networked nodes if the
> >> architecture is x86-64 (also running OMPI 1.2.7).
> >>
> >> We have also tried different Fortran compilers (both PathScale and
> >> gfortran) and keep getting these crashes.
> >>
> >> Are there any suggestions on how to figure out whether it's a problem
> >> with the code or with the OMPI installation/software on the system? We
> >> have tried "--debug-daemons", but it revealed no new or interesting
> >> information. Is there a way to trap segfault messages, capture more
> >> detailed MPI transaction information, or get anything else that could
> >> help diagnose this?
> >>
> >> Thanks.
> >> --
> >> V. Ram
> >> v_r_959_at_[hidden]
> >>
> >> --
> >> http://www.fastmail.fm - Same, same, but different...
> >>
>
>
> --
> Leonardo Fialho
> Computer Architecture and Operating Systems Department - CAOS
> Universidad Autonoma de Barcelona - UAB
> ETSE, Edifcio Q, QC/3088
> http://www.caos.uab.es
> Phone: +34-93-581-2888
> Fax: +34-93-581-2478
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
  V. Ram
  v_r_959_at_[hidden]
-- 
http://www.fastmail.fm - Faster than the air-speed velocity of an
                          unladen european swallow