Open MPI User's Mailing List Archives

Subject: [OMPI users] Crash in code using OMPI 1.2.7 - Debugging assistance sought
From: V. Ram (v_r_959_at_[hidden])
Date: 2008-09-24 16:03:27


Hello.

I have a user running a Fortran code that can be built and run on
both 32-bit and 64-bit architectures. When this code is built for the
x86-64 machines in our cluster, running on OMPI 1.2.7, it runs fine.
However, if we build and run it on 32-bit x86 machines, also running the
same GNU/Linux distribution and also with OMPI 1.2.7, it crashes with
errors like:

[node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
[node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed with errno=110
mca_btl_tcp_frag_recv: readv failed with errno=104
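The numeric errno values in these messages are easier to read once they are mapped to their symbolic names. Below is a minimal sketch, assuming it runs on a Linux host, because errno numbering is platform-specific. The helper name `decode_errno_line` is hypothetical. It only uses Python's standard `errno`, `os` and `re` modules.

```python
import errno
import os
import re

# Matches the "errno=NNN" suffix in Open MPI TCP BTL messages such as
# "mca_btl_tcp_frag_recv: readv failed with errno=110".
ERRNO_RE = re.compile(r"errno=(\d+)")


def decode_errno_line(line):
    """Return (number, symbolic name, message) for the errno in `line`, or None."""
    match = ERRNO_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # Both lookups use the numbering of the machine this script runs on,
    # so run it on the same OS that produced the log.
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


if __name__ == "__main__":
    log_lines = [
        "mca_btl_tcp_frag_recv: readv failed with errno=110",
        "mca_btl_tcp_frag_recv: readv failed with errno=104",
    ]
    for line in log_lines:
        print(decode_errno_line(line))
```

On Linux, 110 is ETIMEDOUT ("Connection timed out") and 104 is ECONNRESET ("Connection reset by peer"). Both errors are reported on TCP connections between ranks, so they point at the connection to another process rather than at the local computation itself.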

We have tried different Fortran compilers (both PathScale and gfortran)
and keep getting these crashes, which occur after varying numbers of
iterations. Running on a single node using MPI seems to work OK.

Are there any suggestions on how to figure out if it's a problem with
the code or the OMPI installation/software on the system? We have tried
"--debug-daemons" with no new/interesting information being revealed.
Is there a way to trap segfault messages or more detailed MPI
transaction information or anything else that could help diagnose this?
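One low-tech way to catch segfaults is to launch each rank through a small wrapper. The wrapper reports whether the real program exited normally or was killed by a signal such as SIGSEGV. This is a minimal sketch, assuming Python 3 is installed on every node. The script name `report_exit.py`, the helper `describe_exit`, and the launch line shown after the code are hypothetical illustrations, not Open MPI features. The sketch does not replace attaching a debugger.

```python
import signal
import socket
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        # subprocess convention on POSIX: -N means "killed by signal N".
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: report_exit.py PROGRAM [ARGS...]")
    # Run the real MPI program, inheriting stdin/stdout/stderr and the
    # environment that mpirun set up for this rank.
    proc = subprocess.run(sys.argv[1:])
    print(f"[{socket.gethostname()}] {sys.argv[1]}: "
          f"{describe_exit(proc.returncode)}", file=sys.stderr)
    sys.exit(0 if proc.returncode == 0 else 1)
```

A hypothetical invocation is `mpirun -np 8 python3 report_exit.py ./a.out`. With this invocation, a rank that segfaults should show up as "killed by SIGSEGV" tagged with its hostname. That helps tell a crashing process apart from one that merely lost its TCP connection to a peer.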

Thanks.

-- 
  V. Ram
  v_r_959_at_[hidden]