I have a user running a Fortran code that can be built and run on
both 32-bit and 64-bit architectures. When this code is built for the
x86-64 machines in our cluster, running on OMPI 1.2.7, it runs fine.
However, if we build and run it on 32-bit x86 machines, also running the
same GNU/Linux distribution and also with OMPI 1.2.7, it crashes with:
mca_btl_tcp_frag_recv: readv failed with errno=110
mca_btl_tcp_frag_recv: readv failed with errno=104
We have tried different Fortran compilers (both PathScale and gfortran)
and keep getting these crashes, which occur after varying numbers of
iterations. Running on a single node using MPI seems to work OK.
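
For reference, on Linux errno 110 is ETIMEDOUT and errno 104 is ECONNRESET, so both failures point at the TCP connections between nodes rather than at a crash reported by the Fortran code itself. A minimal sketch for decoding the values on the affected machines follows; the helper name describe_errno is only illustrative, and the numeric values are assumed to be Linux ones (they differ on other platforms):

```python
import errno
import os


def describe_errno(code):
    """Return 'SYMBOL: message' for a numeric errno on this platform."""
    # errno.errorcode maps numeric values to symbolic names such as ETIMEDOUT.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's human-readable message for the code.
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # The two values reported by mca_btl_tcp_frag_recv (assumed Linux numbering).
    for code in (110, 104):
        print(code, describe_errno(code))
```

Running this on one of the 32-bit nodes confirms how that system names the two error codes.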
Are there any suggestions on how to figure out if it's a problem with
the code or the OMPI installation/software on the system? We have tried
"--debug-daemons" with no new/interesting information being revealed.
Is there a way to trap segfault messages or more detailed MPI
transaction information or anything else that could help diagnose this?